Transformer Quiz

Professional Development

10 Qs

Similar activities

Computer Programming - Intro (Professional Development, 12 Qs)
¿Cuánto sabes sobre NFTs? (KG - Professional Development, 10 Qs)
Crypto Knowledge Quiz (Professional Development, 10 Qs)
OAuth Flisol (Professional Development, 10 Qs)
Evaluación de Seguridad en Spring (Professional Development, 10 Qs)
Quiz sobre Fundamentos e Arquiteturas de Visão Computacional (Professional Development, 13 Qs)
Open Mic E5 (Professional Development, 5 Qs)
EOSIO System Quiz (Professional Development, 8 Qs)

Transformer Quiz

Assessment • Quiz • Computers • Professional Development • Medium difficulty

Created by Comprehensive Viva

10 questions

1.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

What is the primary advantage of using self-attention in Transformers?

It reduces the model size

It eliminates the need for labeled data

It allows for parallel processing of tokens

It restricts the context to local information only
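
For context on question 1: a minimal NumPy sketch of scaled dot-product self-attention, with toy dimensions and random projection matrices (all names here are illustrative, not taken from any particular library). Because every token's output falls out of the same few matrix products, all positions are processed in parallel rather than one step at a time as in a recurrent model.

    # Toy self-attention: one matrix product scores every query against every
    # key, so all tokens are handled at once.
    import numpy as np

    def self_attention(X, Wq, Wk, Wv):
        Q, K, V = X @ Wq, X @ Wk, X @ Wv              # project all tokens at once
        scores = Q @ K.T / np.sqrt(K.shape[-1])       # (n, n) pairwise scores
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)  # softmax over key positions
        return weights @ V                            # weighted sum of value vectors

    rng = np.random.default_rng(0)
    n, d = 5, 8                                       # 5 tokens, model width 8
    X = rng.normal(size=(n, d))
    out = self_attention(X, *(rng.normal(size=(d, d)) for _ in range(3)))
    print(out.shape)                                  # (5, 8): one vector per token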

2.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

In the Transformer model, what does the “multi-head” part of multi-head attention refer to?

Multiple outputs per token

Multiple attention layers stacked together

Multiple parallel attention computations with different projections

Attention heads used only during inference
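
For question 2, a hedged sketch of what "multi-head" refers to: several independent attention computations, each with its own query/key/value projections, run side by side on the same input. The dimensions and random weights below are purely illustrative.

    # Each head gets its own projections; the heads are independent and can
    # run in parallel on the same input X.
    import numpy as np

    def one_head(X, Wq, Wk, Wv):
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        A = Q @ K.T / np.sqrt(K.shape[-1])
        A = np.exp(A - A.max(axis=-1, keepdims=True))
        A /= A.sum(axis=-1, keepdims=True)
        return A @ V

    rng = np.random.default_rng(1)
    n, d_model, n_heads, d_head = 4, 16, 4, 4         # d_model = n_heads * d_head
    X = rng.normal(size=(n, d_model))
    heads = [one_head(X,
                      rng.normal(size=(d_model, d_head)),
                      rng.normal(size=(d_model, d_head)),
                      rng.normal(size=(d_model, d_head)))
             for _ in range(n_heads)]                 # independent computations
    print(len(heads), heads[0].shape)                 # 4 heads, each (4, 4)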

3.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

What type of complexity does the self-attention mechanism have with respect to sequence length (n)?

O(n)

O(log n)

O(n²)

O(1)
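
A quick way to see the quadratic cost behind question 3: the score matrix Q Kᵀ pairs every query with every key, so it has n × n entries, and doubling the sequence length quadruples it. The snippet below (toy sizes only) just prints those shapes.

    # The attention score matrix grows as n**2 with the sequence length n.
    import numpy as np

    d = 8                                             # per-head dimension
    for n in (4, 8, 16):
        Q = np.ones((n, d))
        K = np.ones((n, d))
        scores = Q @ K.T                              # shape (n, n)
        print(n, scores.shape, scores.size)           # size grows quadratically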

4.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

In a standard Transformer, how many decoder blocks are used in the original architecture for machine translation?

4

6

8

12

5.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

Which of the following best describes how the Transformer decoder generates output during inference?

It attends to all positions in the input and output sequences

It attends to input tokens and already-generated output tokens

It uses only the encoder’s final output

It processes the entire sequence at once
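
For question 5, a conceptual sketch of greedy autoregressive decoding. decoder_step below is a hypothetical stand-in for a real masked self-attention plus cross-attention decoder stack; the point is only the loop structure, in which each step conditions on the full encoder output and on the target tokens generated so far.

    # Greedy decoding loop: the source representation stays fixed, the target
    # prefix grows one token at a time.
    import numpy as np

    rng = np.random.default_rng(2)
    VOCAB, EOS = 10, 9                                # toy vocabulary, end-of-sequence id

    def decoder_step(encoder_out, generated):
        # Placeholder: a real decoder would return next-token logits computed
        # from the source representation and the prefix generated so far.
        return rng.normal(size=VOCAB)

    encoder_out = rng.normal(size=(6, 16))            # pretend source sentence, 6 tokens
    generated = [0]                                   # start-of-sequence token
    for _ in range(20):
        logits = decoder_step(encoder_out, generated)
        next_token = int(np.argmax(logits))
        generated.append(next_token)
        if next_token == EOS:                         # stop once EOS is produced
            break
    print(generated)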

6.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

How is the final representation of each token computed in multi-head self-attention?

By summing the outputs of all attention heads

By averaging token embeddings

By concatenating attention head outputs and projecting them

By selecting the maximum attention value
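
A minimal sketch for question 6 of how the per-head results are merged: concatenate them along the feature axis, then apply a single output projection (called W_O here; the sizes are toy values, not the published ones).

    # Concatenate the head outputs, then project back to the model dimension.
    import numpy as np

    rng = np.random.default_rng(3)
    n, n_heads, d_head = 4, 8, 8
    d_model = n_heads * d_head                        # 64 in this toy setup

    head_outputs = [rng.normal(size=(n, d_head)) for _ in range(n_heads)]
    concat = np.concatenate(head_outputs, axis=-1)    # (n, n_heads * d_head)
    W_O = rng.normal(size=(d_model, d_model))         # illustrative output projection
    final = concat @ W_O                              # (n, d_model): one vector per token
    print(concat.shape, final.shape)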

7.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

What is the key difference between encoder self-attention and encoder-decoder attention?

Encoder self-attention is masked, encoder-decoder attention is not

Encoder-decoder attention uses keys and values from the encoder, queries from the decoder

Encoder-decoder attention is only used during pre-training

Encoder-decoder attention uses keys and values from the decoder, queries from the encoder
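
For question 7, a hedged sketch of encoder-decoder (cross) attention with toy random weights: the queries are projected from decoder states while the keys and values are projected from the encoder output, so every target position can look across the whole source sentence.

    # Cross-attention: queries from the decoder, keys/values from the encoder.
    import numpy as np

    def cross_attention(dec, enc, Wq, Wk, Wv):
        Q = dec @ Wq                                  # queries from decoder states
        K, V = enc @ Wk, enc @ Wv                     # keys/values from encoder output
        A = Q @ K.T / np.sqrt(K.shape[-1])
        A = np.exp(A - A.max(axis=-1, keepdims=True))
        A /= A.sum(axis=-1, keepdims=True)            # softmax over source positions
        return A @ V                                  # one vector per target position

    rng = np.random.default_rng(4)
    d = 16
    enc = rng.normal(size=(7, d))                     # 7 source tokens
    dec = rng.normal(size=(3, d))                     # 3 target tokens so far
    out = cross_attention(dec, enc, *(rng.normal(size=(d, d)) for _ in range(3)))
    print(out.shape)                                  # (3, 16)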
