SummerSchool-Quiz8

Quiz • Computers • University • Hard
Irfan Ahmad
9 questions
1.
MULTIPLE CHOICE QUESTION
30 sec • 1 pt
Transformer models do not have recurrent units but can still perform sequence modeling.
True
False
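Question 1 points at the key property of transformers: attention processes the whole sequence in parallel, with no recurrent state carried from step to step. Below is a minimal NumPy sketch of scaled dot-product self-attention (weight names and shapes are illustrative, not taken from the quiz) showing that each output position is computed directly from all input positions rather than from a hidden state passed along the sequence.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a whole sequence at once.

    X has shape (seq_len, d_model). No hidden state is carried between
    positions, unlike an RNN: every output row attends to all input rows.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                 # project inputs to queries/keys/values
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # pairwise similarity between positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # each output mixes information from all positions

rng = np.random.default_rng(0)
d = 8
X = rng.normal(size=(5, d))                          # a toy "sequence" of 5 token embeddings
out = self_attention(X, rng.normal(size=(d, d)),
                     rng.normal(size=(d, d)),
                     rng.normal(size=(d, d)))
print(out.shape)                                     # (5, 8): one context vector per position
```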
2.
MULTIPLE CHOICE QUESTION
45 sec • 1 pt
As the number of training examples goes to infinity, your model will have:
Low bias
High bias
Same bias
Depends on the model’s variance
3.
MULTIPLE CHOICE QUESTION
1 min • 1 pt
Compared to an encoder-decoder model that does not use an attention mechanism, we expect the attention model to have the greatest advantage when:
The input sequence length is large.
The input sequence length is small.
The vocabulary size is large.
The vocabulary size is small.
4.
MULTIPLE CHOICE QUESTION
1 min • 1 pt
You have a friend whose mood is heavily dependent on the current and past few days’ weather. You’ve collected data for the past 365 days on the weather, which you represent as a sequence x<1>, …, x<365>. You’ve also collected data on your friend’s mood, which you represent as y<1>, …, y<365>. You’d like to build a model to map from x→y. Should you use a Unidirectional RNN or Bidirectional RNN for this problem?
Bidirectional RNN, because this allows the prediction of mood on day t to take into account more information
Bidirectional RNN, because this allows backpropagation to compute more accurate gradients
Unidirectional RNN, because the value of y<t> depends only on x<1>,…,x<t>, but not on x<t+1>,…,x<365>
Unidirectional RNN, because the value of y<t> depends only on x<t>, and not on other days
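The options above hinge on whether the prediction at day t may depend on future inputs. As a reference point, here is a minimal NumPy sketch of a plain unidirectional RNN forward pass (weight names and shapes are illustrative); the prediction at step t is a function of x<1>, …, x<t> only.

```python
import numpy as np

def unidirectional_rnn(xs, Wx, Wh, Wy):
    """Forward pass of a plain unidirectional RNN.

    y[t] is computed from h[t], which summarizes x[1..t] only; inputs
    after step t never influence it.
    """
    h = np.zeros(Wh.shape[0])
    ys = []
    for x in xs:                          # left-to-right over the sequence
        h = np.tanh(Wx @ x + Wh @ h)      # hidden state accumulates past context
        ys.append(Wy @ h)                 # prediction for the current time step
    return np.stack(ys)

rng = np.random.default_rng(0)
d_in, d_h, d_out = 3, 4, 1
xs = rng.normal(size=(7, d_in))           # 7 time steps of toy "weather" features
ys = unidirectional_rnn(xs,
                        rng.normal(size=(d_h, d_in)),
                        rng.normal(size=(d_h, d_h)),
                        rng.normal(size=(d_out, d_h)))
print(ys.shape)                           # (7, 1): one prediction per day
```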
5.
MULTIPLE SELECT QUESTION
1 min • 2 pts
In beam search, if you increase the beam width, which of the following would you expect to be true?
Beam search will run more slowly
Beam search will use up more memory
Beam search will generally find better solutions
Beam search will converge after fewer steps
Beam search will run much faster as more options can be considered
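For context on the beam-width trade-off asked about above, here is a minimal, library-free sketch of beam search; step_log_probs is a hypothetical scoring function standing in for a trained decoder. A wider beam keeps more partial hypotheses per step, so each step does more work and stores more state, but the final hypothesis generally scores at least as well.

```python
import numpy as np

def beam_search(step_log_probs, beam_width, max_len, bos=0):
    """Generic beam search over a step-scoring function.

    step_log_probs(prefix) returns log-probabilities for the next token.
    A larger beam_width keeps more candidate prefixes per step, which costs
    more time and memory but generally yields higher-scoring sequences.
    """
    beams = [([bos], 0.0)]                             # (prefix, cumulative log-prob)
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for tok, lp in enumerate(step_log_probs(prefix)):
                candidates.append((prefix + [tok], score + lp))
        # keep only the beam_width best partial sequences
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams[0]

# Toy usage: a fixed 4-token distribution regardless of prefix (illustration only).
toy = lambda prefix: np.log(np.array([0.1, 0.2, 0.3, 0.4]))
print(beam_search(toy, beam_width=3, max_len=5))
```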
6.
MULTIPLE CHOICE QUESTION
45 sec • 1 pt
How does the decoder module of the transformer model avoid attending to tokens that have not yet appeared in the output sequence?
Multi-head attention
Positional encoding
Self attention
Masking future positions
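For reference, here is a minimal NumPy sketch of the look-ahead (causal) mask the last option refers to: entries above the diagonal of the score matrix are set to -inf before the softmax, so a decoder position receives zero attention weight on positions that come after it.

```python
import numpy as np

def causal_attention_weights(scores):
    """Apply a look-ahead (causal) mask before the softmax.

    scores: (seq_len, seq_len) raw decoder attention scores. Entries above
    the diagonal are set to -inf, so position i gets zero weight on the
    positions j > i that have not been generated yet.
    """
    seq_len = scores.shape[0]
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)  # True strictly above the diagonal
    masked = np.where(future, -np.inf, scores)
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))   # numerically stabilised softmax
    return weights / weights.sum(axis=-1, keepdims=True)

print(np.round(causal_attention_weights(np.zeros((3, 3))), 2))
# row i only distributes weight over positions 0..i
```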
7.
MULTIPLE CHOICE QUESTION
45 sec • 1 pt
Which concept in the transformer allows sequence-order information to be incorporated into the input tokens?
Multi-head attention
Positional encoding
Self attention
Masking future positions before the softmax step
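For reference, the sinusoidal positional encoding from "Attention Is All You Need" is one concrete way to inject order information into otherwise order-agnostic attention layers; a minimal NumPy sketch follows (dimension names are illustrative).

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Sinusoidal positional encodings as in 'Attention Is All You Need'.

    Each position gets a unique pattern of sines and cosines, which is
    added to the token embeddings so the attention layers can tell
    positions apart.
    """
    positions = np.arange(seq_len)[:, None]              # (seq_len, 1)
    dims = np.arange(d_model)[None, :]                    # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates                      # (seq_len, d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])                 # even dimensions: sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])                 # odd dimensions: cosine
    return pe

print(sinusoidal_positional_encoding(4, 8).shape)         # (4, 8)
```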
8.
MULTIPLE SELECT QUESTION
1 min • 2 pts
Which of the following are symptoms of overfitting?
Large estimated weights
Good generalization to previously unseen data
Simple decision boundary
Complex decision boundary
9.
MULTIPLE CHOICE QUESTION
45 sec • 1 pt
Teacher forcing uses the actual output from the training dataset at time step t as input in the next time step (t+1), instead of the output generated by your model.
True
False
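To make the feeding pattern in question 9 concrete, here is a minimal sketch of a teacher-forced decoding loop; step_fn is a hypothetical decoder step, and only the way the next input is chosen matters: the ground-truth token from the training data is fed at step t+1 instead of the model's own prediction.

```python
def teacher_forced_decode(targets, step_fn, bos=0):
    """One pass over a target sequence with teacher forcing.

    step_fn(prev_token, state) -> (output, new_state) is a hypothetical
    decoder step. At step t+1 the input is the ground-truth token
    targets[t], not the token the model itself just predicted.
    """
    state = None
    prev = bos
    outputs = []
    for gold in targets:
        out, state = step_fn(prev, state)   # predict the token for the current step
        outputs.append(out)
        prev = gold                         # teacher forcing: feed the true token, not the prediction
    return outputs

# Toy usage with a dummy step function that just echoes its input token.
print(teacher_forced_decode([5, 6, 7], lambda prev, state: (prev, state)))  # [0, 5, 6]
```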