Which of the following statements do you agree with?

A learning algorithm's performance can be better than human-level performance but it can never be better than Bayes error.

A learning algorithm's performance can be better than human-level performance and better than Bayes error

A learning algorithm's performance can never be better than human-level performance but it can be better than Bayes error

A learning algorithm's performance can never be better than human-level performance nor better than Bayes error

You find that a team of ornithologists debating and discussing an image gets an even better 0.1% performance, so you define that as “human-level performance.” After working further on your algorithm, you end up with the following: Based on the evidence you have, which two of the following four options seem the most promising to try? (Check two options.)

Train a bigger model to try to do better on the training set

Get a bigger training set to reduce variance

You’ve now also run your model on the test set and find that it has a 7.0% error compared to a 2.1% error for the dev set. What should you do? (Choose all that apply)

Try increasing regularization to reduce overfitting to the dev set

Increase the size of the dev set

Try decreasing regularization for better generalization with the dev set

Get a bigger test set to increase its accuracy

After working on this project for a year, you finally achieve: Human-level performance, 0.10%, Training set error, 0.05%, Dev set error, 0.05%. Which of the following are true? (Check all that apply.)

This is a statistical anomaly (or must be the result of statistical noise) since it should not be possible to surpass human-level performance

All or almost all of the avoidable bias has been accounted for

You are close to Bayes error and possibly overfitting

With only 0.05% further progress to make, you should quickly be able to close the remaining gap to 0%

The City Council thinks that having more Cats in the city would help scare off birds. They are so happy with your work on the Bird detector that they also hire you to build a Cat detector. (Wow, Cat detectors are just incredibly useful, aren’t they?) Because of years of working on Cat detectors, you have such a huge dataset of 100,000,000 cat images that training on this data takes about two weeks. Which of the statements do you agree with? (Check all that apply.)

Needing two weeks to train will limit the speed at which you can iterate

Buying faster computers could speed up your team's iteration speed and thus your team's productivity

If 100,000,000 examples is enough to build a good enough Cat detector, you might be better off training with just 10,000,000 examples to gain a $\approx 10\times$ improvement in how quickly you can run experiments, even if each model performs a bit worse because it's trained on less data

Having built a good Bird detector, you should be able to take the same model and hyperparameters and just apply it to the Cat dataset, so there is no need to iterate

The city asks for your help in further defining the criteria for accuracy, runtime, and memory. How would you suggest they identify the criteria?

Suggest to them that they define which criterion is most important. Then, set thresholds for the other two

Suggest that they purchase more infrastructure to ensure the model runs quickly and accurately

Suggest to them that they focus on whichever criterion is important and then eliminate the other two

Human-level error for identifying birds is < 1%, training set error is 5.2%, and dev set error is 7.3%. Which of the options below is the best next step?

Train a bigger network to drive down the >4.0% training error

Validate the human data set with a sample of your data to ensure the images are of sufficient quality

Try an ensemble model to reduce bias and variance

Get more data or apply regularization to reduce variance
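
The comparison behind this question can be worked through numerically. The sketch below is only an illustration: the function name `diagnose` is hypothetical, and because the question gives human-level error as "< 1%", the code assumes 1.0% as an upper bound, so the avoidable bias it reports is a lower bound.

```python
def diagnose(human_error, train_error, dev_error):
    """Compare avoidable bias with variance (all values in percent)."""
    # Avoidable bias: gap between training error and the human-level proxy.
    avoidable_bias = round(train_error - human_error, 2)
    # Variance: gap between dev error and training error.
    variance = round(dev_error - train_error, 2)
    focus = "bias" if avoidable_bias > variance else "variance"
    return avoidable_bias, variance, focus


# Assumption: 1.0 stands in for "< 1%", so the true avoidable bias is larger.
bias, variance, focus = diagnose(human_error=1.0, train_error=5.2, dev_error=7.3)
print(f"avoidable bias >= {bias}%, variance = {variance}%, larger gap: {focus}")
```

Under that assumption the avoidable bias (at least 4.2 points) is about twice the variance (2.1 points), which is one way to compare the options listed above.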

You ask a few people to label the dataset so as to find out what is human-level performance. You find the following levels of accuracy: If your goal is to have “human-level performance” be a proxy (or estimate) for Bayes error, how would you define “human-level performance”?

0.75% (average of all four numbers above)

0.0% (because it is impossible to do better than this)

You also evaluate your model on the test set, and find the following: What does this mean? (Check the two best options.)

You should try to get a bigger dev set

You have overfit to the dev set

You have underfitted to the dev set

You should get a bigger test set

You’ve handily beaten your competitor, and your system is now deployed in Peacetopia and is protecting the citizens from birds! But over the last few months, a new species of bird has been slowly migrating into the area, so the performance of your system slowly degrades because it is being tested on a new type of data. You have only 1,000 images of the new species of bird. The city expects a better system from you within the next 3 months. Which of these should you do first?

Use the data you have to define a new evaluation metric (using a new dev/test set) taking into account the new species, and use that to drive further progress for your team

Put the 1,000 images into the training set so as to try to do better on these birds

Add the 1,000 images into your dataset and reshuffle into a new train/dev/test split

Try data augmentation/data synthesis to get more images of the new type of bird

Structuring Machine Learning Projects

Quiz • Computers • University • Practice Problem • Hard

Trump Florence

23 questions

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

This example is adapted from a real production application, but with details disguised to protect confidentiality.
You are a famous researcher in the City of Peacetopia. The people of Peacetopia have a common characteristic: they are afraid of birds. To save them, you have to build an algorithm that will detect any bird flying over Peacetopia and alert the population.
The City Council gives you a dataset of 10,000,000 images of the sky above Peacetopia, taken from the city’s security cameras. They are labeled:
y = 0: There is no bird on the image
y = 1: There is a bird on the image
Your goal is to build an algorithm able to classify new images taken by security cameras from Peacetopia.
There are a lot of decisions to make:
What is the evaluation metric?
How do you structure your data into train/dev/test sets?
Metric of success
The City Council tells you that they want an algorithm that:
Has high accuracy.
Runs quickly and takes only a short time to classify a new image.
Can fit in a small amount of memory, so that it can run in a small processor that the city will attach to many different security cameras.
You meet with them and ask for just one evaluation metric. True/False?

True

False

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

The city revises its criteria to:
“We need an algorithm that can let us know a bird is flying over Peacetopia as accurately as possible.”
“We want the trained model to take no more than 10 sec to classify a new image.”
“We want the model to fit in 10MB of memory.”
Given models with different accuracies, runtimes, and memory sizes, how would you choose one?

Create one metric by combining the three metrics and choose the best performing model

Find the subset of models that meet the runtime and memory criteria. Then, choose the highest accuracy

Take the model with the smallest runtime because that will provide the most overhead to increase accuracy

Accuracy is an optimizing metric, therefore the most accurate model is the best choice
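
The "filter on the constraints, then pick the highest accuracy" option can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the `Candidate` class, the `choose_model` function, and the three candidate models with their numbers are all invented for the example. Only the 10 sec and 10MB thresholds come from the city's criteria.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    accuracy: float   # fraction correct on the dev set (the metric to maximize)
    runtime_s: float  # seconds to classify one image (must stay under a threshold)
    memory_mb: float  # model size in MB (must stay under a threshold)


def choose_model(candidates, max_runtime_s=10.0, max_memory_mb=10.0):
    """Keep models that meet both thresholds, then return the most accurate."""
    feasible = [
        c for c in candidates
        if c.runtime_s <= max_runtime_s and c.memory_mb <= max_memory_mb
    ]
    if not feasible:
        return None  # no candidate meets the runtime and memory criteria
    return max(feasible, key=lambda c: c.accuracy)


# Hypothetical candidates, invented for this example.
models = [
    Candidate("A", accuracy=0.97, runtime_s=1.0, memory_mb=3.0),
    Candidate("B", accuracy=0.99, runtime_s=13.0, memory_mb=9.0),
    Candidate("C", accuracy=0.98, runtime_s=9.0, memory_mb=9.0),
]
best = choose_model(models)
print(best.name)
```

In this made-up set, model B has the highest accuracy but exceeds the runtime threshold, so it is filtered out before accuracy is compared.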

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

Based on the city’s requests, which of the following would you say is true?

Accuracy is an optimizing metric; running time and memory size are satisfying metrics

Accuracy, running time and memory size are all optimizing metrics because you want to do well on all three

Accuracy, running time and memory size are all satisfying metrics because you have to do sufficiently well on all three for your system to be acceptable

Accuracy is a satisfying metric; running time and memory size are optimizing metrics

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

Structuring your data
Before implementing your algorithm, you need to split your data into train/dev/test sets. Which of these do you think is the best choice?

Train: 3,333,334; Dev: 3,333,334; Test: 3,333,334

Train: 6,000,000; Dev: 1,000,000; Test: 3,000,000

Train: 6,000,000; Dev: 3,000,000; Test: 1,000,000

Train: 9,500,000; Dev: 250,000; Test: 250,000
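
To compare the four options on the same footing, it can help to convert the counts into percentages. The sketch below does only that arithmetic; the helper name `split_fractions` and the option labels are assumptions made for the example, while the counts are the ones listed above.

```python
def split_fractions(train, dev, test):
    """Return the train/dev/test shares as percentages of the option's total."""
    total = train + dev + test
    return tuple(round(100 * n / total, 1) for n in (train, dev, test))


# Counts copied from the four answer options; the labels are arbitrary.
options = {
    "option 1": (3_333_334, 3_333_334, 3_333_334),
    "option 2": (6_000_000, 1_000_000, 3_000_000),
    "option 3": (6_000_000, 3_000_000, 1_000_000),
    "option 4": (9_500_000, 250_000, 250_000),
}
for label, sizes in options.items():
    print(label, split_fractions(*sizes))
```

With 10,000,000 images, even a 2.5% share is still 250,000 examples, which is worth keeping in mind when judging how large the dev and test sets need to be.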

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

After setting up your train/dev/test sets, the City Council comes across another 1,000,000 images, called the “citizens’ data”. Apparently the citizens of Peacetopia are so scared of birds that they volunteered to take pictures of the sky and label them, thus contributing these additional 1,000,000 images. These images are different from the distribution of images the City Council had originally given you, but you think it could help your algorithm.
Notice that adding this additional data to the training set will make the distribution of the training set different from the distributions of the dev and test sets.
Is the following statement true or false?
"You should not add the citizens' data to the training set, because if the training distribution is different from the dev and test sets, then this will not allow the model to perform well on the test set."

True

False

Answer explanation

Sometimes we'll need to train the model on the data that is available, and its distribution may not be the same as the data that will occur in production. Also, adding training data that differs from the dev set may still help the model improve performance on the dev set. What matters is that the dev and test set have the same distribution.
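
The explanation above can be illustrated with a small sketch in which the dev and test sets are drawn only from the security-camera data, while the citizens' data is added to the training set alone. Everything here is a stand-in: the function `build_splits`, the tuple-based "images", and the tiny dataset sizes are assumptions chosen to keep the example runnable.

```python
import random


def build_splits(camera_data, citizen_data, dev_size, test_size, seed=0):
    """Draw dev/test from camera data only; add citizen data to train only."""
    rng = random.Random(seed)
    shuffled = list(camera_data)
    rng.shuffle(shuffled)
    dev = shuffled[:dev_size]
    test = shuffled[dev_size:dev_size + test_size]
    # The remaining camera images plus all citizen images form the training set.
    train = shuffled[dev_size + test_size:] + list(citizen_data)
    rng.shuffle(train)
    return train, dev, test


# Tiny stand-in datasets: each "image" is just a (source, index) pair.
camera = [("camera", i) for i in range(100)]
citizen = [("citizen", i) for i in range(10)]
train, dev, test = build_splits(camera, citizen, dev_size=5, test_size=5)

print(len(train), len(dev), len(test))
print({source for source, _ in dev + test})
```

The dev and test sets contain only camera images, so they still share one distribution, and that distribution is the one the deployed system will see.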

MULTIPLE SELECT QUESTION

45 sec • 1 pt

One member of the City Council knows a little about machine learning, and thinks you should add the 1,000,000 citizens’ data images to the test set. You object because:

This would cause the dev and test set distributions to become different. This is a bad idea because you're not aiming where you want to hit

The test set no longer reflects the distribution of data (security cameras) you most care about

The 1,000,000 citizens' data images do not have an x → y mapping consistent with the rest of the data

A bigger test set will slow down the speed of iterating because of the computational expense of evaluating models on the test set

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

If your goal is to have “human-level performance” be a proxy (or estimate) for Bayes error, how would you define “human-level performance”?

The performance of the head of the City Council

The best performance of a specialist (ornithologist) or possibly a group of specialists

The performance of the average citizen of Peacetopia

The performance of their volunteer amateur ornithologists


