Data Science / ML #1

1st - 3rd Grade

6 Qs

Similar activities

Machine Learning • 1st - 3rd Grade • 10 Qs

DATA HANDLING • KG - University • 10 Qs

Innovative Tools for Education • 2nd Grade • 10 Qs

The Scientific Method - Vocabulary • 3rd - 5th Grade • 9 Qs

Graphing Vocab 3rd Grade • 3rd - 5th Grade • 11 Qs

summerschool_q_d2 • 1st Grade • 10 Qs

Introduction to Data Science • 1st Grade - University • 10 Qs

Trial • 1st - 3rd Grade • 10 Qs

Data Science / ML #1

Assessment • Quiz

Science, Mathematics

1st - 3rd Grade

Medium

Created by Julien Parenti

Used 3+ times

6 questions

1.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

In which of the following situations is overfitting happening?

The model can't capture the underlying trend of the data because it can't fit the data well enough

The model takes an extremely long time to converge

The model captures the noise of the data because it fits the data too well

The model's performance on both the training and validation sets is very low
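
Not part of the original quiz, but a minimal sketch of the idea, assuming scikit-learn, NumPy, and a synthetic dataset (all illustrative choices): an unconstrained decision tree scores almost perfectly on its noisy training data and noticeably worse on held-out data, which is how overfitting typically shows up.

```python
# Sketch only: an over-complex model fits training noise but generalizes poorly.
# scikit-learn, NumPy, and the synthetic data are illustrative assumptions.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)  # noisy sine target

X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# An unconstrained tree can memorize the training noise.
model = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)

print("train R^2:", model.score(X_train, y_train))  # near-perfect fit
print("valid R^2:", model.score(X_val, y_val))      # noticeably worse -> overfitting
```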

2.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

Balancing the size of a dataset against the number of features you use to train a model is always a problem you need to consider.

Which of the following statements correctly summarizes the relationship between the number of features and the dataset size?

When training a model, as you add more features to the dataset, you often need to increase the dataset's size to ensure the model learns reliably.

When training a model, adding more features to the dataset increases the amount of information you can extract from the data. This allows you to use smaller datasets and still extract good performance from the data.

When training a learning algorithm, as you decrease the number of features in your dataset, you need to increase the number of training samples to make up the difference.

When training a learning algorithm, the number of features in your dataset is entirely independent of the number of training samples.
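
As a supplementary sketch (not part of the quiz), the following assumes scikit-learn and a synthetic dataset: keeping the sample count fixed at 200 while padding the data with mostly uninformative features tends to lower cross-validated accuracy, which is why adding features usually calls for more data.

```python
# Sketch only: with a fixed number of samples, adding mostly uninformative
# features tends to degrade cross-validated accuracy.
# scikit-learn and the synthetic dataset are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

for n_features in (10, 100, 500):
    X, y = make_classification(
        n_samples=200, n_features=n_features,
        n_informative=10, n_redundant=0, random_state=0,
    )
    score = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()
    print(f"{n_features:4d} features, 200 samples -> mean CV accuracy {score:.2f}")
```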

3.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

Let's assume you are working with a severely imbalanced dataset. You want to split the data into two categories using a classification learning algorithm.

Which of the following metrics should you avoid using when analyzing the algorithm's performance?

Recall

Precision

F1-score

Accuracy
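
A supplementary sketch of why this matters, assuming scikit-learn and an artificial 95/5 label split: a degenerate classifier that always predicts the majority class reaches 95% accuracy, while recall and F1 on the minority class drop to zero.

```python
# Sketch only: on a 95/5 split, a "classifier" that always predicts the
# majority class still gets 95% accuracy, while recall and F1 on the
# minority class expose the failure. Assumes scikit-learn and NumPy.
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, recall_score

y_true = np.array([0] * 95 + [1] * 5)   # 5% positive (minority) class
y_pred = np.zeros_like(y_true)          # always predict the majority class

print("accuracy:", accuracy_score(y_true, y_pred))                 # 0.95
print("recall:  ", recall_score(y_true, y_pred))                   # 0.0
print("f1-score:", f1_score(y_true, y_pred, zero_division=0))      # 0.0
```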

4.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

When training a machine learning model, we need to compute how different our predictions are from the expected results. Two popular ways to compute this difference are the Root Mean Squared Error (RMSE) and the Mean Absolute Error (MAE). These two metrics have different properties that shine depending on the problem you want to solve.

Which of the following is a correct statement about these metrics?

RMSE penalizes larger differences between the predictions and the expected results.

RMSE is significantly faster to compute than MAE.

Of the two metrics, RMSE is the only one indifferent to the direction of the error.

Of the two metrics, MAE is the only one indifferent to the direction of the error.
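
A supplementary worked example (NumPy only, with made-up numbers): both metrics use the magnitude of the error, so neither depends on its direction, but squaring makes RMSE react much more strongly to a single large error.

```python
# Sketch only: RMSE and MAE both ignore the sign of the error, but RMSE
# grows much faster when one error is large. NumPy and the numbers used
# here are illustrative assumptions.
import numpy as np

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

y_true = np.array([0.0, 0.0, 0.0, 0.0])
small_errors = np.array([1.0, -1.0, 1.0, -1.0])   # every error has magnitude 1
one_outlier  = np.array([0.0,  0.0, 0.0, -4.0])   # one large error

print(mae(y_true, small_errors), rmse(y_true, small_errors))  # 1.0, 1.0
print(mae(y_true, one_outlier),  rmse(y_true, one_outlier))   # 1.0, 2.0 -> RMSE penalizes the outlier
```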

5.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

You want to use a decision tree on your dataset. Unfortunately, the results are biased because the data is imbalanced. You still think a decision tree is the right approach, so you decide to find a solution to this problem. Which of the following is a good strategy to reduce the bias in the results?

Fit the decision tree, then scale the results appropriately using the initial dataset as guidance.

Balance the dataset and only then fit the decision tree.

Fit the decision tree to each class separately, then combine the results in proportion to the class distribution of the initial dataset.

Decision trees will never work with imbalanced datasets.
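
A supplementary sketch of one way to balance the data before fitting the tree, using naive random oversampling of the minority class; class weights or dedicated resampling libraries would be alternatives, and the tiny dataset here is made up.

```python
# Sketch only: rebalance the training data, then fit the decision tree.
# Uses naive random oversampling via scikit-learn's resample; the dataset
# is an illustrative assumption.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import resample

def balance_by_oversampling(X, y, random_state=0):
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [], []
    for cls in classes:
        mask = y == cls
        X_cls, y_cls = resample(
            X[mask], y[mask], replace=True,
            n_samples=target, random_state=random_state,
        )
        X_parts.append(X_cls)
        y_parts.append(y_cls)
    return np.vstack(X_parts), np.concatenate(y_parts)

# Tiny illustrative imbalanced dataset (9 negatives, 3 positives).
X = np.arange(24, dtype=float).reshape(12, 2)
y = np.array([0] * 9 + [1] * 3)

X_bal, y_bal = balance_by_oversampling(X, y)
tree = DecisionTreeClassifier(random_state=0).fit(X_bal, y_bal)
print(np.bincount(y_bal))  # equal class counts after balancing
```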

6.

MULTIPLE SELECT QUESTION

45 sec • 1 pt

A fundamental topic in ML is bias, variance and their relationship with learning algorithms. High variance models typically pay a lot of attention to the training data and don't generalize well to unseen data. Low variance models focus on patterns that will later generalize well to unseen data.

Which of the following algorithms can be considered high variance models?

Support Vector Machine (SVM)

Decision tree

Logistic regression

Random Forest
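
A supplementary sketch, assuming scikit-learn and a synthetic dataset: comparing the gap between training and cross-validated accuracy for a single decision tree and for logistic regression gives a rough, empirical read on which model is higher variance.

```python
# Sketch only: a single unpruned decision tree tends to fit the training
# data almost perfectly yet generalize worse (high variance), while
# logistic regression shows a smaller train/validation gap.
# scikit-learn and the synthetic dataset are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=20, flip_y=0.1, random_state=0)

for name, model in [
    ("decision tree", DecisionTreeClassifier(random_state=0)),
    ("logistic regression", LogisticRegression(max_iter=1000)),
]:
    cv = cross_validate(model, X, y, cv=5, return_train_score=True)
    gap = cv["train_score"].mean() - cv["test_score"].mean()
    print(f"{name:20s} train-test accuracy gap: {gap:.2f}")
```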
