EDA Quiz 1

Quiz • Computers • Professional Development • Medium
Vijay Agrawal
27 questions
1.
MULTIPLE SELECT QUESTION
30 sec • 1 pt
You are ingesting data from multiple sources (CSV, JSON, and Parquet). Which of the following statements are correct?
CSV files cannot handle hierarchical data structures.
JSON files are human-readable and can handle nested objects.
Parquet files store data in a columnar format and allow efficient compression.
CSV files always load faster than Parquet files.
2.
MULTIPLE SELECT QUESTION
30 sec • 1 pt
You have a 50 GB CSV file you need to ingest and analyze. Which approach(es) could be most practical?
Use Python's built-in open() and read line by line in a loop.
Use pandas.read_csv() without specifying chunksize.
Use chunking in pandas (chunksize parameter) to process data in smaller batches.
Convert the CSV into a more compressed format like Parquet and use a distributed environment (e.g., PySpark).
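As an illustration of the batch-processing idea in this question, here is a minimal sketch that computes a column mean while holding only a few rows in memory at a time. It uses only the standard library so it runs anywhere; pandas offers the same pattern through the chunksize parameter of read_csv, which yields one DataFrame per batch. The function name column_mean_in_chunks, the column name "price", and the tiny in-memory sample are all hypothetical.

```python
import csv
import io
from itertools import islice


def column_mean_in_chunks(file_obj, column, chunk_size=1000):
    """Compute the mean of one numeric column, reading chunk_size rows at a time."""
    reader = csv.DictReader(file_obj)
    total, count = 0.0, 0
    while True:
        # Pull at most chunk_size rows; the rest of the file stays unread.
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            break
        total += sum(float(row[column]) for row in chunk)
        count += len(chunk)
    return total / count if count else float("nan")


# Hypothetical stand-in for a large file on disk.
sample = io.StringIO("price\n10\n20\n30\n40\n50\n")
chunked_mean = column_mean_in_chunks(sample, "price", chunk_size=2)
print(chunked_mean)
```

Only running totals survive between batches, so peak memory depends on chunk_size rather than on the size of the file.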
3.
MULTIPLE SELECT QUESTION
30 sec • 1 pt
Which of the following summary statistics are typically useful during EDA?
Mean, median, mode
Standard deviation, variance
Range, interquartile range
Confusion matrix
4.
MULTIPLE SELECT QUESTION
30 sec • 1 pt
When examining a dataset's distribution, which are signs that the data might be right-skewed?
The mean is greater than the median.
The mean is less than the median.
A histogram shows a longer tail to the right.
The mode is greater than the median.
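To make the relationship between skew and the summary statistics concrete, the sketch below compares the mean, median, and mode of a small made-up sample with a long right tail. The data values are invented for illustration only.

```python
import statistics

# Hypothetical sample: most values are small, one large value forms a right tail.
values = [1, 2, 2, 3, 3, 3, 4, 50]

sample_mean = statistics.mean(values)      # pulled toward the large value
sample_median = statistics.median(values)  # resistant to the large value
sample_mode = statistics.mode(values)      # most frequent value

print(f"mean={sample_mean}, median={sample_median}, mode={sample_mode}")
```

In this sample the single large value drags the mean well above the median, while the median and mode stay near the bulk of the data.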
5.
MULTIPLE SELECT QUESTION
30 sec • 1 pt
You have a dataset of housing prices. You notice that some houses are extremely expensive compared to the rest. Which methods can help you identify outliers effectively?
Box plot to detect points more than 1.5 × IQR beyond the quartiles.
Z-scores to find values far from the mean.
Dropping all data above the median price.
Calculating the difference between max and min values.
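The two rule-based checks named in this question can be sketched as follows, using only the standard library. The prices are made up, the function names are hypothetical, and the z-score cutoff is an assumption: 3 is a common convention, but a lower value is used here because the sample is tiny.

```python
import statistics


def iqr_outliers(data, k=1.5):
    """Return values more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < lower or x > upper]


def zscore_outliers(data, threshold=3.0):
    """Return values whose absolute z-score exceeds the threshold."""
    mean = statistics.mean(data)
    stdev = statistics.pstdev(data)
    return [x for x in data if abs(x - mean) / stdev > threshold]


# Hypothetical house prices (in thousands); one is far above the rest.
prices = [200, 210, 220, 230, 240, 250, 260, 270, 280, 2000]

iqr_flagged = iqr_outliers(prices)
z_flagged = zscore_outliers(prices, threshold=2.5)  # assumed cutoff for a small sample
print(iqr_flagged, z_flagged)
```

The extreme value inflates both the mean and the standard deviation that the z-score relies on, so the z-score rule is sensitive to the cutoff on small samples, whereas the quartile-based fences are barely affected by it.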
6.
MULTIPLE CHOICE QUESTION
30 sec • 1 pt
What does a high variance in a dataset indicate?
The data points are spread out from the mean
The data points are closely clustered around the mean
The dataset has a low level of variability
The dataset has many missing values
7.
MULTIPLE SELECT QUESTION
30 sec • 1 pt
Which transformations are commonly used to stabilize variance or reduce skew in data?
Box-Cox transformation
Log transformation
Min-Max scaling
One-hot encoding
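As a small illustration of how a transformation can reduce skew, the sketch below measures the skewness of a made-up right-skewed sample before and after taking logarithms. The skewness helper is a hypothetical hand-rolled moment estimate, and the data are chosen so that the logged values are evenly spaced. The Box-Cox transformation with lambda equal to 0 is defined as this same log transform.

```python
import math
import statistics


def skewness(data):
    """Moment-based skewness: third central moment over variance to the power 1.5."""
    mean = statistics.mean(data)
    m2 = sum((x - mean) ** 2 for x in data) / len(data)
    m3 = sum((x - mean) ** 3 for x in data) / len(data)
    return m3 / m2 ** 1.5


# Hypothetical sample that doubles at each step, giving a long right tail.
raw = [1, 2, 4, 8, 16, 32, 64, 128]
logged = [math.log(x) for x in raw]  # requires strictly positive values

raw_skew = skewness(raw)
logged_skew = skewness(logged)
print(f"raw skew={raw_skew:.3f}, log skew={logged_skew:.3f}")
```

The log transform only applies to strictly positive values; data containing zeros or negatives would need a shift or a different transformation.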