Data Lab PreTest

Assessment • Quiz • Instructional Technology • Professional Development • Hard

Created by Sinh Hoàng Văn

50 questions

1.

MULTIPLE SELECT QUESTION

1 min • 1 pt

You have the following dataset and need to calculate the average salary per department, but only include departments where the average salary is greater than 4000:
df.groupBy("Department").avg("Salary").filter("avg(Salary) > 4000")

df.createOrReplaceTempView("employees")
spark.sql("SELECT Department, AVG(Salary) AS AvgSalary FROM employees GROUP BY Department HAVING AVG(Salary) > 4000")

from pyspark.sql.functions import col, avg
df.groupBy("Department").agg(avg(col("Salary")).alias("AvgSalary")).filter(col("AvgSalary") > 4000)

df.filter("Salary > 4000").groupBy("Department").avg("Salary")
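The distinction this question tests is SQL's HAVING: aggregate first, then filter on the aggregate, rather than filtering rows before grouping. A plain-Python sketch of that semantics, with hypothetical toy data:

```python
from collections import defaultdict

# Hypothetical toy rows standing in for the dataset.
rows = [
    {"Department": "IT", "Salary": 5000},
    {"Department": "IT", "Salary": 4500},
    {"Department": "HR", "Salary": 3000},
    {"Department": "HR", "Salary": 3500},
]

# GROUP BY Department ...
salaries = defaultdict(list)
for r in rows:
    salaries[r["Department"]].append(r["Salary"])

avg_salary = {dept: sum(s) / len(s) for dept, s in salaries.items()}

# ... HAVING AVG(Salary) > 4000 — the filter applies to the aggregate.
result = {dept: a for dept, a in avg_salary.items() if a > 4000}
print(result)  # {'IT': 4750.0}
```

Filtering `Salary > 4000` on the raw rows instead would both drop rows from the averages and keep departments whose overall average is below the threshold.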

2.

MULTIPLE SELECT QUESTION

1 min • 1 pt

You need to find all employees in departments with a budget greater than 8000. Which code snippet achieves this?
employees_df.join(departments_df, "Department").filter("Budget > 8000").select("Name", "Department")

employees_df.createOrReplaceTempView("employees")
departments_df.createOrReplaceTempView("departments")
spark.sql("SELECT e.Name, e.Department FROM employees e JOIN departments d ON e.Department = d.Department WHERE d.Budget > 8000")

from pyspark.sql.functions import col
employees_df.join(departments_df, "Department").where(col("Budget") > 8000).select("Name", "Department")

employees_df.join(departments_df, employees_df.Department == departments_df.Department).filter("Budget > 8000")
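All of these express the same join-then-filter logic: match each employee to their department row, then keep only matches whose Budget exceeds the threshold. A plain-Python sketch of that semantics, with hypothetical toy data:

```python
# Hypothetical toy tables.
employees = [
    {"Name": "Ann", "Department": "IT"},
    {"Name": "Bob", "Department": "HR"},
]
departments = [
    {"Department": "IT", "Budget": 10000},
    {"Department": "HR", "Budget": 5000},
]

# JOIN ON Department, then WHERE Budget > 8000.
budgets = {d["Department"]: d["Budget"] for d in departments}
result = [
    {"Name": e["Name"], "Department": e["Department"]}
    for e in employees
    if budgets.get(e["Department"], 0) > 8000
]
print(result)  # [{'Name': 'Ann', 'Department': 'IT'}]
```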

3.

MULTIPLE SELECT QUESTION

1 min • 1 pt

From the dataset below, find the top-earning employee in each department, including their department name and salary. Use Spark SQL:
df.createOrReplaceTempView("employees")
spark.sql("SELECT Department, Name, Salary FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY Department ORDER BY Salary DESC) AS rank FROM employees) t WHERE rank = 1")

df.createOrReplaceTempView("employees")
spark.sql("SELECT Department, Name, MAX(Salary) AS Salary FROM employees GROUP BY Department, Name")

df.createOrReplaceTempView("employees")
spark.sql("SELECT Department, Name, MAX(Salary) AS MaxSalary FROM employees GROUP BY Department")

from pyspark.sql.window import Window
from pyspark.sql.functions import col, row_number
window_spec = Window.partitionBy("Department").orderBy(col("Salary").desc())
df.withColumn("rank", row_number().over(window_spec)).filter("rank == 1").select("Department", "Name", "Salary")
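The window-function pattern being tested — number the rows within each partition by descending salary and keep row 1 — can be sketched in plain Python with hypothetical toy data:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical toy rows.
rows = [
    {"Department": "IT", "Name": "Ann", "Salary": 5000},
    {"Department": "IT", "Name": "Cat", "Salary": 4500},
    {"Department": "HR", "Name": "Bob", "Salary": 3500},
]

# PARTITION BY Department ORDER BY Salary DESC:
# sort by department, then by salary descending within it.
rows.sort(key=lambda r: (r["Department"], -r["Salary"]))

# ROW_NUMBER() ... WHERE rank = 1: take the first row of each partition.
top = [next(group) for _, group in groupby(rows, key=itemgetter("Department"))]
# one row per department: Bob (HR) and Ann (IT)
```

A plain GROUP BY can return the maximum salary per department, but not which employee earns it — that is why the window function is needed.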

4.

MULTIPLE SELECT QUESTION

1 min • 1 pt

Which of the following code snippets computes the total sales per region efficiently in Spark SQL?
df.createOrReplaceTempView("sales")
spark.sql("SELECT Region, SUM(Sales) AS TotalSales FROM sales GROUP BY Region")

df.groupBy("Region").sum("Sales")

from pyspark.sql.functions import col, sum
df.groupBy(col("Region")).agg(sum(col("Sales")).alias("TotalSales"))

df.filter("Sales IS NOT NULL").groupBy("Region").sum("Sales")

5.

MULTIPLE SELECT QUESTION

1 min • 1 pt

How can you rank products within each category by their sales in descending order?
from pyspark.sql.window import Window
from pyspark.sql.functions import col, rank
window_spec = Window.partitionBy("Category").orderBy(col("Sales").desc())
df.withColumn("rank", rank().over(window_spec))

df.createOrReplaceTempView("products")
spark.sql("SELECT Category, Product, RANK() OVER (PARTITION BY Category ORDER BY Sales DESC) AS Rank FROM products")

df.groupBy("Category", "Product").agg(rank().alias("Rank"))

from pyspark.sql.functions import row_number
window_spec = Window.partitionBy("Category").orderBy("Sales DESC")
df.withColumn("row_number", row_number().over(window_spec))

6.

MULTIPLE SELECT QUESTION

1 min • 1 pt

Which function would you use to replace all NULL values in a Spark DataFrame with default values?
df.na.fill({"Column1": 0, "Column2": "N/A"})

df.na.replace(NULL, {"Column1": 0, "Column2": "N/A"})

df.fillna({"Column1": 0, "Column2": "N/A"})

df.replace(NULL, {"Column1": 0, "Column2": "N/A"})
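The dict argument accepted by df.na.fill / df.fillna maps each column name to its own default. A plain-Python sketch of that per-column replacement, with hypothetical toy data:

```python
# Hypothetical toy rows; None stands in for SQL NULL.
rows = [
    {"Column1": None, "Column2": "x"},
    {"Column1": 7, "Column2": None},
]

defaults = {"Column1": 0, "Column2": "N/A"}

# Each None is replaced by the default registered for its column.
filled = [
    {k: (defaults[k] if v is None else v) for k, v in row.items()}
    for row in rows
]
print(filled)  # [{'Column1': 0, 'Column2': 'x'}, {'Column1': 7, 'Column2': 'N/A'}]
```

Note also that the NULL in the two distractor options is not a Python name, so those snippets would raise a NameError before doing anything.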

7.

MULTIPLE SELECT QUESTION

1 min • 1 pt

How do you efficiently join two DataFrames on multiple keys in PySpark?
df1.join(df2, (df1.Key1 == df2.Key1) & (df1.Key2 == df2.Key2))

df1.join(df2, "Key1", "inner")

df1.createOrReplaceTempView("df1")
df2.createOrReplaceTempView("df2")
spark.sql("SELECT * FROM df1 JOIN df2 ON df1.Key1 = df2.Key1 AND df1.Key2 = df2.Key2")

df1.merge(df2, "Key1")
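A multi-key join matches rows only when every key agrees, which is what the composite condition (Key1 AND Key2) expresses. A plain-Python sketch of an inner join on a composite key, with hypothetical toy data:

```python
# Hypothetical toy tables.
df1 = [
    {"Key1": "a", "Key2": 1, "v": 10},
    {"Key1": "a", "Key2": 2, "v": 20},
]
df2 = [
    {"Key1": "a", "Key2": 1, "w": 100},
]

# Index the right side by the composite key (Key1, Key2).
index = {(r["Key1"], r["Key2"]): r for r in df2}

# Inner join: keep left rows whose composite key has a match, merge columns.
joined = [
    {**left, **index[(left["Key1"], left["Key2"])]}
    for left in df1
    if (left["Key1"], left["Key2"]) in index
]
print(joined)  # [{'Key1': 'a', 'Key2': 1, 'v': 10, 'w': 100}]
```

Joining on Key1 alone would wrongly match the ("a", 2) row as well; and merge is a pandas method, not a PySpark DataFrame one.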
