Suppose we have a data frame "df". How do you find the sum of column "aggregation" for each partition?

df = df.withColumn('sum_total', sum('aggregation').over(Window.partitionBy('partition')))
display(df)

df.groupBy("partition").sum("aggregation").show()

display(df.withColumn('sum_total', sum('aggregation').over(Window.partitionBy('partition'))))
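The options above differ in shape: a window sum keeps every row and attaches the partition total, while a grouped sum collapses each partition to a single row. The following is a plain-Python analogy rather than PySpark code, offered as a rough sketch; the sample rows and the helper names `partition_totals` and `with_window_sum` are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for df: (partition, aggregation)
rows = [("a", 1), ("a", 2), ("b", 10), ("b", 20), ("b", 30)]

def partition_totals(rows):
    """Sum 'aggregation' per partition: one result per partition (groupBy-style)."""
    totals = defaultdict(int)
    for part, value in rows:
        totals[part] += value
    return dict(totals)

def with_window_sum(rows):
    """Keep every row and append its partition total (window-style)."""
    totals = partition_totals(rows)
    return [(part, value, totals[part]) for part, value in rows]

grouped = partition_totals(rows)
windowed = with_window_sum(rows)

print(grouped)
print(windowed)
```

The grouped result has one entry per partition, whereas the windowed result has as many rows as the input.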

Suppose we have this data frame "df" (as shown in the pic). How do you replace the first 3 values of column 'alchohol' with NaN?

import pandas as pd
import numpy as np
df.iloc[0:3, 0] = np.nan
df

import pandas as pd
import numpy as np
df.loc[0:3, 0] = np.nan
df

import pandas as pd
import numpy as np
df.iloc[0:3] = np.nan
df

import pandas as pd
import numpy as np
df.iloc[0:3] = np.nan
df.show()
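The indexing variants in these options can be compared on a small frame. This is a minimal sketch under stated assumptions: the data values and the second column `ash` are made up, and 'alchohol' is assumed to be the first column (position 0), as the options imply.

```python
import numpy as np
import pandas as pd

def make_df():
    """Build a small hypothetical frame; 'alchohol' sits at column position 0."""
    return pd.DataFrame({
        "alchohol": [14.2, 13.2, 13.1, 14.3, 13.2],
        "ash": [2.4, 2.1, 2.6, 2.5, 2.8],
    })

# Position-based: rows 0..2 of column 0 only.
first_col = make_df()
first_col.iloc[0:3, 0] = np.nan

# Position-based with no column given: rows 0..2 of every column.
all_cols = make_df()
all_cols.iloc[0:3] = np.nan

# Label-based slices include the end label, so 0:3 covers four rows here.
by_label = make_df()
by_label.loc[0:3, "alchohol"] = np.nan

print(first_col)
```

Note also that a pandas DataFrame has no `show()` method; that call belongs to Spark data frames.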

Which of the following options will remove duplicates from the array column "address_struct_str"?

1)
from pyspark.sql.functions import array_distinct
df = df.withColumn("address_dict", array_distinct("address_struct_str"))

2)
from pyspark.sql.functions import udf
from pyspark.sql.types import ArrayType, StringType
dist_addr = udf(lambda row: list(set(row)), ArrayType(StringType()))
df = df.withColumn("address_dict", dist_addr("address_struct_str"))

Both options 1 and 2 are correct
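The lambda in option 2 is ordinary Python, so its behavior can be checked without Spark. The sketch below uses a made-up list as the array value of one row; it also shows an order-preserving alternative, which is an addition for comparison and not part of the quiz.

```python
# The lambda from option 2, applied to a plain Python list.
dedupe = lambda row: list(set(row))

# Hypothetical array value for one row of "address_struct_str".
addresses = ["12 Main St", "5 Oak Ave", "12 Main St", "5 Oak Ave", "9 Elm Rd"]

# Duplicates are removed, but a set gives no guarantee about element order.
unique_any_order = dedupe(addresses)

# dict.fromkeys drops duplicates while keeping first-seen order.
unique_in_order = list(dict.fromkeys(addresses))

print(sorted(unique_any_order))
print(unique_in_order)
```

Because `set(None)` raises a `TypeError`, a UDF built this way would likely need a guard for null arrays.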

PySpark Quiz Round

Authored by Ankita Chatterjee

Other

Professional Development


11 questions


MULTIPLE CHOICE QUESTION

30 sec • 1 pt

Which of the following is a transformation operation in PySpark?

count()

filter()

reduce()

collect()
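The distinction behind this question is that a transformation only describes work, while an action triggers it. Python's built-in `filter` happens to be lazy in a similar way, so it can serve as a loose analogy; this sketch is not PySpark code, and the predicate and data are made up.

```python
calls = []

def is_even(n):
    """Predicate that records every value it is asked about."""
    calls.append(n)
    return n % 2 == 0

numbers = [1, 2, 3, 4]

# Built-in filter() is lazy: it returns an iterator and evaluates nothing yet.
pending = filter(is_even, numbers)
calls_before = len(calls)

# Materializing the iterator plays the role of an action such as collect().
result = list(pending)
calls_after = len(calls)

print(calls_before, calls_after, result)  # 0 4 [2, 4]
```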

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

Which of the following is true for RDD?

RDD is a programming paradigm

RDD in Apache Spark is an immutable collection of objects

It is a database

None of the above

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

What is the output of the following code?
words_list = sc.parallelize(["pyspark", "quiz", "questions", "at", "quiz.com"])
filtered_words = words_list.filter(lambda x: 'quiz' in x)
matched_words = filtered_words.collect()
print(matched_words)

["quiz", "quiz.com"]

["quiz"]

["quiz.com"]

Error
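The predicate in this question is a plain substring test, so the same selection can be reproduced with a list comprehension. This sketch drops the Spark context and works on the list directly.

```python
words = ["pyspark", "quiz", "questions", "at", "quiz.com"]

# 'quiz' in w is True when "quiz" occurs anywhere inside the string w.
matched_words = [w for w in words if "quiz" in w]

print(matched_words)  # ['quiz', 'quiz.com']
```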

MULTIPLE CHOICE QUESTION

30 sec • 2 pts

Suppose we have a data frame "df". What does the expression '[.]{2,}' signify in the following transformation?
df = df.withColumn('var_addrss', sf.regexp_replace('var_addrss', '[.]{2,}', ''))

A single dot (".") followed by 2 integers

A single dot (".") followed by the integer '2'

A dot (".") appearing two or more times consecutively

None of these
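The pattern can be tried out with Python's `re` module, which treats this particular expression the same way as the Java regex engine behind `regexp_replace`. The sample strings are made up for illustration.

```python
import re

# [.] is a literal dot; {2,} means "two or more times in a row".
pattern = re.compile(r"[.]{2,}")

samples = ["12..Main St", "a...b", "no.change", "x.y..z"]
cleaned = [pattern.sub("", s) for s in samples]

print(cleaned)  # ['12Main St', 'ab', 'no.change', 'x.yz']
```

A single dot is left alone; only runs of two or more dots are removed.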

MULTIPLE CHOICE QUESTION

30 sec • 2 pts

Suppose we have a data frame "df". What does the expression '^[0]*' signify in the following transformation?
df = df.withColumn('var_addrss', sf.regexp_replace('var_addrss', '^[0]*', ''))

The value starts with 0 or is followed by a sequence of 0s

The value starts with 0 and ends with 0

The value starts with 0 and is followed by a sequence of 0s

The value starts with anything other than 0
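Whether the pattern carries a trailing `*` changes the result, which is easy to see with Python's `re` module (it handles these two patterns the same way as Java regex). The sample values below are made up.

```python
import re

values = ["000123", "0", "10200", "abc"]

# ^[0] matches exactly one zero at the start of the string.
one_zero_removed = [re.sub(r"^[0]", "", v) for v in values]

# ^[0]* matches the whole run of leading zeros (possibly empty).
all_leading_zeros_removed = [re.sub(r"^[0]*", "", v) for v in values]

print(one_zero_removed)           # ['00123', '', '10200', 'abc']
print(all_leading_zeros_removed)  # ['123', '', '10200', 'abc']
```

Zeros that are not at the start of the value are untouched in both cases.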

MULTIPLE SELECT QUESTION

45 sec • 1 pt

Assume we have the following data frame "df".
How do you display it sorted by the 'age' column in descending order?

display(df.orderBy(df.age.desc()))

display(df.sort(df.age.desc()))

display(df.orderBy(df.age, sort = desc()))

None of these

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

What will the data types of the columns be for the following PySpark data frame "df"?
df = spark.read.format("csv").option("header", "true").option("inferSchema", "false").option("delimiter", ",").load("/mnt/temp/test.csv")

Data types of columns will be int

Data types of columns will be read as per the data types defined in the file

Data types of all columns will be string

None of the above
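A CSV file carries no type information, so a reader that does not infer a schema can only hand back text. Python's standard `csv` module behaves this way and serves here as an analogy for reading with inferSchema set to "false"; the file contents below are hypothetical.

```python
import csv
import io

# Hypothetical file contents standing in for the CSV being loaded.
raw = "id,name,score\n1,Ann,9.5\n2,Bob,7.0\n"

# DictReader uses the first line as the header and does no type inference.
rows = list(csv.DictReader(io.StringIO(raw)))

# Collect the Python type name of every parsed value.
types_seen = {type(value).__name__ for row in rows for value in row.values()}

print(rows[0])
print(types_seen)  # {'str'}
```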

Which function converts a column from Unix time (in seconds) to a date or timestamp?

unix_timestamp()

from_unixtime()
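The two function names point in opposite directions: one goes from epoch seconds to a formatted timestamp, the other from a formatted timestamp to epoch seconds. The sketch below mirrors both directions with Python's `datetime` as an analogy only. The helper names are hypothetical, and UTC is fixed here as an assumption, whereas Spark uses the session time zone.

```python
from datetime import datetime, timezone

FMT = "%Y-%m-%d %H:%M:%S"

def seconds_to_text(seconds):
    """Epoch seconds -> formatted UTC timestamp text (the from_unixtime direction)."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime(FMT)

def text_to_seconds(text):
    """Formatted UTC timestamp text -> epoch seconds (the unix_timestamp direction)."""
    parsed = datetime.strptime(text, FMT).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())

print(seconds_to_text(86400))                  # 1970-01-02 00:00:00
print(text_to_seconds("1970-01-02 00:00:00"))  # 86400
```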
