Performing Data Aggregations and Count of Distinct Data Frame Operations with PySpark in Databricks | Whizlabs Inc

Question

Henry is a Data Engineer at Whizlabs Inc working on Databricks Spark Streaming.

He is using PySpark to develop dataframes.

He needs to perform data aggregation and count-distinct operations on a dataframe.

Which of the following is the correct code snippet in this scenario?

Answers

Explanations


Correct Answer: A.

The correct code snippet in this scenario is:

select("emp_id", "emp_name").groupBy("emp_id").agg(countDistinct("emp_name").alias("distinct_emp_name")).display()

Explanation:

This snippet performs the aggregation and count-distinct operations on the dataframe. Let's break it down step by step:

Step 1: select("emp_id", "emp_name") selects the two columns "emp_id" and "emp_name" from the dataframe.

Step 2: groupBy("emp_id") groups the rows by the "emp_id" column.

Step 3: agg(countDistinct("emp_name").alias("distinct_emp_name")) applies the countDistinct aggregation to the "emp_name" column within each group and aliases the result as "distinct_emp_name".

Step 4: display() renders the result as a table in the Databricks notebook. (display() is Databricks-specific; outside Databricks, show() serves the same purpose.)

Therefore, the correct code snippet is:

select("emp_id", "emp_name").groupBy("emp_id").agg(countDistinct("emp_name").alias("distinct_emp_name")).display()