Splitting Dataset by Date in Microsoft Azure for DP-203 Exam

Split Modes for Dataset Division

Question

You have a dataset, internet_sales, that records online product sales.

Your manager asks you to divide this dataset by date into two datasets: one with the sales on or before 03/31/2021 and one with the sales after that date.

Which of the following split modes will you use? (Note: the dataset has a column named date in mmddyyyy format.)

Answers

Explanations

A. B. C. D. E.

Correct Answer: E

The Relative Expression Split option is used whenever you need to apply a condition to a number column.

This number can be a time/date field, a column representing a dollar amount or an age, or even a percentage.

For example, the expression \"Date" > 03/31/2010 selects all the rows with a sale date after March 31, 2010.

Option A is incorrect.

This option divides the data into two parts.

You can specify what percentage of the data goes into each split.

If you don't specify it, the data is split 50-50 by default.
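
To make the contrast with a condition-based split concrete, here is a minimal Python sketch of a percentage-based split. It is only an illustration of the idea, not Azure's implementation; the function name split_rows and its parameters are hypothetical.

```python
import random


def split_rows(rows, fraction=0.5, randomize=True, seed=0):
    """Split rows into two parts; `fraction` of them go to the first part.

    Hypothetical helper for illustration only. The 0.5 default mirrors
    the 50-50 default described above.
    """
    rows = list(rows)
    if randomize:
        # Shuffle with a fixed seed so the split is reproducible.
        random.Random(seed).shuffle(rows)
    cut = int(len(rows) * fraction)
    return rows[:cut], rows[cut:]


first, second = split_rows(range(10))                  # default 50-50 split
quarter, rest = split_rows(range(8), fraction=0.25)    # 25% / 75% split
```

Note that the row values play no role here: the split depends only on how many rows there are, which is why this mode cannot separate sales by date.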

Option B is incorrect.

There is no such splitting mode as split columns.

Option C is incorrect.

The Recommender Split option is used to prepare the data for a recommender system.

Option D is incorrect.

Regular Expression Split is used to divide the dataset by testing a single column for a value.
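
As a rough illustration of what testing a single column for a value means, the following Python sketch splits rows by matching a regular expression against one column. It is an assumption-laden analogy, not the Azure module itself; regex_split, the product column, and the sample rows are all made up for the example.

```python
import re


def regex_split(rows, column, pattern):
    """Put rows whose `column` matches `pattern` in the first output,
    and all other rows in the second. Hypothetical helper."""
    regex = re.compile(pattern)
    matched, unmatched = [], []
    for row in rows:
        target = matched if regex.search(str(row[column])) else unmatched
        target.append(row)
    return matched, unmatched


sales = [
    {"order_id": 1, "product": "Bike Helmet"},
    {"order_id": 2, "product": "Road Tire"},
    {"order_id": 3, "product": "Bike Lock"},
]
bikes, others = regex_split(sales, "product", r"^Bike")
```

A pattern match answers "does this text look like X?", not "is this value greater or smaller than X?", so it is a poor fit for a before/after date cutoff.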

Option E is correct.

In the given scenario, we need to apply a condition to a number column (the date column), so the Relative Expression Split option should be used.
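
The logic of the required split can be sketched in plain Python. This is a minimal illustration under stated assumptions, not how Azure evaluates the expression: the function split_by_date, the order_id field, and the sample rows are hypothetical, and the date column is assumed to hold mmddyyyy strings as the question describes.

```python
from datetime import date, datetime

CUTOFF = date(2021, 3, 31)


def split_by_date(rows, cutoff=CUTOFF):
    """Return (on_or_before, after) relative to `cutoff`.

    The mmddyyyy value is parsed into a real date first. Comparing the
    raw values would be wrong: "12312020" sorts after "03312021" even
    though December 31, 2020 is the earlier date.
    """
    on_or_before, after = [], []
    for row in rows:
        sale_date = datetime.strptime(row["date"], "%m%d%Y").date()
        (on_or_before if sale_date <= cutoff else after).append(row)
    return on_or_before, after


internet_sales = [
    {"order_id": 1, "date": "03312021"},  # exactly on the cutoff
    {"order_id": 2, "date": "04012021"},  # one day after the cutoff
    {"order_id": 3, "date": "12312020"},  # earlier year, later month
]
sales_before, sales_after = split_by_date(internet_sales)
```

The first output corresponds to the condition "date on or before 03/31/2021" and the second to everything else, which is the same two-way division a relative expression produces.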

The correct answer for this question is E. Relative Expression Split.

Relative Expression Split is a splitting mode of the Split Data component in Azure Machine Learning designer (not a transformation in Azure Data Factory). It splits a dataset based on a condition that compares a number column with a value. In this case, the condition separates the sales on or before 03/31/2021 from the sales after that date.

To use Relative Expression Split, follow these steps:

  1. Open Azure Machine Learning designer and create a new pipeline.
  2. Add the internet_sales dataset to the pipeline.
  3. Add the Split Data component, connect the dataset to it, and set the splitting mode to Relative Expression.
  4. In the component settings, specify the expression that defines the condition to split the dataset. For example, the expression could be \"date" <= 03/31/2021.
  5. Use the two outputs of the component as the two split datasets, for example, "sales_before_03312021" (rows that satisfy the condition) and "sales_after_03312021" (all remaining rows).
  6. Save and run the pipeline.

The Relative Expression Split mode is suitable for this scenario because it compares a number or date column against a value, which is exactly what a date cutoff requires. The other modes do not fit: Regular Expression Split only tests a column for a matching pattern, and Recommender Split is meant for preparing data for a recommender system.