You have a dataset, internet_sales, that represents online product sales.
Your manager asks you to divide this dataset by date into two datasets: one containing the sales on or before 03/31/2021 and one containing the sales after that date.
Which of the following split modes will you use? (Note: the dataset has a column named date in mmddyyyy format.)
A. Split Rows
B. Split Columns
C. Recommender Split
D. Regular Expression Split
E. Relative Expression Split
Correct Answer: E
The Relative Expression Split option is used whenever you need to apply a condition to a number column.
This number can be a time/date field, a column representing a dollar amount or an age, or even a percentage.
For example, the expression \"date" > 03/31/2021 selects all the rows with a sale after March 31, 2021. The rows that do not satisfy the condition, i.e., the sales on or before that date, form the other dataset.
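The effect of such a date condition can be sketched in plain Python. This is only an illustration of the logic, not the split component itself; the sample rows, the order_id field, and the helper names are hypothetical, and the date column is assumed to hold strings written literally as mmddyyyy with no separators.

```python
from datetime import datetime, date


def parse_mmddyyyy(text):
    """Parse a date written as mmddyyyy, e.g. '03312021' (assumed format)."""
    return datetime.strptime(text, "%m%d%Y").date()


def split_by_date(rows, cutoff):
    """Partition rows into (on or before cutoff, after cutoff) by their 'date' field."""
    on_or_before, after = [], []
    for row in rows:
        if parse_mmddyyyy(row["date"]) <= cutoff:
            on_or_before.append(row)
        else:
            after.append(row)
    return on_or_before, after


# Hypothetical sample rows standing in for internet_sales
internet_sales = [
    {"order_id": 1, "date": "03312021"},
    {"order_id": 2, "date": "04012021"},
    {"order_id": 3, "date": "12152020"},
]

first, second = split_by_date(internet_sales, date(2021, 3, 31))
print([r["order_id"] for r in first])   # [1, 3]
print([r["order_id"] for r in second])  # [2]
```

Note that the dates are compared as dates rather than as raw strings: comparing "12152020" and "03312021" as text would put the December 2020 sale after the cutoff.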
Option A is incorrect.
This option divides the data into two parts by row count rather than by a condition on a column.
You can specify the percentage of the data that should go into each split.
If you don't specify it, the data is split 50-50 by default.
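A minimal sketch of a percentage-based split shows why it cannot honor a date cutoff: the split point depends only on how many rows there are, never on their values. The function name split_rows, its fraction and seed parameters, and the shuffling step are assumptions made for this illustration.

```python
import random


def split_rows(rows, fraction=0.5, seed=0):
    """Randomly place about `fraction` of the rows in the first output and the rest in the second."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * fraction))
    return shuffled[:cut], shuffled[cut:]


rows = list(range(10))

half_a, half_b = split_rows(rows)                # default 50-50 split
print(len(half_a), len(half_b))                  # 5 5

part_a, part_b = split_rows(rows, fraction=0.7)  # 70-30 split
print(len(part_a), len(part_b))                  # 7 3
```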
Option B is incorrect.
There is no splitting mode called Split Columns.
Option C is incorrect.
The Recommender Split option is used to prepare data for a recommender system.
Option D is incorrect.
Regular Expression Split is used to divide the dataset by testing a single column for a value.
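To illustrate the difference, a pattern-based split can be sketched as follows. The helper regex_split and the sample rows are hypothetical; the point is that a regular expression tests the text of a value, so it can pick out, say, every date string starting with "03", but it has no notion of one date being earlier or later than another.

```python
import re


def regex_split(rows, column, pattern):
    """Partition rows into (matching, non-matching) by testing one column against a pattern."""
    regex = re.compile(pattern)
    matched, unmatched = [], []
    for row in rows:
        if regex.search(str(row[column])):
            matched.append(row)
        else:
            unmatched.append(row)
    return matched, unmatched


# Hypothetical rows with dates written as mmddyyyy
rows = [{"date": "03312021"}, {"date": "04012021"}, {"date": "03152020"}]

# Selects March of ANY year, which is not the same as "on or before 03/31/2021"
march, other = regex_split(rows, "date", r"^03")
print([r["date"] for r in march])  # ['03312021', '03152020']
print([r["date"] for r in other])  # ['04012021']
```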
Option E is correct.
In the given scenario, we need to apply a condition to a number column, i.e., the date column, so the Relative Expression Split option should be used.
The correct answer for this question is E. Relative Expression Split.
Relative Expression Split is a splitting mode of the Split Data component in Azure Machine Learning designer that splits a dataset based on a condition defined by an expression. In this case, the expression is a condition on the date column that separates the sales on or before 03/31/2021 from the sales after that date.
To use Relative Expression Split, you need to follow the below steps:
The Relative Expression Split mode is suitable for this scenario because it allows you to split the dataset based on a specific condition without having to split rows or columns manually. It is also more flexible and customizable than other split modes, such as Regular Expression Split or Recommender Split.