Question 109 of 140 from exam DP-100: Designing and Implementing a Data Science Solution on Azure

Question

You need to build an ML pipeline that takes an input dataset and trains two regression models so that you can compare their performance and, based on the comparison, decide which one to use for real-time predictions.

You know that the Split Data module of the ML Designer is a convenient way to distribute data between the training subprocesses.

The Split Data module has two outputs, each feeding two data flows (numbered 1 to 4), like this:

[Figure: Split Data module, labeled "Split the data into training set (0.7) and testing", with outgoing data flows numbered 1, 2, 3 and 4]

How do you need to connect the outputs of the Split data module?

Answers

Explanations

A. B. C. D.

Answer: B.

Option A is incorrect because the Boosted Decision Tree and the Decision Forest modules are the two algorithms to be used. They serve as inputs to the Train Model modules and have no input connectors themselves, so no output of Split Data can be connected to them.

Option B is CORRECT because Split Data separates its input data into two distinct sets, which are typically used for training and testing (scoring).

In this case, 70% of the rows go to the training set and the rest will be used for testing.

The training data (flows 1 and 2) should go into the two Train Model modules, while the rest of the data (flows 3 and 4) will be used by the two Score Model modules.
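The same wiring can be sketched outside the Designer in plain Python. This is only an illustration of the data flow, not the Designer's implementation: the toy dataset, the function names, and the two stand-in models (a mean predictor and a one-feature least-squares line, used here in place of Boosted Decision Tree and Decision Forest) are all assumptions made for the example.

```python
import random

def split_data(rows, fraction=0.7, seed=0):
    """Randomly split rows into two outputs, like a 0.7 Split Data step."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]

def train_mean_model(train_rows):
    """Stand-in model A (hypothetical): always predicts the mean training label."""
    mean_y = sum(y for _, y in train_rows) / len(train_rows)
    return lambda x: mean_y

def train_linear_model(train_rows):
    """Stand-in model B (hypothetical): one-feature least-squares line."""
    n = len(train_rows)
    mean_x = sum(x for x, _ in train_rows) / n
    mean_y = sum(y for _, y in train_rows) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in train_rows)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in train_rows)
    slope = cov_xy / var_x if var_x else 0.0
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def score_model(model, test_rows):
    """Attach a prediction to each held-out row: (x, y_true, y_pred)."""
    return [(x, y, model(x)) for x, y in test_rows]

# Hypothetical toy dataset of (feature, label) pairs.
rows = [(float(x), 2.0 * x + 1.0) for x in range(100)]

# First output (flows 1 and 2) feeds both training steps;
# second output (flows 3 and 4) feeds both scoring steps.
train_rows, test_rows = split_data(rows, fraction=0.7)

model_a = train_mean_model(train_rows)    # flow 1
model_b = train_linear_model(train_rows)  # flow 2

scored_a = score_model(model_a, test_rows)  # flow 3
scored_b = score_model(model_b, test_rows)  # flow 4
```

Both models are trained on the same 70% of the rows and scored on the same remaining 30%, which is what makes the later comparison fair.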

Option C is incorrect because 70% of the data rows go to the training set (1, 2), which forms the input of the Train Model modules rather than of the Score Model modules.

Option D is incorrect because the Evaluate Model module takes its input from the Score Model modules, i.e. connecting 3 and 4 to it will not give the expected result.
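To illustrate why the evaluation step needs scored data rather than the raw test split, here is a minimal sketch of a comparison. The `evaluate_model` helper, the choice of RMSE as the metric, and the sample values are assumptions for the example, not the Designer's own code; the point is only that an evaluation needs both true labels and predictions, which exist only after scoring.

```python
import math

def evaluate_model(scored_rows):
    """Root mean squared error over (y_true, y_pred) pairs."""
    squared_errors = [(y_true - y_pred) ** 2 for y_true, y_pred in scored_rows]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical scored datasets, as two scoring steps might produce them.
scored_a = [(3.0, 2.5), (5.0, 5.5), (7.0, 8.0)]
scored_b = [(3.0, 3.0), (5.0, 5.0), (7.0, 7.5)]

rmse_a = evaluate_model(scored_a)
rmse_b = evaluate_model(scored_b)

# The model with the lower error on the held-out rows is preferred.
best = "A" if rmse_a < rmse_b else "B"
```

The raw test rows carry no predictions, so passing them straight to the evaluation step would leave nothing to compare the labels against.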