Azure DP-100: Building an ML Designer Pipeline for a Data Science Solution on Azure


Question

Your task is to ingest data from a CSV file, train an ML model using a regression algorithm, and evaluate the model's performance.

In order to do that, you need to build an ML Designer pipeline.

Which of the following modules should you drag onto the canvas, and in what order?

Load data, Import data, Evaluate model, Score model, Train model, Split data


Explanations


Answer: B.

Option A is incorrect because there is no module named Load Data on the ML Designer palette, and scoring must be executed before the results are evaluated.

Option B is CORRECT because this is the right order of steps you should follow.

Option C is incorrect because splitting data needs to be done before training; scoring and evaluating can be executed on the results of the training step.

Option D is incorrect because there is no module named Load Data on the ML Designer palette. Use Import Data instead.
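The ordering rules used in the explanations can be written down as a small check. This is a hypothetical helper, not part of Azure Machine Learning; the module names and the "must come before" rules are taken from the explanations, and `check_pipeline` is an illustrative name.

```python
# Hypothetical helper (not part of Azure ML): checks a candidate module order
# against the rules stated in the explanations.
REQUIRED = ["Import data", "Split data", "Train model", "Score model", "Evaluate model"]

# Each pair (a, b) means module a must appear before module b.
ORDER_RULES = list(zip(REQUIRED, REQUIRED[1:]))


def check_pipeline(steps):
    """Return a list of problems; an empty list means the order is valid."""
    problems = []
    # Flag anything that is not one of the expected modules (e.g. "Load data").
    for step in steps:
        if step not in REQUIRED:
            problems.append(f"unexpected module {step!r}")
    position = {step: i for i, step in enumerate(steps)}
    # Every required module must be present.
    for module in REQUIRED:
        if module not in position:
            problems.append(f"missing {module}")
    # Every ordering rule must hold for modules that are present.
    for before, after in ORDER_RULES:
        if before in position and after in position and position[before] > position[after]:
            problems.append(f"{before} must come before {after}")
    return problems


print(check_pipeline(["Import data", "Split data", "Train model", "Score model", "Evaluate model"]))  # prints []
print(check_pipeline(["Load data", "Split data", "Train model", "Evaluate model", "Score model"]))
```

Because the rules form a single chain, the only sequence that passes is Import data -> Split data -> Train model -> Score model -> Evaluate model.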

Reference:

The correct answer for the pipeline to ingest data from a CSV file, train an ML model using a regression algorithm, and evaluate the model's performance is option B: Import data -> Split data -> Train model -> Score model -> Evaluate model.

Here is a detailed explanation of each module and its order:

  1. Import data: The first step is to bring the CSV data into the pipeline with the Import Data module. This module reads the file from its source (for example, a web URL or a datastore) and outputs a dataset for the downstream modules.

  2. Split data: Next, the Split Data module divides the dataset into a training set and a test set according to the fraction of rows you assign to the first output. A 70/30 split (70% for training, 30% for testing) is a common choice.

  3. Train model: The Train Model module fits the regression algorithm (for example, a Linear Regression module connected to it) on the training set. You also select the label column that the model should learn to predict.

  4. Score model: Once the model is trained, the Score Model module applies it to the test set and generates a prediction for each row. Scoring only produces predictions; it does not measure performance.

  5. Evaluate model: Finally, the Evaluate Model module compares the predictions with the actual label values and reports regression metrics such as mean absolute error (MAE), root mean squared error (RMSE), and the coefficient of determination (R²). These show how well the model performs and whether there is room for improvement.
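To make the data flow between the five modules concrete, here is a plain-Python sketch of what each stage does. It is an analogy only, not the Designer's implementation or the Azure ML SDK: the function names are hypothetical, the CSV is synthetic (y = 2x + 1 with no noise), and a one-feature ordinary-least-squares fit stands in for the Designer's regression algorithm.

```python
import csv
import io
import math
import random


def import_data(csv_text):
    """Import data: parse CSV text into a list of (x, y) float pairs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(float(row["x"]), float(row["y"])) for row in reader]


def split_data(rows, train_fraction=0.7, seed=42):
    """Split data: shuffle, then send train_fraction of the rows to the training set."""
    rows = rows[:]  # copy so the caller's list is not reordered
    random.Random(seed).shuffle(rows)
    cut = round(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def train_model(train_rows):
    """Train model: fit y = slope * x + intercept by ordinary least squares."""
    n = len(train_rows)
    mean_x = sum(x for x, _ in train_rows) / n
    mean_y = sum(y for _, y in train_rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train_rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train_rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def score_model(model, test_rows):
    """Score model: pair each actual label with the model's prediction."""
    slope, intercept = model
    return [(y, slope * x + intercept) for x, y in test_rows]


def evaluate_model(scored):
    """Evaluate model: compare predictions with actual labels (MAE and RMSE)."""
    errors = [pred - actual for actual, pred in scored]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return {"MAE": mae, "RMSE": rmse}


# Synthetic CSV: y = 2x + 1 with no noise, so the fit should be (almost) exact.
csv_text = "x,y\n" + "\n".join(f"{x},{2 * x + 1}" for x in range(20))
train, test = split_data(import_data(csv_text))
model = train_model(train)
metrics = evaluate_model(score_model(model, test))
print(model, metrics)
```

Each function consumes the previous one's output, which is why the modules must be connected in this order: scoring needs a trained model, and evaluation needs scored predictions.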

Therefore, the correct order of the modules is Import data -> Split data -> Train model -> Score model -> Evaluate model. Option B is the correct answer.