You are using Azure's Auto ML functionality to train models on your dataset containing around 15 000 observations.
To validate the models, Auto ML needs data against which it can compare the predictions.
You decide to use 20% of your input data to validate the results.
You have the following configuration script which needs to be completed:
    # configure Auto ML
    my_data = Dataset.Tabular.from_delimited_files(data)
    automl_config = AutoMLConfig(compute_target = aml_remote_compute,
                                 task = 'classification',
                                 primary_metric = 'AUC_weighted',
                                 training_data = my_data,
                                 <insert code here,>
                                 label_column_name = 'Class'
                                 )

Which of the following options can be used to achieve this goal?
A. B. C. D.
Answer: D.
Option A is incorrect because, when no validation data is provided explicitly, Auto ML applies a default validation method that depends on the number of rows (observations) in the input dataset.
If the dataset contains fewer than 20 000 rows, the cross-validation method is selected and used automatically, i.e.
partitions (folds) of the original training data are used for cross-checking the performance of the runs; for a dataset of between 1 000 and 20 000 rows, three folds are used by default.
Only for datasets larger than 20 000 rows does Auto ML, by default, hold out 10% of the original data for validation.
Neither default gives you the 20% holdout you want, so leaving the code as it is is not the right option for you.
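The default selection can be sketched as a small function. This is only an illustration of the thresholds described in the Azure Machine Learning documentation, not code from the SDK; the function name and the handling of a dataset with exactly 20 000 rows are assumptions.

```python
def default_validation_strategy(n_rows: int) -> str:
    """Describe the default AutoML validation method for n_rows rows.

    Hypothetical helper: it only mirrors the documented thresholds.
    """
    if n_rows < 0:
        raise ValueError("n_rows must be non-negative")
    if n_rows > 20_000:
        # Large datasets: a single train/validation split.
        return "train/validation split with 10% holdout"
    if n_rows < 1_000:
        # Very small datasets: more folds.
        return "10-fold cross-validation"
    # Assumption: exactly 20 000 rows is treated like the smaller case.
    return "3-fold cross-validation"


print(default_validation_strategy(15_000))  # prints: 3-fold cross-validation
```

For the 15 000 observations in the question, the default is therefore cross-validation and not a 20% holdout.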
Option B is incorrect because, while setting validation_data is a valid way to supply a second dataset for validation, your code would then need a statement that splits the original data into training and validation sets.
Since there is no such statement present, the option is incorrect.
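To illustrate the splitting statement that option B would require, here is a minimal sketch of a manual 80/20 split on row indices in plain Python. The function name and the seed are hypothetical. In the Azure ML SDK itself, a TabularDataset offers a random_split method for this purpose, and the two resulting datasets would then be passed as training_data and validation_data.

```python
import random


def split_row_indices(n_rows, validation_fraction=0.2, seed=42):
    """Shuffle row indices and split them into training and validation parts."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded, so the split is reproducible
    n_validation = round(n_rows * validation_fraction)
    # The first n_validation shuffled rows are held out for validation.
    return indices[n_validation:], indices[:n_validation]


train_idx, validation_idx = split_row_indices(15_000)
print(len(train_idx), len(validation_idx))  # prints: 12000 3000
```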
Option C is incorrect because, in order to define the validation dataset, you can either define the training/validation split manually (by explicitly setting validation_data) or provide only one dataset (training_data) and specify validation_size.
These two approaches cannot be mixed.
Option D is CORRECT because you provided only one dataset for your experiments, which is training data.
By setting the validation_size parameter to 0.2, you instruct Auto ML to keep 20% of the dataset for validation purposes.
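As a sketch, the completed configuration can be written as a set of keyword arguments. The helper names below are hypothetical, the strings passed in are stand-ins for the real dataset and compute target objects, and the row counts are only an approximation of what a 20% holdout means for 15 000 rows. With the Azure ML SDK installed, the dictionary would be unpacked as AutoMLConfig(**settings).

```python
def build_automl_settings(training_data, compute_target, validation_size=0.2):
    """Collect the keyword arguments for AutoMLConfig (hypothetical helper)."""
    if not 0.0 < validation_size < 1.0:
        raise ValueError("validation_size must be a fraction between 0 and 1")
    return {
        "compute_target": compute_target,
        "task": "classification",
        "primary_metric": "AUC_weighted",
        "training_data": training_data,
        "validation_size": validation_size,  # the line inserted by option D
        "label_column_name": "Class",
    }


def holdout_rows(n_rows, validation_size):
    """Approximate (training, validation) row counts for a holdout fraction."""
    n_validation = round(n_rows * validation_size)
    return n_rows - n_validation, n_validation


# Placeholder strings stand in for my_data and aml_remote_compute.
settings = build_automl_settings("my_data", "aml_remote_compute")
print(settings["validation_size"])  # prints: 0.2
print(holdout_rows(15_000, 0.2))    # prints: (12000, 3000)
```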
The correct option to achieve the goal of using 20% of the input data to validate the results is option D: validation_size = 0.2.
The AutoMLConfig class in Azure Machine Learning is used to define the configuration for the automated machine learning process. In this case, the code is defining the configuration for training a classification model using Azure's AutoML functionality. The configuration requires specifying various parameters such as the compute target, task, primary metric, training data, and label column name.
To specify the data that AutoML will use to validate the predictions, either the validation_data or the validation_size parameter can be used. The validation_data parameter takes a separate validation dataset as input, while the validation_size parameter specifies the proportion of the training dataset that should be held out for validation.
Option A is incorrect because neither parameter is included in the configuration, so AutoML falls back to its default validation method instead of a 20% holdout. Option B is incorrect because it sets validation_data although the script never creates a separate validation dataset. Option C is incorrect because it combines validation_data with validation_size, which are alternative ways of defining the validation set.
Therefore, the correct option is D, which keeps the single training dataset and uses validation_size to hold out 20% of it for validation.