You are building a classification model (logistic regression) that you want to optimize using Azure ML hyperparameter tuning (Hyperdrive).
For running Hyperdrive experiments, you have the following script:
```python
...
sampling = GridParameterSampling(
    {'--regularization': choice(0.001, 0.01, 0.1, 1.0)}
)
hyperdrive = HyperDriveConfig(estimator=hyper_estimator,
                              hyperparameter_sampling=sampling,
                              policy=None,
                              [select the passing code segment here],
                              max_total_runs=6)
...
run = experiment.submit(config=hyperdrive)
...
```

The script is still missing some configuration details necessary for Hyperdrive.
Which code segments need to be added to the script?
Answer: B.
Option A is incorrect because 'r2_score' is a regression metric and does not apply to classification tasks; in addition, where it does apply, it should be maximized, not minimized, to find the best-performing run.
Option B is CORRECT because the primary metric and the way of selecting the best-performing run (the primary metric goal) are the two parameters Hyperdrive needs to complete its task.
Option C is incorrect because max_concurrent_runs defaults to None and is therefore not mandatory; to select the best run, the primary metric goal must be set.
Option D is incorrect because the best run is the one with the highest AUC value.
Therefore, setting the goal parameter to 'MINIMIZE' is incorrect.
The given script is missing configuration details that Hyperdrive needs in order to optimize the logistic regression classification model. The required ones are the primary metric that Hyperdrive should optimize and the goal (maximize or minimize) for that metric; the maximum number of concurrent runs can optionally be set as well.
The primary metric is the metric that Hyperdrive optimizes during hyperparameter tuning: it is used to evaluate each hyperparameter configuration and to select the best one. Its name must exactly match the name of a metric that the training script logs. Logistic regression is typically used for binary classification, so an appropriate metric here is the AUC (Area Under the Curve) of the Receiver Operating Characteristic (ROC) curve. The AUC measures how well the model separates the positive class from the negative class; a higher AUC means better performance.
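To make the metric concrete, here is a minimal pure-Python sketch of ROC AUC, computed as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one (ties count as half). The function name roc_auc and the sample labels and scores are illustrative assumptions, not part of the exam script; a real training script would more likely call a library implementation such as scikit-learn's roc_auc_score.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(score of a random positive > score of a random negative).

    Ties between a positive and a negative score count as half a win.
    This pairwise version is O(n_pos * n_neg); fine for illustration only.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# 3 of the 4 positive/negative pairs are ranked correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

A perfect ranking gives 1.0 and random guessing gives about 0.5, which is why a higher AUC means a better classifier.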
The primary metric goal depends on the chosen metric. Because a higher AUC is better, the goal should be set to PrimaryMetricGoal.MAXIMIZE.
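The effect of the goal on which run is reported as "best" can be sketched in a few lines. This is not Hyperdrive's internal code: Goal is a hypothetical stand-in for PrimaryMetricGoal, best_run is a hypothetical helper, and the AUC values are made up purely for illustration.

```python
from enum import Enum
from operator import itemgetter
from typing import Dict, List


class Goal(Enum):
    """Illustrative stand-in for azureml's PrimaryMetricGoal (not the real class)."""
    MAXIMIZE = "MAXIMIZE"
    MINIMIZE = "MINIMIZE"


def best_run(runs: List[Dict[str, float]], metric: str, goal: Goal) -> Dict[str, float]:
    """Return the run with the best value of `metric` under the given goal."""
    pick = max if goal is Goal.MAXIMIZE else min
    return pick(runs, key=itemgetter(metric))


# Made-up AUC values, for illustration only.
runs = [
    {"--regularization": 0.001, "AUC": 0.81},
    {"--regularization": 0.01, "AUC": 0.86},
    {"--regularization": 0.1, "AUC": 0.84},
    {"--regularization": 1.0, "AUC": 0.78},
]
print(best_run(runs, "AUC", Goal.MAXIMIZE))  # {'--regularization': 0.01, 'AUC': 0.86}
print(best_run(runs, "AUC", Goal.MINIMIZE))  # {'--regularization': 1.0, 'AUC': 0.78}
```

With MINIMIZE, the run with the weakest classifier would be selected, which is exactly why a minimize goal is wrong for AUC.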
Optionally, max_concurrent_runs limits how many runs execute at the same time, which caps how much of the compute target the tuning job uses in parallel. Because it defaults to None, it is not required. Setting max_concurrent_runs=4 allows up to 4 runs to execute simultaneously.
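The interaction between the grid, max_total_runs and max_concurrent_runs can be sketched as follows, assuming that grid sampling tries each combination exactly once. The helpers expand_grid and schedule are hypothetical; schedule also simplifies scheduling into fixed batches, whereas Hyperdrive is expected to start a new run whenever a slot frees up and the compute target has capacity.

```python
from itertools import product
from typing import Dict, List, Sequence


def expand_grid(space: Dict[str, Sequence[float]]) -> List[Dict[str, float]]:
    """Enumerate every combination of discrete values in a grid search space."""
    names = list(space)
    return [dict(zip(names, combo)) for combo in product(*(space[n] for n in names))]


def schedule(configs: List[Dict[str, float]], max_total_runs: int,
             max_concurrent_runs: int) -> List[List[Dict[str, float]]]:
    """Group the runs that will execute into batches of at most max_concurrent_runs."""
    # Cap at max_total_runs; a grid smaller than that cap simply runs every combination.
    runs = configs[:max_total_runs]
    return [runs[i:i + max_concurrent_runs]
            for i in range(0, len(runs), max_concurrent_runs)]


grid = expand_grid({"--regularization": [0.001, 0.01, 0.1, 1.0]})
print(len(grid))  # 4
print([len(b) for b in schedule(grid, max_total_runs=6, max_concurrent_runs=4)])  # [4]
print([len(b) for b in schedule(grid, max_total_runs=6, max_concurrent_runs=2)])  # [2, 2]
```

Under this assumption, the 4-value grid in the question produces only 4 runs, so max_total_runs=6 is never reached and max_concurrent_runs=4 lets all of them run at once.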
Therefore, the correct code segment to add to the script is:
B. primary_metric_name='AUC', primary_metric_goal=PrimaryMetricGoal.MAXIMIZE, max_concurrent_runs=4
The final script with the missing code segments would look like this:
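```python
from azureml.train.hyperdrive import (GridParameterSampling, HyperDriveConfig,
                                      PrimaryMetricGoal, choice)
...
sampling = GridParameterSampling(
    {'--regularization': choice(0.001, 0.01, 0.1, 1.0)}
)
hyperdrive = HyperDriveConfig(estimator=hyper_estimator,
                              hyperparameter_sampling=sampling,
                              policy=None,
                              primary_metric_name='AUC',  # must match the metric name logged by the training script
                              primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,  # higher AUC is better
                              max_concurrent_runs=4,  # optional, defaults to None
                              max_total_runs=6)
...
run = experiment.submit(config=hyperdrive)
...
```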
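On the training-script side, Hyperdrive passes each sampled value as a command-line argument named after the search-space key (here --regularization) and reads the primary metric from what the script logs. The skeleton below is a sketch under stated assumptions: it assumes the Azure ML SDK v1 (azureml.core.Run) and falls back to printing when the SDK is not installed; parse_args, default_logger, main and the default of 0.01 are hypothetical, and evaluate_auc is a HYPOTHETICAL stub that returns a placeholder instead of training a real model.

```python
import argparse
from typing import Callable, List, Optional

LogFn = Callable[[str, float], None]


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # Each sampled hyperparameter arrives as a flag, e.g. --regularization 0.01
    parser = argparse.ArgumentParser()
    parser.add_argument("--regularization", type=float, default=0.01)
    return parser.parse_args(argv)


def _print_metric(name: str, value: float) -> None:
    print(f"{name}: {value}")


def default_logger() -> LogFn:
    # Use the Azure ML (SDK v1) run context when available; otherwise print locally.
    try:
        from azureml.core import Run
        return Run.get_context().log
    except ImportError:
        return _print_metric


def evaluate_auc(regularization: float) -> float:
    # HYPOTHETICAL stub: replace with training the logistic regression model
    # using `regularization` and computing AUC on held-out data.
    return 0.5


def main(argv: Optional[List[str]] = None, log: Optional[LogFn] = None) -> None:
    args = parse_args(argv)
    log = log or default_logger()
    auc = evaluate_auc(args.regularization)
    # This name must match primary_metric_name='AUC' exactly, or Hyperdrive
    # has nothing to compare across runs.
    log("AUC", auc)


if __name__ == "__main__":
    main()
```

Injecting the log function keeps the script testable locally without an Azure ML workspace.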