You work as a machine learning specialist for a book publishing firm.
Your firm is releasing a new publication and wants to use a machine learning model to structure the marketing campaign, deciding whether or not to market the new publication to each of its registered customers.
You and your machine learning team have developed a model using the XGBoost SageMaker built-in algorithm.
You are now at the hyperparameter optimization stage, where you are trying to find the best version of your model by running several training jobs on your data using the XGBoost algorithm.
How do you configure your hyperparameter tuning jobs to get a recommendation for the best values for your hyperparameters?
Correct Answer: B.
Option A is incorrect.
You do not want to restrict your hyperparameter tuning job by fixing any of your tunable hyperparameters to specific values.
Also, you want to maximize the AUC evaluation metric, not minimize it.
Option B is correct.
Specifying ranges of values for your tunable hyperparameters allows your hyperparameter tuning job to use either Bayesian or random search to find the best combination of values.
Also, maximizing the AUC objective metric is a sound choice for reaching the optimal set of tunable hyperparameters in this binary classification problem.
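As a rough illustration (not part of the original question), the sketch below shows how such a tuning job could be configured with the SageMaker Python SDK: ranges for the tunable hyperparameters, "validation:auc" as the objective metric set to Maximize, and multiple training jobs. The bucket names, role ARN, instance type, and specific ranges are placeholders, not values from the question.

```python
# Minimal sketch, assuming placeholder S3 paths, role ARN, and ranges.
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput
from sagemaker.tuner import HyperparameterTuner, ContinuousParameter, IntegerParameter

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerRole"  # hypothetical role ARN

# Built-in XGBoost container for the current region
xgb_image = sagemaker.image_uris.retrieve("xgboost", session.boto_region_name, version="1.5-1")

estimator = Estimator(
    image_uri=xgb_image,
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/output",          # placeholder bucket
    sagemaker_session=session,
)
# Static (non-tuned) hyperparameters; eval_metric=auc so validation:auc is emitted
estimator.set_hyperparameters(objective="binary:logistic", num_round=200, eval_metric="auc")

# Ranges of values for the tunable hyperparameters -- the tuner searches
# within these ranges using Bayesian (default) or random search.
hyperparameter_ranges = {
    "eta": ContinuousParameter(0.01, 0.3),
    "min_child_weight": ContinuousParameter(1, 10),
    "alpha": ContinuousParameter(0, 2),
    "max_depth": IntegerParameter(3, 10),
}

tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="validation:auc",  # maximize AUC, not minimize
    objective_type="Maximize",
    hyperparameter_ranges=hyperparameter_ranges,
    strategy="Bayesian",
    max_jobs=20,            # several training jobs, not just one
    max_parallel_jobs=3,
)

tuner.fit({
    "train": TrainingInput("s3://my-bucket/train/", content_type="text/csv"),
    "validation": TrainingInput("s3://my-bucket/validation/", content_type="text/csv"),
})
```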
Option C is incorrect.
Specifying ranges of values for your tunable hyperparameters allows your hyperparameter tuning job to use either Bayesian or random search to find the best combination of values.
However, choosing to minimize NDCG as your objective metric will produce a suboptimal result.
The NDCG metric is supposed to be maximized, not minimized.
Option D is incorrect.
Specifying ranges of values for your tunable hyperparameters allows your hyperparameter tuning job to use either Bayesian or random search to find the best combination of values.
However, running only one training job will not give you the optimal result.
You need to run several training jobs to reach the best set of tunable hyperparameters in a reasonable amount of time.
References:
Please see the AWS SageMaker developer guide titled Perform Automatic Model Tuning (https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning.html),
the AWS SageMaker developer guide titled Tune an XGBoost Model (https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost-tuning.html),
and the AWS SageMaker developer guide titled How Hyperparameter Tuning Works (https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning-how-it-works.html).
To configure hyperparameter tuning jobs for an XGBoost model using SageMaker, we need to specify the hyperparameters to tune and the ranges of values for each hyperparameter. We also need to choose an optimization metric that we want to maximize or minimize while training the model.
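For reference, the same configuration can be expressed in the low-level CreateHyperParameterTuningJob request. The sketch below only shows the tuning-job configuration portion; the parameter names follow the XGBoost built-in algorithm, but the specific ranges and resource limits are illustrative assumptions, not values given in the question.

```python
# Illustrative sketch of the HyperParameterTuningJobConfig structure; values are assumptions.
tuning_job_config = {
    "Strategy": "Bayesian",  # or "Random"
    "HyperParameterTuningJobObjective": {
        "Type": "Maximize",              # maximize, not minimize
        "MetricName": "validation:auc",  # AUC on the validation channel
    },
    "ResourceLimits": {
        "MaxNumberOfTrainingJobs": 20,   # several training jobs
        "MaxParallelTrainingJobs": 3,
    },
    "ParameterRanges": {
        "ContinuousParameterRanges": [
            {"Name": "eta", "MinValue": "0.01", "MaxValue": "0.3"},
            {"Name": "min_child_weight", "MinValue": "1", "MaxValue": "10"},
            {"Name": "alpha", "MinValue": "0", "MaxValue": "2"},
        ],
        "IntegerParameterRanges": [
            {"Name": "max_depth", "MinValue": "3", "MaxValue": "10"},
        ],
    },
}
```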
In this case, the marketing campaign model will be trained using the XGBoost algorithm in SageMaker, and the goal is to find the best values for the hyperparameters that maximize the performance of the model.
Option A suggests setting specific values for some hyperparameters and a range of values only for the max_depth hyperparameter. Additionally, it suggests choosing to minimize the area under the curve (AUC) as the objective metric. This option does not allow a range of values to be explored for each hyperparameter, so it may not provide enough hyperparameter variation to find the best model, and AUC should be maximized rather than minimized.
Option B suggests setting ranges of values for all tunable hyperparameters and choosing to maximize the area under the curve (AUC) as the objective metric. This is a better option than option A because it allows all hyperparameters to be explored, and AUC is a commonly used metric for binary classification problems.
Option C suggests setting ranges of values for all tunable hyperparameters but choosing to minimize the normalized discounted cumulative gain (NDCG) as the objective metric. NDCG is a metric commonly used in ranking and recommendation problems, where the order of the returned items is important, and it should be maximized rather than minimized. Since this is a binary classification problem, NDCG is not an appropriate objective metric.
Option D suggests setting ranges of values for all tunable hyperparameters but launching only one training job. This is not a good option because a single training job cannot explore the specified hyperparameter ranges, so there is no way to ensure that the model has been optimized.
Therefore, option B is the best option for configuring hyperparameter tuning jobs for this marketing campaign model using the XGBoost algorithm in SageMaker.