You work as a machine learning specialist for an alternative transportation ride-share company.
Your company has scooters, electric longboards, and other electric personal transportation devices in several major cities across the US.
Your machine learning team has been asked to produce a machine learning model that classifies device preference by trip duration for each of the available personal transportation devices you offer in each city.
You have created a model based on the SageMaker built-in K-Means algorithm.
You are now using hyperparameter tuning to get the best-performing model for your problem.
Which evaluation metrics and corresponding optimization direction should you choose for your automatic model tuning (a.k.a. hyperparameter tuning)? (Select TWO)
A. B. C. D. E.
Correct Answers: C and E.
Option A is incorrect.
K-Means uses the msd (Mean Squared Distances) metric for model validation.
However, you will want to minimize this metric, not maximize it.
Option B is incorrect.
K-Means does not use the mse (Mean Squared Error) metric for model validation.
Option C is correct.
K-Means uses the ssd (Sum of the Squared Distances) metric for model validation, and you will want to minimize this metric.
Option D is incorrect.
K-Means does not use the f1 (harmonic mean of precision and recall) metric for model validation.
Option E is correct.
K-Means uses the msd (Mean Squared Distances) metric for model validation, and you will want to minimize this metric.
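To make the two metrics concrete, here is a minimal sketch of how ssd and msd could be computed by hand with NumPy. The function name `clustering_distances` and the sample trip durations are hypothetical and chosen only for illustration; SageMaker computes these values itself during a tuning job and reports them as `test:ssd` and `test:msd`.

```python
import numpy as np

def clustering_distances(points, centers):
    """Return (ssd, msd) using each point's nearest center."""
    # Squared Euclidean distance from every point to every center,
    # giving an array of shape (n_points, n_centers).
    diffs = points[:, None, :] - centers[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=2)
    # Each point contributes only the distance to its closest center.
    nearest = sq_dists.min(axis=1)
    # ssd is the sum of those distances, msd is their mean.
    return float(nearest.sum()), float(nearest.mean())

# Hypothetical trip durations in minutes (one feature) and two candidate centers.
durations = np.array([[5.0], [7.0], [30.0], [34.0]])
centers = np.array([[6.0], [32.0]])
ssd, msd = clustering_distances(durations, centers)
print(ssd, msd)
```

Both numbers shrink as the centers move closer to the data they represent, which is why the tuning direction for either metric is minimize.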
References:
Amazon SageMaker Developer Guide, "Define Metrics" (https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning-define-metrics.html)
Amazon SageMaker Developer Guide, "Tune a K-Means Model" (https://docs.aws.amazon.com/sagemaker/latest/dg/k-means-tuning.html)
When using hyperparameter tuning, the goal is to find the combination of hyperparameters that results in the best-performing model for a specific task. This is typically done by evaluating the model's performance on a validation dataset using a specific evaluation metric. The evaluation metric is selected based on the problem being solved, and the direction of optimization (maximize or minimize) depends on the specific metric.
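As a rough illustration of how the optimization direction decides which trial wins, the sketch below picks the best of several finished trials. The helper `best_trial` and the trial values are hypothetical and are not part of any SageMaker API.

```python
def best_trial(trials, metric, direction):
    """Pick the trial with the best metric value for the given direction."""
    if direction == "Minimize":
        pick = min
    elif direction == "Maximize":
        pick = max
    else:
        raise ValueError(f"unknown direction: {direction}")
    return pick(trials, key=lambda trial: trial[metric])

# Hypothetical results from three training jobs with different epoch counts.
trials = [
    {"epochs": 1, "msd": 3.2},
    {"epochs": 5, "msd": 2.5},
    {"epochs": 10, "msd": 2.7},
]
winner = best_trial(trials, "msd", "Minimize")
print(winner)
```

Choosing the wrong direction for a distance-based metric would select the trial with the loosest clusters instead of the tightest ones.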
In this case, the task is to classify device preference by trip duration for each personal transportation device offered in each city. The model was created using the SageMaker built-in K-Means algorithm, and now hyperparameter tuning is being used to find the best-performing model. Therefore, the evaluation metrics and corresponding optimization directions that should be used are:
The first metric to consider is the Sum of the Squared Distances (ssd), which sums the squared distance from each data point to its nearest cluster centroid. Minimizing the ssd is appropriate because a smaller total distance means tighter, more compact clusters, which is what grouping device preference by trip duration requires.
The second metric to consider is the Mean Squared Distances (msd), which averages the squared distance between each data point and its nearest cluster centroid. K-Means is an unsupervised clustering algorithm, not a regression model, so the Mean Squared Error (mse) between predicted and actual values does not apply and is not a metric the algorithm reports. Minimizing the msd is appropriate for the same reason as the ssd: smaller distances indicate tighter clusters.
Therefore, the correct answers to this question are C. ssd, minimize and E. msd, minimize.
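For reference, a tuning objective for the built-in K-Means algorithm might be expressed as the configuration fragment below. This is a sketch under stated assumptions: the field names follow the request shape of the SageMaker `CreateHyperParameterTuningJob` API as I understand it, the metric names `test:msd` and `test:ssd` are the ones K-Means reports for tuning, and the job limits and parameter ranges are illustrative values, not recommendations. Check the developer guide linked in the references before relying on any of them.

```python
# The two objective metrics K-Means exposes for tuning; both are minimized.
KMEANS_OBJECTIVES = {"test:msd": "Minimize", "test:ssd": "Minimize"}

metric_name = "test:msd"

tuning_job_config = {
    "Strategy": "Bayesian",
    "HyperParameterTuningJobObjective": {
        "MetricName": metric_name,
        "Type": KMEANS_OBJECTIVES[metric_name],
    },
    # Illustrative limits, chosen arbitrarily for this sketch.
    "ResourceLimits": {
        "MaxNumberOfTrainingJobs": 20,
        "MaxParallelTrainingJobs": 2,
    },
    # Illustrative ranges; the API expects numeric bounds as strings.
    "ParameterRanges": {
        "IntegerParameterRanges": [
            {"Name": "mini_batch_size", "MinValue": "3000", "MaxValue": "15000"},
            {"Name": "extra_center_factor", "MinValue": "4", "MaxValue": "10"},
        ],
        "CategoricalParameterRanges": [
            {"Name": "init_method", "Values": ["kmeans++", "random"]},
        ],
    },
}
```

Swapping `metric_name` to `test:ssd` leaves the direction unchanged, which mirrors the answer key: either distance metric is valid, and both are minimized.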