Visualizing Training Metrics for K-Means Model | AWS ML Specialty Exam

Best Practices for Training Metric Visualization

Question

You work for an oil refinery company where you are on one of their machine learning teams.

Your team is responsible for building models that help the company decide where to place their exploratory drilling teams worldwide.

Your team lead has decided to build your model based on the K-Means built-in SageMaker algorithm.

The team lead has tasked you with providing metric visualization charts for the training runs of your team's model. How would you go about visualizing the training metrics? (Select TWO)

Answers

Explanations

A. B. C. D. E. F.

Answers: C, F.

Option A is incorrect.

You use the SageMaker Python SDK module called sagemaker.analytics (not pandas.analytics), from which you import TrainingJobAnalytics (not TrainingAnalytics), to retrieve the training metrics that you then visualize in charts.

Option B is incorrect.

You use the SageMaker Python SDK module called sagemaker.analytics, from which you import TrainingJobAnalytics (not TrainingAnalytics), to retrieve the training metrics that you then visualize in charts.

Option C is correct.

You use the SageMaker Python SDK module called sagemaker.analytics, from which you import TrainingJobAnalytics, to retrieve the training metrics that you then visualize in charts.

Option D is incorrect.

You use the SageMaker Python SDK module called sagemaker.analytics (not pandas.analytics), from which you import TrainingJobAnalytics, to retrieve the training metrics that you then visualize in charts.

Option E is incorrect.

To set the metric name that you wish to visualize, you need to give a valid metric for the algorithm you are training.

The test:cross_entropy metric is not valid for a K-Means training run.

Option F is correct.

To set the metric name that you wish to visualize, you need to give a valid metric for the algorithm you are training.

The test:msd metric is one of the two valid metrics for a K-Means training run.

The other valid metric for K-Means is test:ssd.
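The metric-name rule above can be sketched as a small guard that runs before any metrics are requested. This is a minimal illustration, not part of the SageMaker SDK: the names KMEANS_METRICS, check_kmeans_metric, and fetch_kmeans_metric are hypothetical helpers, and the set of valid names is taken from the explanation above. Only TrainingJobAnalytics and its training_job_name and metric_names arguments come from the SDK.

```python
# Metric names the explanation above lists as valid for built-in K-Means.
KMEANS_METRICS = frozenset({"test:msd", "test:ssd"})


def check_kmeans_metric(metric_name):
    """Return metric_name unchanged, or raise ValueError if it is not a K-Means metric."""
    if metric_name not in KMEANS_METRICS:
        valid = ", ".join(sorted(KMEANS_METRICS))
        raise ValueError(
            f"{metric_name!r} is not a K-Means metric; expected one of: {valid}"
        )
    return metric_name


def fetch_kmeans_metric(training_job_name, metric_name="test:msd"):
    """Fetch one validated metric for a training job as a pandas DataFrame."""
    check_kmeans_metric(metric_name)
    # Imported here so the validation helper works without the SageMaker SDK installed.
    from sagemaker.analytics import TrainingJobAnalytics

    return TrainingJobAnalytics(
        training_job_name=training_job_name,
        metric_names=[metric_name],
    ).dataframe()
```

Calling check_kmeans_metric('test:cross_entropy') raises a ValueError, which mirrors why option E is rejected, while 'test:msd' passes through unchanged.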

Reference:

Please see the AWS Machine Learning Blog post titled "Easily monitor and visualize metrics while training models on Amazon SageMaker" and the Amazon SageMaker developer guide page titled "Tune a K-Means model".

The answer options describe two ways of loading an analytics class to visualize the training metrics for the K-Means built-in SageMaker algorithm:

Option 1: Using sagemaker.analytics and TrainingJobAnalytics

  • In your SageMaker Jupyter notebook, import the TrainingJobAnalytics class from the sagemaker.analytics module.
  • Use the TrainingJobAnalytics class to retrieve the training metrics from the training job.
  • You can then visualize the metrics using a library like Matplotlib or Seaborn.

Code example:

```python
from sagemaker.analytics import TrainingJobAnalytics
import matplotlib.pyplot as plt

# Set the training job name and a metric that K-Means actually emits
training_job_name = '<training_job_name>'
metric_name = 'test:msd'

# Create a TrainingJobAnalytics object for the given training job.
# The constructor takes no region argument; the region comes from the
# default SageMaker session (pass sagemaker_session to override it).
tja = TrainingJobAnalytics(training_job_name=training_job_name, metric_names=[metric_name])

# Get the metrics as a pandas DataFrame with columns: timestamp, metric_name, value
metrics_df = tja.dataframe()

# Visualize the metric over time
plt.plot(metrics_df['timestamp'], metrics_df['value'], label=metric_name)
plt.xlabel('timestamp')
plt.ylabel(metric_name)
plt.legend()
plt.show()
```

Option 2: Using pandas.analytics and TrainingAnalytics (does not work)

  • pandas has no analytics package and no TrainingAnalytics class, so this import fails in a SageMaker Jupyter notebook.
  • The class that retrieves training metrics is TrainingJobAnalytics, and it lives in the SageMaker Python SDK module sagemaker.analytics.
  • This is why the answer options that name pandas.analytics or TrainingAnalytics are incorrect.

Code example:

```python
# pandas ships no `analytics` package, so this import raises ModuleNotFoundError
try:
    from pandas.analytics.training import TrainingAnalytics
except ImportError as err:  # ModuleNotFoundError is a subclass of ImportError
    print(f'Import failed: {err}')
```

Note that the metric name you request must be one that the algorithm actually emits. K-Means does not report accuracy metrics such as 'train:accuracy' or 'test:accuracy'; its valid metrics are test:msd and test:ssd. That is why answer option F (test:msd) is correct and answer option E (test:cross_entropy) is incorrect.
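When more than one metric is requested, TrainingJobAnalytics returns a long-format DataFrame with one row per data point and the columns timestamp, metric_name, and value, so the rows have to be grouped by metric before plotting. The sketch below shows that grouping on plain dictionaries. The helper names split_by_metric and plot_metrics are hypothetical, and the sample rows are made-up values for illustration, not real training output.

```python
from collections import defaultdict


def split_by_metric(records):
    """Group long-format rows into {metric_name: ([timestamps], [values])}."""
    series = defaultdict(lambda: ([], []))
    # Sort by timestamp so each line is drawn left to right.
    for row in sorted(records, key=lambda r: r["timestamp"]):
        timestamps, values = series[row["metric_name"]]
        timestamps.append(row["timestamp"])
        values.append(row["value"])
    return dict(series)


def plot_metrics(records):
    """Draw one line per metric; requires matplotlib to be installed."""
    import matplotlib.pyplot as plt

    for name, (timestamps, values) in split_by_metric(records).items():
        plt.plot(timestamps, values, marker="o", label=name)
    plt.xlabel("timestamp")
    plt.ylabel("value")
    plt.legend()
    plt.show()


# Made-up sample rows for illustration only, not real training output.
sample = [
    {"timestamp": 60.0, "metric_name": "test:msd", "value": 1.8},
    {"timestamp": 0.0, "metric_name": "test:msd", "value": 2.5},
    {"timestamp": 0.0, "metric_name": "test:ssd", "value": 250.0},
]
grouped = split_by_metric(sample)
```

With a real job, the records could come from calling to_dict('records') on the DataFrame returned by TrainingJobAnalytics(training_job_name, metric_names=['test:msd', 'test:ssd']).dataframe(), and plot_metrics(records) would then draw both curves on one chart.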