You work as a machine learning specialist for a financial services firm that specializes in risk analysis for other financial services firms.
Your machine learning team has been tasked with building a model that categorizes a firm's foreign exchange risk for each of their portfolios.
You have begun building your model using SageMaker Studio, and you are at the point in your data exploration where you need to know the importance of each of the features in your training dataset.
Which option gives you the most efficient view of this feature comparison?
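A. SageMaker Data Wrangler target leakage visualization
B. SageMaker Clarify bias visualization
C. SageMaker Data Wrangler bias visualization
D. SageMaker Data Wrangler quick model visualization
Correct Answer: D.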
Option A is incorrect.
The SageMaker Data Wrangler target leakage visualization shows when there is data in a machine learning training dataset that is strongly correlated with the target label but would not be available when the model makes real-world predictions.
This visualization will not give you the importance score of each feature.
Option B is incorrect.
The SageMaker Clarify bias visualization reports pre-training bias metrics, such as class imbalance, that help you identify bias in your data during data preparation.
This visualization will not give you the importance score of each feature.
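To make the distinction concrete, here is a minimal, simplified Python sketch of two pre-training bias metrics that the SageMaker Clarify documentation defines: class imbalance (CI) and difference in proportions of labels (DPL). Each metric yields one number for a chosen facet (a group defined by a sensitive attribute), not a score per feature. This is not Clarify's implementation; the function names, the facet values `"EU"`/`"APAC"`, and the labels are hypothetical, and the sketch simplifies Clarify's facet handling by comparing one facet value against all others.

```python
def class_imbalance(facets, advantaged):
    """CI = (n_a - n_d) / (n_a + n_d): facet a is `advantaged`, facet d is everything else."""
    n_a = sum(1 for f in facets if f == advantaged)
    n_d = len(facets) - n_a
    return (n_a - n_d) / (n_a + n_d)


def difference_in_proportions_of_labels(facets, labels, advantaged, positive):
    """DPL = q_a - q_d, where q is the share of positive labels within each facet."""
    a = [y for f, y in zip(facets, labels) if f == advantaged]
    d = [y for f, y in zip(facets, labels) if f != advantaged]
    q_a = sum(1 for y in a if y == positive) / len(a)
    q_d = sum(1 for y in d if y == positive) / len(d)
    return q_a - q_d


# Hypothetical data: client region of each portfolio (the facet) and its FX risk label.
facets = ["EU", "EU", "EU", "APAC"]
labels = ["high", "low", "high", "high"]
print(class_imbalance(facets, "EU"))  # -> 0.5
print(difference_in_proportions_of_labels(facets, labels, "EU", "high"))
```

Both functions summarize the whole dataset for one facet, which is why a bias report cannot rank the features against each other.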
Option C is incorrect.
The SageMaker Data Wrangler bias visualization helps you uncover potential biases in your data.
This visualization will not give you the importance score of each feature.
Option D is correct.
The SageMaker Data Wrangler quick model visualization helps you quickly evaluate your data by training a model on it and producing an importance score for each feature in your dataset.
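As we understand the Data Wrangler documentation, the quick model trains a random forest and ranks features by Gini importance (mean decrease in impurity). The plain-Python sketch below illustrates that scoring idea on a single greedy decision tree rather than a forest. It is a simplified stand-in, not Data Wrangler's implementation; the function names, the feature names, and the toy data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature_index, threshold, impurity_decrease) for the best split, or None."""
    n = len(rows)
    parent = gini(labels)
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # a split must send samples to both sides
            child = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or parent - child > best[2]:
                best = (f, t, parent - child)
    return best


def gini_importance(rows, labels, max_depth=3):
    """Mean decrease in impurity per feature from one greedy tree, normalized to sum to 1."""
    total_n = len(rows)
    scores = [0.0] * len(rows[0])

    def grow(node_rows, node_labels, depth):
        if depth == max_depth or gini(node_labels) == 0.0:
            return  # stop at the depth limit or at a pure node
        split = best_split(node_rows, node_labels)
        if split is None or split[2] <= 0.0:
            return
        f, t, decrease = split
        # Weight the decrease by the share of all samples that reach this node.
        scores[f] += decrease * len(node_rows) / total_n
        left = [(r, y) for r, y in zip(node_rows, node_labels) if r[f] <= t]
        right = [(r, y) for r, y in zip(node_rows, node_labels) if r[f] > t]
        for part in (left, right):
            grow([r for r, _ in part], [y for _, y in part], depth + 1)

    grow(rows, labels, 0)
    total = sum(scores)
    return [s / total if total else 0.0 for s in scores]


# Hypothetical features per portfolio: [fx_exposure_ratio, num_counterparties]
rows = [[0.1, 5], [0.2, 3], [0.8, 4], [0.9, 6]]
labels = ["low", "low", "high", "high"]
print(gini_importance(rows, labels))  # -> [1.0, 0.0]
```

A random forest would repeat this over many bootstrapped trees and average the per-feature scores, which is what makes a single ranked chart of feature importance possible.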
References:
Amazon SageMaker Developer Guide, "Analyze and Visualize" (https://docs.aws.amazon.com/sagemaker/latest/dg/data-wrangler-analyses.html)
Amazon SageMaker Developer Guide, "Generate Reports for Bias in Pretraining Data in SageMaker Studio" (https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-data-bias-reports-ui.html)
To efficiently view the importance of each feature in a training dataset, you need a tool that scores and ranks every feature by how much it contributes to predicting the target. In SageMaker Data Wrangler, that tool is the quick model visualization.
Option A: The SageMaker Data Wrangler target leakage visualization identifies features that leak information about the target label, that is, features strongly correlated with the target that would not be available at prediction time. It does not provide an importance score for each feature in the dataset.
Option B: SageMaker Clarify provides both bias detection and explainability capabilities. Its feature attributions, such as SHAP values, come from the explainability analysis of a trained model, not from the bias visualization. The bias visualization reports pre-training bias metrics such as class imbalance, so it does not give you an importance score for each feature.
Option C: The SageMaker Data Wrangler bias visualization is used to detect bias in the dataset. It does not provide any information on the importance score of each feature in the dataset.
Option D: The SageMaker Data Wrangler quick model visualization trains a model on your dataset and reports a model score (for example, the F1 score for a classification problem) together with an importance score for each feature. This gives you a direct comparison of all features in one view.
Therefore, option D, the SageMaker Data Wrangler quick model visualization, is the most efficient way to view the importance of each feature in a training dataset.