You work for a government census bureau in their machine learning group.
Your team is building a model that will predict population movement based on many attributes of the population and of the geographic regions they move to and from.
Some of the dataset features are id, age, height, weight, family size, country of origin, etc.
You have built your model using the SageMaker built-in linear learner algorithm.
You have trained your model and deployed it using SageMaker Hosting Services.
You are now ready to send inference requests to your inference endpoint.
You have chosen to use CSV file data stored on one of your S3 buckets as your inference request data.
Since you are processing large census data files, you don't need sub-second latency. Here is an example of the CSV file data:

| id | age | height (in.) | weight (lb) | family size | country of origin | ... |
|------|-----|--------------|-------------|-------------|-------------------|-----|
| 6185 | 23 | 75 | 145 | 3 | USA | ... |
| 5437 | 54 | 80 | 187 | 7 | Canada | ... |

You know that the id attribute in your dataset is not relevant to your model's prediction results, and you didn't use it when training your model.
What is the simplest way to exclude this attribute when you send prediction requests to your inference endpoint, while still keeping the id attribute associated with the prediction results your model outputs so you can easily analyze them?
A. Use SageMaker Batch Transform to run the predictions from your CSV file on your S3 bucket, have it exclude the id attribute from the prediction requests, and have it join the id attribute to the prediction results.
B. Use Kinesis Data Analytics to exclude the id attribute from your prediction requests and to join the attribute to the prediction results.
C. Use Kinesis Data Analytics to exclude the id attribute from your prediction requests, then use Kinesis Data Streams to join the id attribute to the prediction results.
D. Use Kinesis Data Firehose Data Transformation to exclude the id attribute from your prediction requests, then use Kinesis Data Streams to join the id attribute to the prediction results.

Answer: A.
Option A is correct.
The simplest way to first exclude the id attribute from the inference prediction requests and then join the id attribute to the prediction results is to use Amazon SageMaker Batch Transform.
Option B is incorrect.
While you could use Kinesis Data Analytics to exclude the id attribute from your prediction requests and then join the attribute to the prediction results, this would not be as simple a solution as just using Batch Transform pre- and post-processing.
Option C is incorrect.
While you could use Kinesis Data Analytics to exclude the id attribute from your prediction requests and then use Kinesis Data Streams to join the id attribute to the prediction results (possibly using a Lambda function you would have to write), this approach would not be as simple as just using Batch Transform pre- and post-processing.
Option D is incorrect.
While you could use Kinesis Data Firehose Data Transformation to exclude the id attribute from your prediction requests and then use Kinesis Data Streams to join the id attribute to the prediction results (possibly using a Lambda function you would have to write), this approach would not be as simple as just using Batch Transform pre- and post-processing.
Reference:
Please see the AWS announcement titled SageMaker Batch Transform now enables associating prediction results with input attributes, the Amazon SageMaker developer guide titled Associate Prediction Results with Input Records, the Amazon SageMaker developer guide titled Deploy a Model on Amazon SageMaker Hosting Services, the AWS Lambda developer guide titled Using AWS Lambda with Amazon Kinesis, and the Amazon Kinesis Data Firehose developer guide titled Amazon Kinesis Data Firehose Data Transformation.
The simplest way to exclude the "id" attribute when sending prediction requests to an Amazon SageMaker inference endpoint is to use SageMaker Batch Transform. Batch Transform is a high-performance, scalable service that enables you to process large datasets stored in Amazon S3 and generate inferences without requiring real-time or sub-second latency.
To use SageMaker Batch Transform to exclude the "id" attribute from the prediction requests, you will need to do the following:
Create a trained model in SageMaker using the built-in linear learner algorithm. (Batch Transform runs inference against the model directly, so the SageMaker Hosting Services endpoint is not required for the transform job itself.)
Upload your input CSV file to an S3 bucket.
Create a Transform Job using SageMaker Batch Transform, specifying the input data source (the S3 bucket containing the CSV file), the output data location (another S3 bucket), and the location of the trained model.
In the Transform Job, specify the content type of the input data, which in this case is text/csv, and set the SplitType to Line so that each CSV row is treated as a separate record.
In the Transform Job's DataProcessing configuration, set the InputFilter parameter (a JSONPath expression such as $[1:]) to drop the first column, the "id" attribute, from the data sent to the model, and set JoinSource to Input so each original input record, including the id, is joined with its prediction output.
After the Transform Job is complete, you can download the output CSV file from the output S3 bucket; each prediction result will have the "id" attribute joined back to it.
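The steps above can be sketched with the `create_transform_job` API, where the DataProcessing block does the filtering and joining. The bucket, job, model, and instance names below are illustrative assumptions, and the final boto3 call is left commented out since it requires live AWS credentials:

```python
def build_transform_job_params(job_name, model_name, input_s3, output_s3):
    """Build the create_transform_job request for the id-filtering workflow.

    Sketch only: names and instance type are assumptions, not values
    from the scenario.
    """
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,
        "TransformInput": {
            "DataSource": {
                "S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": input_s3}
            },
            "ContentType": "text/csv",
            "SplitType": "Line",       # treat each CSV row as one record
        },
        "TransformOutput": {
            "S3OutputPath": output_s3,
            "AssembleWith": "Line",
            "Accept": "text/csv",
        },
        "TransformResources": {
            "InstanceType": "ml.m5.large",
            "InstanceCount": 1,
        },
        "DataProcessing": {
            "InputFilter": "$[1:]",    # drop column 0 (id) before inference
            "JoinSource": "Input",     # join each input record to its prediction
            "OutputFilter": "$[0,-1]", # keep only the id and the prediction
        },
    }

params = build_transform_job_params(
    job_name="census-movement-batch",
    model_name="census-linear-learner",
    input_s3="s3://my-census-bucket/input/census.csv",
    output_s3="s3://my-census-bucket/output/",
)
# import boto3
# boto3.client("sagemaker").create_transform_job(**params)
```

With JoinSource set to Input, Batch Transform appends the model's prediction to each original input record, and the OutputFilter then selects just the first field (id) and the last field (the prediction) for the output file.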
Therefore, the correct answer is A. Use SageMaker Batch Transform to run the predictions from your CSV file on your S3 bucket and have it exclude the id from the prediction request. Also, have Batch Transform join the id attribute to the prediction results.