You work for a government census bureau in their machine learning group.
Your team is building a model that will predict population movement based on many attributes of the population and of the geographic regions they move to and from.
Some of the dataset features are id, age, height, weight, family size, country of origin, etc.
You have built your model using the SageMaker built-in linear learner algorithm.
You have trained your model and deployed it using SageMaker Hosting Services.
You are now ready to send inference requests to your inference endpoint.
You have chosen to use CSV file data stored on one of your S3 buckets as your inference request data.
Since you are processing large census data files, you don't need sub-second latency. Here is an example of the CSV file data:

| id | age | height (in.) | weight (lb) | family size | country of origin | ... |
|------|-----|--------------|-------------|-------------|-------------------|-----|
| 6185 | 23 | 75 | 145 | 3 | USA | ... |
| 5437 | 54 | 80 | 187 | 7 | Canada | ... |

You know that the id attribute in your dataset is not relevant to your model's prediction results, and you didn't use it when training your model.
What is the simplest way to exclude this attribute when you send prediction requests to your inference endpoint, while still keeping the id attribute associated with the prediction results your model outputs so you can easily analyze them?
A. Use SageMaker Batch Transform to run the predictions from your CSV file on your S3 bucket, have it exclude the id attribute from the prediction requests, and have it join the id attribute to the prediction results.
B. Use Kinesis Data Analytics to exclude the id attribute from your prediction requests and to join the attribute to the prediction results.
C. Use Kinesis Data Analytics to exclude the id attribute from your prediction requests, then use Kinesis Data Streams to join the id attribute to the prediction results.
D. Use Kinesis Data Firehose Data Transformation to exclude the id attribute from your prediction requests, then use Kinesis Data Streams to join the id attribute to the prediction results.

Answer: A.
Option A is correct.
The simplest way to first exclude the id attribute from the inference prediction requests and then join the id attribute to the prediction results is to use Amazon SageMaker Batch Transform.
Option B is incorrect.
While you could use Kinesis Data Analytics to exclude the id attribute from your prediction requests and then join the attribute to the prediction results, this would not be as simple a solution as just using Batch Transform pre- and post-processing.
Option C is incorrect.
While you could use Kinesis Data Analytics to exclude the id attribute from your prediction requests and then use Kinesis Data Streams to join the id attribute to the prediction results (possibly using a Lambda function you would have to write), this approach would not be as simple as just using Batch Transform pre- and post-processing.
Option D is incorrect.
While you could use Kinesis Data Firehose Data Transformation to exclude the id attribute from your prediction requests and then use Kinesis Data Streams to join the id attribute to the prediction results (possibly using a Lambda function you would have to write), this approach would not be as simple as just using Batch Transform pre- and post-processing.
Reference:
Please see the AWS announcement titled SageMaker Batch Transform now enables associating prediction results with input attributes, the Amazon SageMaker developer guide titled Associate Prediction Results with Input Records, the Amazon SageMaker developer guide titled Deploy a Model on Amazon SageMaker Hosting Services, the AWS Lambda developer guide titled Using AWS Lambda with Amazon Kinesis, and the Amazon Kinesis Data Firehose developer guide titled Amazon Kinesis Data Firehose Data Transformation.
The simplest way to exclude the "id" attribute when sending prediction requests to an Amazon SageMaker inference endpoint is to use SageMaker Batch Transform. Batch Transform is a high-performance, scalable service that enables you to process large datasets stored in Amazon S3 and generate inferences without requiring real-time or sub-second latency.
To use SageMaker Batch Transform to exclude the "id" attribute from the prediction requests, you will need to do the following:
Create a trained model in SageMaker using the built-in linear learner algorithm. (Batch Transform runs inference against the model directly, so the SageMaker Hosting Services endpoint is not required for the transform job itself.)
Upload your input CSV file to an S3 bucket.
Create a Transform Job using SageMaker Batch Transform, specifying the input data source (the S3 bucket containing the CSV file), the output data location (another S3 bucket), and the location of the trained model.
In the Transform Job, specify the content type of the input data, which in this case is text/csv, and set the SplitType to Line so that each CSV row is treated as a separate record.
In the Transform Job's DataProcessing configuration, set the InputFilter parameter (a JSONPath expression such as $[1:]) to drop the first column, the "id" attribute, from the data sent to the model, and set JoinSource to Input so each original input record, including the id, is joined with its prediction output.
After the Transform Job is complete, you can download the output CSV file from the output S3 bucket; each prediction result will have the "id" attribute joined back to it.
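The steps above can be sketched with the `create_transform_job` API, where the DataProcessing block does the filtering and joining. The bucket, job, model, and instance names below are illustrative assumptions, and the final boto3 call is left commented out since it requires live AWS credentials:

```python
def build_transform_job_params(job_name, model_name, input_s3, output_s3):
    """Build the create_transform_job request for the id-filtering workflow.

    Sketch only: names and instance type are assumptions, not values
    from the scenario.
    """
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,
        "TransformInput": {
            "DataSource": {
                "S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": input_s3}
            },
            "ContentType": "text/csv",
            "SplitType": "Line",       # treat each CSV row as one record
        },
        "TransformOutput": {
            "S3OutputPath": output_s3,
            "AssembleWith": "Line",
            "Accept": "text/csv",
        },
        "TransformResources": {
            "InstanceType": "ml.m5.large",
            "InstanceCount": 1,
        },
        "DataProcessing": {
            "InputFilter": "$[1:]",    # drop column 0 (id) before inference
            "JoinSource": "Input",     # join each input record to its prediction
            "OutputFilter": "$[0,-1]", # keep only the id and the prediction
        },
    }

params = build_transform_job_params(
    job_name="census-movement-batch",
    model_name="census-linear-learner",
    input_s3="s3://my-census-bucket/input/census.csv",
    output_s3="s3://my-census-bucket/output/",
)
# import boto3
# boto3.client("sagemaker").create_transform_job(**params)
```

With JoinSource set to Input, Batch Transform appends the model's prediction to each original input record, and the OutputFilter then selects just the first field (id) and the last field (the prediction) for the output file.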
Therefore, the correct answer is A. Use SageMaker Batch Transform to run the predictions from your CSV file on your S3 bucket and have it exclude the id from the prediction request. Also, have Batch Transform join the id attribute to the prediction results.