AWS Certified Machine Learning - Specialty | Best Data Preparation and Implementation for ML Inference

Best Practices for Building and Deploying a Machine Learning Model to Detect Adult Content in Images

Question

You work as a machine learning specialist for an online news organization.

Your company is implementing a crowd-sourced news feature that will allow subscribers to upload their own news stories complete with images.

You have been tasked with building and deploying a machine learning model that detects adult content in an image uploaded by a subscriber and raises an alert.

Which option describes the best data preparation and implementation of your machine learning hosted inference?

Answers

Explanations

A. B. C. D.

Correct Answer: D.

Option A is incorrect.

The Semantic Segmentation SageMaker built-in algorithm is used to provide a fine-grained, pixel-level approach to developing computer vision applications.

Because it labels every pixel rather than assigning one label to the whole image, it would function poorly as a model to detect adult content in an image.

Option B is incorrect.

The Object2Vec SageMaker built-in algorithm is used to compute the nearest neighbors of objects and visualize natural clusters of related objects in low-dimensional space.

For example, common use cases include identifying duplicate support tickets or finding the correct routing based on the similarity of text in the tickets.

It learns embeddings rather than image labels, so it would function poorly as a model to detect adult content in an image.

Option C is incorrect.

The Image Classification SageMaker built-in algorithm is the right algorithm for detecting adult content in an uploaded image.

However, an endpoint deployed from an Image Classification model doesn't accept inference requests in the application/x-recordio content type. RecordIO is a format for training data, not for inference requests.

Option D is correct.

The Image Classification SageMaker built-in algorithm is the right algorithm for detecting adult content in an uploaded image.

Also, an Image Classification deployed endpoint takes inference requests in the application/x-image content type.
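A request of this kind can be sketched in Python as follows. This is a minimal sketch, assuming the boto3 SageMaker runtime client; the endpoint name `adult-content-classifier` and the helper names `build_invoke_args` and `classify_image` are hypothetical and chosen only for illustration.

```python
import json

# Hypothetical endpoint name, used only for illustration.
ENDPOINT_NAME = "adult-content-classifier"


def build_invoke_args(image_bytes, endpoint_name=ENDPOINT_NAME):
    """Build the keyword arguments for the runtime client's invoke_endpoint call."""
    return {
        "EndpointName": endpoint_name,
        "ContentType": "application/x-image",  # raw JPEG or PNG bytes
        "Body": image_bytes,
    }


def classify_image(runtime_client, image_bytes, endpoint_name=ENDPOINT_NAME):
    """Send one image to the endpoint and return the decoded JSON response."""
    response = runtime_client.invoke_endpoint(
        **build_invoke_args(image_bytes, endpoint_name)
    )
    # The response body is a stream; read it fully before decoding.
    return json.loads(response["Body"].read())
```

In an application, `runtime_client` would typically be created with `boto3.client("sagemaker-runtime")` and `image_bytes` read from the uploaded file in binary mode.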

References:

Amazon SageMaker Developer Guide, Use Amazon SageMaker Built-in Algorithms (https://docs.aws.amazon.com/sagemaker/latest/dg/algos.html)

Amazon SageMaker Developer Guide, Semantic Segmentation Algorithm (https://docs.aws.amazon.com/sagemaker/latest/dg/semantic-segmentation.html)

Amazon SageMaker Developer Guide, Object2Vec Algorithm (https://docs.aws.amazon.com/sagemaker/latest/dg/object2vec.html)

Amazon SageMaker Developer Guide, Image Classification Algorithm (https://docs.aws.amazon.com/sagemaker/latest/dg/image-classification.html)

Amazon SageMaker Developer Guide, Common Data Formats for Inference (https://docs.aws.amazon.com/sagemaker/latest/dg/cdf-inference.html)

The task at hand is to build and deploy a machine learning model that detects adult content in images uploaded by subscribers. The best option for data preparation and implementation of the hosted inference is option D, which is to use the Image Classification SageMaker built-in algorithm and code the application to query the deployed endpoint, providing the image in the application/x-image content type.

Image classification is the task of assigning a label to an image, based on its content. In this case, the labels would be "adult content" or "not adult content". The Image Classification SageMaker built-in algorithm is suitable for this task, as it allows for training a model to classify images into different categories.

The data preparation for this model involves obtaining a labeled dataset of images, where each image is labeled as "adult content" or "not adult content". The dataset can then be used to train the model using the Image Classification SageMaker built-in algorithm.
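One way to sketch this preparation step: when the built-in algorithm is trained on image files, the labels can be supplied in a `.lst` file with tab-separated rows of image index, class label index, and relative image path. The label names, the 0/1 index mapping, and the helper names below are hypothetical and serve only as an illustration.

```python
# Hypothetical label-to-index mapping; the names are illustrative only.
LABELS = {"not_adult": 0, "adult": 1}


def build_lst_lines(labeled_paths):
    """Turn (relative_path, label_name) pairs into .lst rows.

    Each row is: image index <TAB> class label index <TAB> relative path.
    """
    lines = []
    for index, (path, label) in enumerate(labeled_paths):
        lines.append(f"{index}\t{LABELS[label]}\t{path}")
    return lines


def write_lst(labeled_paths, lst_path):
    """Write the rows to a .lst file, one image per line."""
    with open(lst_path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(build_lst_lines(labeled_paths)) + "\n")
```

Keeping the row construction separate from the file write makes the labeling logic easy to check before any data is uploaded for training.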

After training the model, it can be deployed to an endpoint on AWS SageMaker, which allows for hosted inference. Hosted inference involves sending requests to the deployed endpoint with an image as input, and receiving a response that indicates whether the image contains adult content or not.
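The response from an Image Classification endpoint is a JSON array of per-class probabilities. Turning that into an alert decision might look like the sketch below, which assumes class index 1 was used for "adult content" during training and uses a hypothetical threshold of 0.5; both values are assumptions, not part of the question.

```python
import json

ADULT_CLASS_INDEX = 1   # assumption: index 1 was the "adult content" label in training
ALERT_THRESHOLD = 0.5   # hypothetical cutoff; tune it on validation data


def is_adult_content(response_body, threshold=ALERT_THRESHOLD):
    """Return True when the adult-content probability meets the threshold.

    response_body is the raw JSON payload returned by the endpoint,
    for example b"[0.03, 0.97]".
    """
    probabilities = json.loads(response_body)
    return probabilities[ADULT_CLASS_INDEX] >= threshold
```

A lower threshold flags more images for review at the cost of more false alerts, so the cutoff is a policy choice rather than a property of the model.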

Option C specifies using the application/x-recordio content type for providing the image as input to the deployed endpoint. RecordIO is one of the formats the Image Classification SageMaker built-in algorithm accepts for training data, but the hosted endpoint does not accept it for inference requests. Option C is therefore incorrect even though it names the right algorithm.

Options A and B are not suitable for this task because they use algorithms that are not appropriate for image classification. Option A specifies using the Semantic Segmentation SageMaker built-in algorithm, which is used for segmenting an image into different regions based on their semantic meaning, and is not suitable for image classification. Option B specifies using the Object2Vec SageMaker built-in algorithm, which is used for embedding objects into a vector space, and is also not suitable for image classification.

Option D specifies using the application/x-image content type for providing the image as input to the deployed endpoint. This is a content type that an Image Classification endpoint accepts for inference requests, so the application can send the uploaded image to the endpoint directly. Combined with the correct choice of algorithm, this makes option D the correct answer.

In summary, the best option for data preparation and implementation of the machine learning model for detecting adult content in images uploaded by subscribers is to use the Image Classification SageMaker built-in algorithm and code the application to query the deployed endpoint, providing the image in the application/x-image content type.