You work as a machine learning specialist for a start-up software company that builds a mobile app that subscribers can use to identify various types of birds from pictures they take with their phone camera.
You have a large set of unlabeled images of birds that you want to use as your training data for your image recognition application.
Which option is the most efficient approach to creating a labeling job to build the training dataset for your mobile app?
Correct Answer: C.
Option A is incorrect.
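Using the k-means algorithm to label your unlabeled images would be far less efficient than using SageMaker Ground Truth, because k-means only groups similar images into unnamed clusters and does not assign species labels.
Also, the SageMaker Semantic Segmentation algorithm is a poor fit for auto-annotating your images: it assigns a class to every pixel, whereas this task needs one species label per image.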
Option B is incorrect.
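Using SageMaker Data Wrangler to label your unlabeled images would be far less efficient than using SageMaker Ground Truth, because Data Wrangler is built for preparing and transforming data, not for running labeling jobs.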
Option C is correct.
SageMaker Ground Truth is the preferred method of labeling unlabeled image data.
Also, using AWS Lambda functions in your labeling job allows you to automate the annotation consolidation and pre-labeling tasks.
Finally, the SageMaker Image Classification built-in algorithm is the best choice for the auto-annotation task, because it assigns a single class label to each image.
Option D is incorrect.
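Glue ETL jobs cannot perform your annotation consolidation and pre-labeling tasks as efficiently as Lambda functions can. A Ground Truth labeling job is configured with Lambda functions for these two steps, so Glue ETL jobs would have to run outside the labeling job.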
References:
AWS SageMaker developer guide: Image Classification Algorithm (https://docs.aws.amazon.com/sagemaker/latest/dg/image-classification.html)
AWS Examples GitHub repository: From Unlabeled Data to a Deployed Machine Learning Model: A SageMaker Ground Truth Demonstration for Image Classification (https://github.com/aws/amazon-sagemaker-examples/blob/master/ground_truth_labeling_jobs/from_unlabeled_data_to_deployed_machine_learning_model_ground_truth_demo_image_classification/from_unlabeled_data_to_deployed_machine_learning_model_ground_truth_demo_image_classification.ipynb)
AWS SageMaker developer guide: Prepare ML Data with Amazon SageMaker Data Wrangler (https://docs.aws.amazon.com/sagemaker/latest/dg/data-wrangler.html)
AWS SageMaker developer guide: Object Detection Algorithm (https://docs.aws.amazon.com/sagemaker/latest/dg/object-detection.html)
The most efficient approach to creating a labeling job to build the training dataset for the mobile app is option C: Use SageMaker Ground Truth to label your unlabeled images, leveraging Lambda functions to perform annotation consolidation and pre-labeling. Leverage a SageMaker Image Classification algorithm-based model to perform auto-annotation of your images.
SageMaker Ground Truth is a fully managed data labeling service that makes it easy to build highly accurate training datasets for machine learning. It provides a number of features that make it an excellent choice for this task, such as the ability to create custom labeling workflows, automatic data labeling with machine learning models, and human review workflows to ensure high-quality annotations.
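To make these features concrete, here is a minimal sketch of how the pieces of option C might map onto a labeling job request. The field names follow the SageMaker CreateLabelingJob API as best understood here; every ARN, bucket name, and the job name is a placeholder assumption, not a real resource.

```python
# Placeholder values only: none of these ARNs or S3 paths are real.
PLACEHOLDERS = {
    "manifest": "s3://example-bird-bucket/input.manifest",
    "output": "s3://example-bird-bucket/output/",
    "categories": "s3://example-bird-bucket/species.json",
    "template": "s3://example-bird-bucket/template.liquid",
    "role": "arn:aws:iam::111122223333:role/ExampleGroundTruthRole",
    "workteam": "arn:aws:sagemaker:REGION:111122223333:workteam/private-crowd/example",
    "pre_lambda": "arn:aws:lambda:REGION:111122223333:function:example-pre-labeling",
    "consolidation_lambda": "arn:aws:lambda:REGION:111122223333:function:example-consolidation",
    "auto_label_spec": "arn:aws:sagemaker:REGION:ACCOUNT:labeling-job-algorithm-specification/image-classification",
}

def build_labeling_job_request(job_name, p=PLACEHOLDERS):
    """Assemble the request dictionary for a Ground Truth labeling job."""
    return {
        "LabelingJobName": job_name,
        "LabelAttributeName": "species",
        "InputConfig": {"DataSource": {"S3DataSource": {"ManifestS3Uri": p["manifest"]}}},
        "OutputConfig": {"S3OutputPath": p["output"]},
        "RoleArn": p["role"],
        "LabelCategoryConfigS3Uri": p["categories"],
        # Auto-annotation with an image-classification model.
        "LabelingJobAlgorithmsConfig": {
            "LabelingJobAlgorithmSpecificationArn": p["auto_label_spec"]
        },
        "HumanTaskConfig": {
            "WorkteamArn": p["workteam"],
            "UiConfig": {"UiTemplateS3Uri": p["template"]},
            # Lambda hook that runs before each task is shown to a worker.
            "PreHumanTaskLambdaArn": p["pre_lambda"],
            "TaskTitle": "Identify the bird species",
            "TaskDescription": "Pick the species shown in the photo.",
            "NumberOfHumanWorkersPerDataObject": 3,
            "TaskTimeLimitInSeconds": 300,
            # Lambda hook that merges the workers' answers into one label.
            "AnnotationConsolidationConfig": {
                "AnnotationConsolidationLambdaArn": p["consolidation_lambda"]
            },
        },
    }

request = build_labeling_job_request("example-bird-labeling-job")
```

With real values filled in, a dictionary like this would typically be passed to `boto3.client("sagemaker").create_labeling_job(**request)`; that call is left out so the sketch runs without AWS credentials.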
In this case, the images of birds are unlabeled, meaning they do not have any annotations indicating what species of bird they depict. To create a training dataset, the images must first be labeled with the appropriate annotations.
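As an illustration of what "unlabeled" means in practice, a Ground Truth job reads its images from an input manifest: a JSON Lines file with one object per image, each pointing at the image through a `source-ref` key. The sketch below builds such a manifest; the bucket and file names are made up for the example.

```python
import json

def build_input_manifest(image_uris):
    """Return JSON Lines text with one {"source-ref": uri} object per image."""
    return "\n".join(json.dumps({"source-ref": uri}) for uri in image_uris)

# Hypothetical S3 locations of unlabeled bird photos.
uris = [
    "s3://example-bird-bucket/images/0001.jpg",
    "s3://example-bird-bucket/images/0002.jpg",
    "s3://example-bird-bucket/images/0003.jpg",
]
manifest = build_input_manifest(uris)
print(manifest)
```

Each line carries only the image location. The labeling job's output is where the species annotation for each image gets added.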
Option A suggests using the SageMaker k-means built-in algorithm to label the images. However, k-means is a clustering algorithm that is not suitable for this task. It groups similar data points together and returns only anonymous cluster numbers, so someone would still have to work out which species, if any, each cluster represents.
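The difference between clustering and labeling can be seen in a toy example. The sketch below is a hand-written k-means in plain Python, not the SageMaker built-in algorithm, and the two-dimensional "features" are invented numbers standing in for image embeddings.

```python
def kmeans(points, k, iterations=10):
    """Tiny k-means: returns the cluster index assigned to each point."""
    # Deterministic start: the first k points serve as initial centroids.
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            assignments[i] = distances.index(min(distances))
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments

# Invented feature vectors for four bird photos.
features = [(0.1, 0.2), (0.9, 0.8), (0.2, 0.1), (0.8, 0.9)]
clusters = kmeans(features, k=2)
print(clusters)
```

The result is a list of cluster numbers. Nothing in it says which cluster is, say, a robin and which is a wren, which is why clustering alone does not produce a labeled training set.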
Option B suggests using SageMaker Data Wrangler to label the images. While Data Wrangler is a useful tool for cleaning and preparing data, it is not designed for labeling.
Option D suggests using SageMaker Ground Truth to label the images and leveraging Glue ETL jobs to perform annotation consolidation and pre-labeling. This is not as efficient as Option C: a Ground Truth labeling job invokes Lambda functions for these two steps, whereas Glue ETL jobs are batch data-transformation jobs that would need extra setup and would have to run outside the labeling job.
Option C is the most efficient approach because it uses SageMaker Ground Truth to label the images, leveraging Lambda functions to perform annotation consolidation and pre-labeling. Lambda functions are serverless, so there are no servers to provision or manage, and they scale with the number of labeling tasks.
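The two Lambda roles can be sketched as follows. The pre-labeling handler assumes the event carries the image location under `dataObject["source-ref"]` and returns a `taskInput` payload, which is the shape the Ground Truth documentation describes for pre-annotation Lambdas as far as is known here. The consolidation function is a simplified majority vote over worker answers; a real consolidation Lambda would also read the annotations from S3 and return them in Ground Truth's expected format, which is omitted here.

```python
from collections import Counter

def pre_annotation_handler(event, context):
    """Pre-labeling hook: turn one manifest entry into input for the worker UI."""
    image_uri = event["dataObject"]["source-ref"]
    return {
        "taskInput": {"taskObject": image_uri},
        # Ask for a human answer on this image.
        "isHumanAnnotationRequired": "true",
    }

def consolidate_by_majority(worker_labels):
    """Merge several workers' answers into one label by majority vote.

    Returns the winning label and the share of workers who chose it.
    """
    counts = Counter(worker_labels)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(worker_labels)

# Hypothetical event and worker answers, for illustration only.
event = {"dataObject": {"source-ref": "s3://example-bird-bucket/images/0001.jpg"}}
task = pre_annotation_handler(event, None)
label, agreement = consolidate_by_majority(["robin", "robin", "wren"])
```

Majority voting is only one possible consolidation rule. It is used here because it is easy to follow, not because the source prescribes it.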
Additionally, leveraging a SageMaker Image Classification algorithm-based model to perform auto-annotation of the images can significantly speed up the labeling process. Ground Truth trains the model on images that people have already labeled, lets it label the remaining images it is confident about, and sends only the uncertain ones to human workers, which reduces the amount of manual labeling required.
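The idea behind auto-annotation can be illustrated with a small routing function. Ground Truth handles this internally, so the sketch below is only a conceptual model: the prediction values and the 0.9 threshold are invented for the example.

```python
def route_by_confidence(predictions, threshold=0.9):
    """Split model predictions into auto-labeled images and images for humans.

    predictions maps an image URI to a (label, confidence) pair.
    """
    auto_labeled, needs_human = {}, []
    for image_uri, (label, confidence) in predictions.items():
        if confidence >= threshold:
            # Confident enough: accept the model's label as the annotation.
            auto_labeled[image_uri] = label
        else:
            # Uncertain: send the image to a human worker.
            needs_human.append(image_uri)
    return auto_labeled, needs_human

# Hypothetical model output for three photos.
predictions = {
    "s3://example-bird-bucket/images/0001.jpg": ("robin", 0.97),
    "s3://example-bird-bucket/images/0002.jpg": ("wren", 0.55),
    "s3://example-bird-bucket/images/0003.jpg": ("sparrow", 0.92),
}
auto_labeled, needs_human = route_by_confidence(predictions)
```

In this toy run two of the three photos are labeled by the model and one goes to a person, which is the kind of saving the auto-annotation step is meant to deliver.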
In summary, option C is the most efficient approach to creating a labeling job to build the training dataset for the mobile app, as it leverages SageMaker Ground Truth, Lambda functions, and a SageMaker Image Classification algorithm-based model to perform auto-annotation.