AWS Certified Machine Learning - Specialty | Expedited Risk Data Engineering with AWS Services


Question

You are a machine learning specialist at a financial services company.

Your team has recently been assigned a project to prepare financial risk data and use it in a risk management machine learning model.

The project is on an expedited schedule, so you need to produce your engineered data as quickly as possible.

Which AWS service(s) will allow you to engineer your risk data as expeditiously as possible?

Answers

Explanations


A. SageMaker Studio
B. SageMaker Augmented AI
C. Deep Learning Containers
D. SageMaker Processing

Answer: D.

Option A is incorrect.

You could use SageMaker Studio to perform your data engineering tasks, but compared to SageMaker Processing, more of the infrastructure and coding work would fall to you and your team.

Option B is incorrect.

SageMaker Augmented AI is used to add human review of low-confidence predictions.

It wouldn't help your team expedite your data engineering work.

Option C is incorrect.

Deep Learning Containers are a set of Docker images used for training and serving models in TensorFlow, PyTorch, and Apache MXNet.

Deep Learning Containers wouldn't help your team expedite your data engineering work.

Option D is CORRECT.

SageMaker Processing is an AWS managed service that you can use to run data engineering workloads in SageMaker using simple SageMaker Processing APIs.

SageMaker Processing manages your SageMaker environment for you in a processing container.

This managed service removes much of the infrastructure and coding work needed to perform data engineering tasks.
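To make this concrete, below is a minimal sketch of the kind of preprocessing script you would hand to a Processing job. Inside the job, SageMaker mounts your S3 input data under /opt/ml/processing/input and uploads anything written to /opt/ml/processing/output back to S3; the feature logic here (standardizing a numeric risk feature) is only an illustrative example, not a prescribed workflow.

```python
# Sketch of a feature-engineering step a SageMaker Processing job might run.
# In a real job, you would read from /opt/ml/processing/input and write
# results to /opt/ml/processing/output; here we use in-memory example data.
from statistics import mean, stdev

def standardize(values):
    """Scale a numeric feature to zero mean and unit variance."""
    mu, sigma = mean(values), stdev(values)
    return [(v - mu) / sigma for v in values]

exposures = [120.0, 95.0, 143.0, 110.0]  # hypothetical risk exposure values
scaled = standardize(exposures)
print(scaled)
```

The same script runs unchanged locally or inside the managed Processing container, which is part of what makes the service quick to adopt.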

Reference:

Please see the Amazon SageMaker developer guide titled Process Data and Evaluate Models.

Please see the Amazon SageMaker developer guide titled Using Amazon Augmented AI for Human Review.

Please see the Amazon SageMaker developer guide titled Amazon SageMaker Studio.

Please see the GitHub repository titled Amazon SageMaker Processing jobs.

Please see the AWS Deep Learning Containers development guide titled What are AWS Deep Learning Containers?

The correct answer is D. SageMaker Processing.

Explanation: To engineer the risk data, the team needs to perform various data preprocessing and feature engineering tasks such as cleaning, scaling, transforming, and selecting the relevant features. Amazon SageMaker Processing is a fully managed capability that allows data scientists and machine learning engineers to perform data preprocessing and feature engineering at scale. SageMaker Processing provides a distributed, scalable infrastructure that enables you to process large datasets and produce engineered datasets quickly.

SageMaker Processing supports various data processing frameworks such as Apache Spark, TensorFlow, and Scikit-learn, which can be used to perform complex data processing tasks. You can run these frameworks on managed processing clusters with pre-configured software environments or on custom processing clusters with your own software environment.

Using SageMaker Processing, you can perform data preprocessing and feature engineering tasks in parallel and at scale, allowing you to produce engineered data quickly. SageMaker Processing integrates with other AWS services such as S3, SageMaker Studio, and SageMaker Training to provide an end-to-end machine learning workflow.
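As an illustration of how these pieces fit together, the sketch below wraps a SageMaker Python SDK job submission in a function. The role ARN and S3 URIs are caller-supplied placeholders, and actually invoking the function requires AWS credentials and the `sagemaker` package, so treat this as a shape of the API rather than a runnable pipeline.

```python
# Illustrative driver for a SageMaker Processing job using the SageMaker
# Python SDK. The role ARN, S3 URIs, and script name are placeholders;
# calling this function requires AWS credentials and the sagemaker package.
def run_processing_job(role_arn, input_s3_uri, output_s3_uri):
    # Imports kept inside the function so the sketch can be loaded and
    # read even without the SDK installed.
    from sagemaker.sklearn.processing import SKLearnProcessor
    from sagemaker.processing import ProcessingInput, ProcessingOutput

    processor = SKLearnProcessor(
        framework_version="1.2-1",   # managed scikit-learn container image
        role=role_arn,
        instance_type="ml.m5.xlarge",
        instance_count=1,
    )
    processor.run(
        code="preprocess.py",        # your feature-engineering script
        inputs=[ProcessingInput(source=input_s3_uri,
                                destination="/opt/ml/processing/input")],
        outputs=[ProcessingOutput(source="/opt/ml/processing/output",
                                  destination=output_s3_uri)],
    )
```

The processor pulls a pre-built scikit-learn container, stages the S3 input into the container, runs your script, and pushes the output directory back to S3, which is where the time savings over hand-built infrastructure come from.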

SageMaker Studio (A) is an integrated development environment for machine learning, which provides a web-based interface for building, training, and deploying machine learning models. While it provides access to SageMaker Processing and other AWS services, it is not specifically designed for data preprocessing and feature engineering.

SageMaker Augmented AI (B) is a service that allows you to build and manage human review workflows for machine learning predictions. It is not designed for data preprocessing and feature engineering.

Deep Learning Containers (C) provide pre-configured Docker images for deep learning frameworks such as TensorFlow, PyTorch, and MXNet. While they can be used for training and inference tasks, they are not specifically designed for data preprocessing and feature engineering.