Clean, Organize, and Transform Ride Data for Machine Learning in AWS | Exam MLS-C01 Preparation

Simplify Data Processing with AWS Services for Ride Sharing Analytics

Question

You work as a machine learning specialist at a ride sharing software company.

You need to analyze the streaming ride data of your firm's drivers.

First, you need to clean, organize, and transform the ride data and load it into your firm's data lake so that you can then use the data in your machine learning models in SageMaker.

Which AWS services would give you the simplest solution?

Answers

A. Use Amazon Kinesis Data Streams to capture the streaming ride data. Use Amazon Kinesis Data Analytics to clean, organize, and transform the ride data and then output the data to your S3 data lake using a Lambda function.

B. Use Amazon Kinesis Data Streams to capture the streaming ride data. Have Amazon Kinesis Data Streams trigger a Lambda function to clean, organize, and transform the ride data and then output the data to your S3 data lake.

C. Use Amazon Kinesis Data Streams to capture the streaming ride data. Have Kinesis Data Streams stream the data to a set of processing workers running in ECS Fargate. The workers send the data to your S3 data lake.

D. Use Amazon Kinesis Data Firehose to stream the data directly to your S3 data lake.

Explanations

Answer: A.

Option A is correct.

Amazon Kinesis Data Analytics is a managed service that reads streams from Amazon Kinesis Data Streams and transforms them with SQL or Apache Flink, so the cleaning and transformation logic does not have to be written and hosted as custom code.

(See the Amazon Kinesis Data Analytics overview)
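
The last step of option A, a Lambda function that receives the Kinesis Data Analytics output and writes it to S3, might look roughly like the sketch below. It assumes the SQL application's Lambda output contract (base64-encoded rows under "records", and a response that reports "Ok" or "DeliveryFailed" per recordId). The ride_id field, the rides/ key layout, and the put_object callable are hypothetical; in a real deployment put_object would likely wrap an S3 client call.

```python
import base64
import json


def make_handler(put_object):
    """Build a Lambda handler around put_object(key, body).

    put_object is a hypothetical writer injected for illustration;
    it stands in for whatever call stores the object in the S3 data lake.
    """

    def handler(event, context):
        results = []
        for record in event["records"]:
            record_id = record["recordId"]
            try:
                # Each output row arrives base64-encoded.
                row = json.loads(base64.b64decode(record["data"]))
                key = f"rides/{row['ride_id']}.json"  # hypothetical key layout
                put_object(key, json.dumps(row))
                results.append({"recordId": record_id, "result": "Ok"})
            except Exception:
                # Report the failure so the record can be retried.
                results.append({"recordId": record_id, "result": "DeliveryFailed"})
        return {"records": results}

    return handler
```

Note that the function only delivers rows; the cleaning and transformation have already happened in the Kinesis Data Analytics application.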

Option B is incorrect.

Using Lambda to retrieve your ride data from your Kinesis Data Stream and process the data records would require more effort on your part as compared to using Kinesis Data Analytics to do the transformation work.
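
To illustrate the extra effort in option B, here is a minimal sketch of a Lambda function triggered by Kinesis Data Streams. It relies on the standard event shape (base64-encoded payloads under Records[].kinesis.data). The field names (ride_id, driver_id, fare) and the cleaning rules are assumptions made up for this example, and the S3 write is deliberately left out.

```python
import base64
import json


def clean_ride(raw):
    """Hypothetical cleaning rules: drop incomplete rides, normalize fields."""
    if raw.get("ride_id") is None or raw.get("fare") is None:
        return None
    return {
        "ride_id": str(raw["ride_id"]).strip(),
        "driver_id": str(raw.get("driver_id", "")).strip(),
        "fare_usd": round(float(raw["fare"]), 2),
    }


def handler(event, context):
    cleaned = []
    for record in event["Records"]:
        # Kinesis Data Streams delivers each payload base64-encoded.
        payload = base64.b64decode(record["kinesis"]["data"])
        ride = clean_ride(json.loads(payload))
        if ride is not None:
            cleaned.append(ride)
    # Writing `cleaned` to S3 (batching, key naming, retries) would also be
    # custom code here; the list is returned only to make the sketch testable.
    return cleaned
```

Every rule in clean_ride, plus the delivery logic that is omitted, would have to be written, tested, and maintained by you, which is the work that Kinesis Data Analytics takes over in option A.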

Option C is incorrect.

Using ECS Fargate as an intermediary between Amazon Kinesis Data Streams and your data lake would require you to write the transformation logic in your ECS workers.

This would not be the simplest solution of the options given.

Option D is incorrect.

This option lacks the transformation aspect of the solution.

Reference:

Please see the Amazon Kinesis Data Analytics documentation, the AWS Lambda Developer Guide tutorial Using AWS Lambda with Amazon Kinesis (https://docs.aws.amazon.com/lambda/latest/dg/with-kinesis-example.html), and the Amazon Kinesis Data Analytics for SQL Applications Developer Guide section Using a Lambda Function as Output (https://docs.aws.amazon.com/kinesisanalytics/latest/dev/how-it-works-output-lambda.html).

To clean, organize, and transform the streaming ride data of a ride-sharing software company and load it into the data lake, several AWS services can be combined. Among the options given, the simplest solution that meets every requirement is option A. Here's why:

Option A uses Amazon Kinesis Data Streams to capture the streaming ride data, Amazon Kinesis Data Analytics to process it, and a Lambda function to output the results to the S3 data lake. Although this involves three services, the transformation itself is expressed in SQL or Apache Flink inside a managed service, so the only custom code is the small function that delivers the output.

Option B is similar to option A, but a Lambda function processes the data instead of Kinesis Data Analytics. This uses fewer services, but you must write and maintain all of the cleaning and transformation logic yourself, which takes more effort than option A.

Option C uses Kinesis Data Streams to capture the streaming ride data and sends it to a set of processing workers running in ECS Fargate. These workers then send the data to the S3 data lake. While this is a scalable solution, it is more complex than necessary for the given scenario: you must write the transformation logic in the workers and manage the ECS Fargate tasks.

Option D uses Amazon Kinesis Data Firehose to stream the data directly to the S3 data lake. As described, the data is delivered without being cleaned, organized, or transformed. Firehose can invoke a Lambda function to transform records before delivery, but the option does not include that step, so it does not meet the requirement.

Therefore, for the given scenario, option A is the simplest solution that satisfies all of the requirements.
