Data Preprocessing for AWS Certified Developer - Associate Exam | YourSite

Data Preprocessing

Question

You are developing an application that will be used to receive data from multiple devices.

You need to perform some preprocessing on the data before it can be analyzed by the Analytics tool.

All the received data are compressed records that need to be decompressed to be analyzed further.

Which of the following can be used to carry out this intermediate activity?

Answers

Explanations

A. B. C. D.

Answer - B.

The AWS documentation states the following:

Many customers use Amazon Kinesis to ingest, analyze, and persist their streaming data. One of the easiest ways to gain real-time insights into your streaming data is to use Kinesis Analytics. It enables you to query the data in your stream or build entire streaming applications using SQL.

Customers use Kinesis Analytics for things like filtering, aggregation, and anomaly detection.

A data producer is compressing JSON records before sending them to a Kinesis stream or a Kinesis Firehose delivery stream.

You want to use Kinesis Analytics to analyze these compressed records. Before you can use SQL to perform the analysis, you must first decompress each input record so that it's represented as decompressed JSON. This enables it to map to the schema you've created in the Kinesis Analytics application.
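The decompression step described above could be sketched as a preprocessing Lambda function like the one below. It assumes the producer used gzip (the question does not say which compression format is used) and follows the record contract for Kinesis Analytics preprocessing functions: each input record carries a `recordId` and base64-encoded `data`, and each output record reports a `result` of `Ok`, `Dropped`, or `ProcessingFailed`. Treat it as a minimal illustration, not a production implementation.

```python
import base64
import gzip
import json
import zlib


def lambda_handler(event, context):
    """Decompress each record so Kinesis Analytics receives plain JSON."""
    output = []
    for record in event["records"]:
        try:
            # Record data arrives base64-encoded.
            compressed = base64.b64decode(record["data"])
            # Assumption: the producer compressed the JSON with gzip.
            payload = gzip.decompress(compressed)
            # Parse once to confirm the payload really is JSON.
            json.loads(payload)
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(payload).decode("utf-8"),
            })
        except (OSError, EOFError, ValueError, zlib.error):
            # Flag records that cannot be decompressed or parsed.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}
```

Returning every `recordId` with an explicit result, including the failures, lets the service account for each input record instead of silently losing the ones that could not be decompressed.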

Option A is incorrect since this service is used to coordinate different parts of a distributed application.

Option C is incorrect since this service is used to cache static and dynamic content of a website hosted in AWS Cloud.

Option D is incorrect since CloudFront cannot be used to pre-process the data.

For more information on preprocessing data in Kinesis, see the following link:

https://aws.amazon.com/blogs/big-data/preprocessing-data-in-amazon-kinesis-analytics-with-aws-lambda/

The correct answer is B. Use Kinesis with AWS Lambda functions to pre-process the data.

Explanation:

When dealing with large amounts of data coming in from multiple sources, it is often necessary to preprocess that data before it can be analyzed. In this scenario, the data is compressed and needs to be decompressed before it can be analyzed. AWS provides a variety of services that can be used for this kind of preprocessing, but the most appropriate service in this case is Kinesis with AWS Lambda functions.

Kinesis is a service provided by AWS that is designed for real-time processing of streaming data. Kinesis can be used to receive and process large volumes of data in real-time from multiple sources. AWS Lambda, on the other hand, is a compute service that allows you to run code in response to events, such as the arrival of new data in Kinesis. Lambda can be used to perform data preprocessing tasks such as decompressing data.
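As a rough illustration of Lambda reacting to new data in Kinesis, the sketch below handles the event a function receives when it is attached to a Kinesis data stream, where each entry in `Records` holds base64-encoded bytes under `kinesis.data`. The gzip format, the `decode_record` helper, and the shape of the return value are assumptions made for this example.

```python
import base64
import gzip
import json


def decode_record(record):
    """Hypothetical helper: turn one Kinesis record into a Python object."""
    raw = base64.b64decode(record["kinesis"]["data"])
    # Assumption: devices gzip-compress a JSON document before sending it.
    return json.loads(gzip.decompress(raw))


def lambda_handler(event, context):
    """Decompress every record in the batch delivered by Kinesis."""
    readings = [decode_record(record) for record in event["Records"]]
    # A real function would forward the readings to the analytics tool;
    # this sketch simply returns them so the result can be inspected.
    return {"decoded": len(readings), "readings": readings}
```

Lambda delivers records in batches, so the handler loops over `event["Records"]` rather than expecting a single record per invocation.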

Using Kinesis with Lambda functions provides several benefits for this use case:

  1. Scalability: Kinesis is designed to handle large volumes of data, and Lambda can process the records from each shard in parallel, so throughput grows as shards are added to the stream.

  2. Flexibility: Since Lambda functions can be written in many programming languages, you can choose the language that best suits your needs.

  3. Cost-effectiveness: Kinesis and Lambda use pay-as-you-go pricing, so you do not need to provision or pay for dedicated preprocessing servers.

A is incorrect because AWS Step Functions is a service that coordinates the components of distributed applications as visual workflows. It is not designed for data preprocessing tasks.

C is incorrect because Amazon CloudFront is a content delivery network that is used to deliver content to users from edge locations. It is not designed for data preprocessing tasks.

D is incorrect because Elastic Load Balancing (ELB) is a service that distributes incoming network traffic across multiple targets, such as EC2 instances. It is not designed for data preprocessing tasks.