AWS Certified Big Data - Specialty Exam: Relevant Artifacts for Real-time Order Processing with Streaming Capability

Question

DailyMerc, an online retail company, uses streaming capability to process orders in real time without any buffering or processing delays.

This information is captured, relevant discounts are applied for specific products using a Kinesis Analytics SQL application, the same stream is updated, and the data is loaded into DynamoDB. This information is further used to generate invoices, notifications, etc.

The discounts file is stored in an S3 bucket and is updated every 15 minutes.

Please identify the relevant artifacts to achieve the above requirements.

Select 3 options.

Answers

A. Kinesis Data Streams provides the Streaming platform.

B. Kinesis Data Firehose provides the Streaming platform.

C. KPL to provide the data ingestion mechanism.

D. Streams API to provide the data ingestion mechanism.

E. Discounts file is a Streaming source, while Order data is a Reference source in the Kinesis Analytics application.

F. Discounts file is a Reference source, while Order data is a Streaming source in the Kinesis Analytics application.

G. Both the Discounts file and the Order data are Streaming sources.

H. Both the Discounts file and the Order data are Reference sources.

Answer: A, D, and F.

Explanations

Option A is correct - Kinesis Data Streams is the right platform to fulfill the requirements, since it provides real-time data ingestion when records are written through the Streams API.

There is no processing delay or buffering.

You can use Kinesis Data Streams to collect and process large streams of data records in real time.

You can create data-processing applications, known as Kinesis Data Streams applications.

A typical Kinesis Data Streams application reads data from a data stream as data records.

These applications can use the Kinesis Client Library, and they can run on Amazon EC2 instances.
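
For illustration only (not part of the exam material): a bare-bones consumer sketch using boto3, assuming a placeholder stream named orders-stream. A production application would use the Kinesis Client Library for shard leasing and checkpointing.

    import time
    import boto3

    kinesis = boto3.client("kinesis")

    STREAM_NAME = "orders-stream"  # placeholder name for illustration

    # Read from the first shard only; the Kinesis Client Library would
    # handle multiple shards, leases, and checkpoints in a real application.
    shards = kinesis.describe_stream(StreamName=STREAM_NAME)
    shard_id = shards["StreamDescription"]["Shards"][0]["ShardId"]

    iterator = kinesis.get_shard_iterator(
        StreamName=STREAM_NAME,
        ShardId=shard_id,
        ShardIteratorType="LATEST",  # process records arriving from now on
    )["ShardIterator"]

    while True:
        resp = kinesis.get_records(ShardIterator=iterator, Limit=100)
        for record in resp["Records"]:
            print(record["PartitionKey"], record["Data"])  # Data is raw bytes
        iterator = resp["NextShardIterator"]
        time.sleep(1)  # stay within per-shard GetRecords limits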

Use Kinesis Data Streams for rapid and continuous data intake and aggregation.

The type of data used can include IT infrastructure log data, application logs, social media, market data feeds, and web clickstream data.

Because the response time for the data intake and processing is in real time, the processing is typically lightweight.

The following are typical scenarios for using Kinesis Data Streams:

Accelerated log and data feed intake and processing.

Real-time metrics and reporting.

Real-time data analytics.

Complex stream processing.

https://docs.aws.amazon.com/streams/latest/dev/introduction.html

Option B is incorrect - Amazon Kinesis Data Firehose is a fully managed service for delivering real-time streaming data to destinations such as Amazon Simple Storage Service (Amazon S3), Amazon Redshift, Amazon Elasticsearch Service (Amazon ES), and Splunk.

Kinesis Data Firehose can invoke your Lambda function to transform incoming source data and deliver the transformed data to destinations.

You can enable Kinesis Data Firehose data transformation when you create your delivery stream.
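
As a sketch of that hook: a Firehose transformation Lambda receives a batch of records and must return each recordId with a result status and re-encoded data. The transformation itself below is only a placeholder.

    import base64
    import json

    def lambda_handler(event, context):
        # Firehose invokes the function with a batch of records; each
        # returned record must echo its recordId and re-encode the data.
        output = []
        for record in event["records"]:
            payload = json.loads(base64.b64decode(record["data"]))
            payload["processed"] = True  # illustrative transformation only
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",  # or "Dropped" / "ProcessingFailed"
                "data": base64.b64encode(json.dumps(payload).encode()).decode(),
            })
        return {"records": output}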

Amazon Kinesis Data Firehose can convert the format of your input data from JSON to Apache Parquet or Apache ORC before storing the data in Amazon S3.

Parquet and ORC are columnar data formats that save space and enable faster queries compared to row-oriented formats like JSON.

If you want to convert an input format other than JSON, such as comma-separated values (CSV) or structured text, you can use AWS Lambda to transform it to JSON first.

https://docs.aws.amazon.com/firehose/latest/dev/record-format-conversion.html

Option C is incorrect - The KPL can incur an additional processing delay of up to RecordMaxBufferedTime within the library (user-configurable).

Larger values of RecordMaxBufferedTime result in higher packing efficiencies and better performance.

Applications that cannot tolerate this additional delay may need to use the AWS SDK directly.

https://docs.aws.amazon.com/streams/latest/dev/developing-producers-with-kpl.html#developing-producers-with-kpl-when

Option D is correct - The Streams API is the right mechanism to ingest data into the stream.

Once a stream is created, you can add data to it in the form of records.

A record is a data structure that contains the data to be processed in the form of a data blob.

After you store the data in the record, Kinesis Data Streams does not inspect, interpret, or change the data in any way.

Each record also has an associated sequence number and partition key.

https://docs.aws.amazon.com/streams/latest/dev/developing-producers-with-sdk.html
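
A minimal producer sketch using the boto3 PutRecord call (the stream name and order fields are placeholders); note that the record is sent immediately, with no client-side buffering.

    import json
    import boto3

    kinesis = boto3.client("kinesis")

    order = {"order_id": "12345", "product_id": "P-9", "quantity": 2}  # placeholder

    # PutRecord sends the record immediately -- there is no client-side
    # batching window like the KPL's RecordMaxBufferedTime.
    resp = kinesis.put_record(
        StreamName="orders-stream",       # placeholder stream name
        Data=json.dumps(order).encode(),  # the opaque data blob
        PartitionKey=order["order_id"],   # determines the target shard
    )
    print(resp["ShardId"], resp["SequenceNumber"])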

Option E is incorrect - This option reverses the two roles: the Discounts file is a Reference source, while the Order data is a Streaming source in the Kinesis Analytics application.

Your Amazon Kinesis Data Analytics application can receive input from a single streaming source and, optionally, use one reference data source.

At the time that you create an application, you specify a streaming source.

You can also modify an input after you create the application.

Amazon Kinesis Data Analytics supports the following streaming sources for your application:

A Kinesis data stream.

A Kinesis Data Firehose delivery stream.

Kinesis Data Analytics continuously polls the streaming source for new data and ingests it into in-application streams according to the input configuration.

Your application code can query the in-application stream.

Add a reference data source to an existing application to enrich the data coming in from streaming sources.

You must store reference data as an object in your Amazon S3 bucket.

When the application starts, Amazon Kinesis Data Analytics reads the Amazon S3 object and creates an in-application reference table.

Your application code can then join it with an in-application stream.

You store reference data in the Amazon S3 object using supported formats (CSV, JSON).

https://docs.aws.amazon.com/kinesisanalytics/latest/dev/how-it-works-input.html#source-streaming
https://docs.aws.amazon.com/kinesisanalytics/latest/dev/how-it-works-input.html#source-reference
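
As a sketch of wiring this up, the AddApplicationReferenceDataSource API attaches the S3 object as an in-application reference table; every name, ARN, and column below is a placeholder assumption.

    import boto3

    analytics = boto3.client("kinesisanalytics")

    # Attach the discounts object in S3 as an in-application reference
    # table; every name and ARN here is a placeholder.
    analytics.add_application_reference_data_source(
        ApplicationName="order-discounts-app",
        CurrentApplicationVersionId=1,
        ReferenceDataSource={
            "TableName": "DISCOUNTS",
            "S3ReferenceDataSource": {
                "BucketARN": "arn:aws:s3:::dailymerc-discounts",
                "FileKey": "discounts.csv",
                "ReferenceRoleARN": "arn:aws:iam::123456789012:role/analytics-s3-read",
            },
            "ReferenceSchema": {
                "RecordFormat": {
                    "RecordFormatType": "CSV",
                    "MappingParameters": {
                        "CSVMappingParameters": {
                            "RecordRowDelimiter": "\n",
                            "RecordColumnDelimiter": ",",
                        }
                    },
                },
                "RecordColumns": [
                    {"Name": "PRODUCT_ID", "SqlType": "VARCHAR(16)"},
                    {"Name": "DISCOUNT_PCT", "SqlType": "DECIMAL(5,2)"},
                ],
            },
        },
    )

The application's SQL code can then join the in-application stream with the DISCOUNTS table to apply discounts; since the reference table is built when the application starts, picking up the 15-minute updates to the S3 object requires refreshing the reference data source.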

Option F is correct - The Discounts file is a Reference source, while the Order data is a Streaming source in the Kinesis Analytics application, exactly as the streaming and reference source details under Option E describe.

Option G is incorrect - An S3 object such as the Discounts file can only be a Reference source, not a Streaming source; only the Order data arriving on the stream is a Streaming source.

Option H is incorrect - The Order data arrives on a Kinesis data stream and is therefore a Streaming source; only the Discounts file in S3 is a Reference source.

The scenario described in the question involves processing real-time order data without any buffering or processing delays. To achieve this, DailyMerc uses streaming capability to capture the order data, and relevant discounts are applied using a Kinesis Analytics SQL application. The enriched data is then loaded into DynamoDB and is further used to generate invoices, notifications, etc. The discounts file is stored in an S3 bucket and updated every 15 minutes.
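
One way to sketch the final loading step, assuming a Lambda function subscribed to the enriched output stream and a hypothetical Orders table:

    import base64
    import json
    from decimal import Decimal

    import boto3

    # Hypothetical DynamoDB table receiving the enriched order records.
    table = boto3.resource("dynamodb").Table("Orders")

    def lambda_handler(event, context):
        # A Kinesis event delivers records with base64-encoded payloads.
        for record in event["Records"]:
            order = json.loads(
                base64.b64decode(record["kinesis"]["data"]),
                parse_float=Decimal,  # DynamoDB requires Decimal, not float
            )
            table.put_item(Item=order)  # invoicing and notifications read from here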

Reviewing each option against these requirements:

A. Kinesis Data Streams provides the Streaming platform. Kinesis Data Streams is a fully managed AWS service for collecting and processing large streams of data in real time, and it allows multiple applications to consume the same data stream simultaneously. In this scenario, it captures and processes the order data in real time, so this option is correct.

B. Kinesis Data Firehose provides the Streaming platform. Kinesis Data Firehose is a fully managed AWS service for capturing, transforming, and loading streaming data into data stores such as S3, Redshift, and Elasticsearch. This option is not correct: Firehose buffers incoming data before delivery, which conflicts with the no-buffering requirement, and DynamoDB is not a Firehose destination.

C. KPL to provide the data ingestion mechanism. The Kinesis Producer Library (KPL) is an AWS library for building custom producers for Kinesis Data Streams, with features such as batching and record aggregation. This option is not correct: the KPL's RecordMaxBufferedTime batching window introduces exactly the kind of processing delay the scenario rules out.

D. Streams API to provide the data ingestion mechanism. The Streams API is the low-level AWS API for sending data to Kinesis Data Streams. It provides finer-grained control than the KPL, requires more code, and sends each record immediately without client-side buffering, so this option is correct.

E. Discounts file is a Streaming source, while Order data is a Reference source in Kinesis Analytics. This option is not correct: it reverses the two roles. An S3 object can only serve as reference data, and the order data arriving on the Kinesis data stream is the streaming source.

F. Discounts file is a Reference source, while Order data is a Streaming source in Kinesis Analytics. This option is correct: Kinesis Data Analytics loads the discounts object from S3 into an in-application reference table and joins it with the in-application stream carrying the order data.

G. Both Discounts file and Order data are considered as Streaming sources. This option is not correct: an S3 object cannot be a streaming source for a Kinesis Analytics application.

H. Both Discounts file and Order data are considered as Reference sources. This option is not correct: the order data arrives on a Kinesis data stream and is therefore a streaming source, not a reference source.

In summary, the three relevant artifacts to achieve the requirements described in the scenario are:

  • Kinesis Data Streams as the Streaming platform (Option A)
  • The Streams API as the data ingestion mechanism (Option D)
  • The Discounts file in S3 as the Reference source and the Order data as the Streaming source in Kinesis Analytics (Option F)