Real-time Data Ingestion and Customer Profile Analysis with AWS Kinesis

Enhancing Customer Experience with Real-time Data Analysis

Question

LeeBronz, a trusted online platform for buying and selling a wide range of luxury products, hosts LBW, a Java web application running on EC2 instances spread across multiple Availability Zones and Regions to serve customers in different locations.

LBW is complemented by ES to support search, and the EDW is hosted on Redshift. LeeBronz wants to improve customers' online experience by enhancing search and recommendations, increasing transaction value, and converting browsing users into customers. To do so, it plans to capture clickstreams and return recommendations to customers in real time, based on their current and previous search and transaction behavior.

The clickstream data and recommendations are ultimately integrated into Redshift to enrich customer profiles and into S3 buckets to support future recommendations. LeeBronz is considering Kinesis Data Streams for data integration and Kinesis Analytics for recommendations.

Which of the options below correctly address the following requirements? Select 2 options.

Real-time data ingestion into Kinesis Data Streams without any processing delays.

Access to customer profile data loaded in S3 buckets through Kinesis Analytics.

Answers

A. B. C. D. E. F.

Answer: A, D.

Option A is correct - Calling the Kinesis Data Streams API through the AWS SDK is the right choice for data ingestion when the application cannot tolerate the additional delay that the KPL introduces by buffering records before sending them.

In addition, Amazon Kinesis Data Streams is a managed service that scales elastically for real-time processing of streaming big data.

https://docs.aws.amazon.com/streams/latest/dev/developing-producers-with-sdk.html
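As a rough illustration of Option A, the sketch below sends one clickstream event per call with no client-side buffering. It assumes a client object shaped like boto3's Kinesis client, whose put_record call takes StreamName, Data, and PartitionKey. The stream name, the event fields, and the RecordingClient stand-in are hypothetical and exist only so the sketch runs without AWS credentials.

```python
import json


def build_put_record_request(stream_name, click_event):
    """Build the keyword arguments for a single PutRecord call."""
    return {
        "StreamName": stream_name,
        # Kinesis expects the payload as bytes.
        "Data": json.dumps(click_event).encode("utf-8"),
        # Records that share a partition key are routed to the same shard.
        "PartitionKey": str(click_event["customer_id"]),
    }


def send_click(kinesis_client, stream_name, click_event):
    """Send one event right away; nothing is held back on the client."""
    request = build_put_record_request(stream_name, click_event)
    return kinesis_client.put_record(**request)


class RecordingClient:
    """Hypothetical stand-in for boto3.client("kinesis"), for offline runs."""

    def __init__(self):
        self.calls = []

    def put_record(self, **kwargs):
        self.calls.append(kwargs)
        return {"ShardId": "shardId-000000000000", "SequenceNumber": "1"}


if __name__ == "__main__":
    client = RecordingClient()
    event = {"customer_id": 42, "page": "/watches/123", "action": "view"}
    print(send_click(client, "lbw-clickstream", event))
```

In a real deployment the RecordingClient would be replaced by an actual SDK client; the trade-off of this approach is one network call per record, which favors latency over throughput.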

Option B is incorrect - The Kinesis Producer Library (KPL) simplifies producer application development, allowing developers to achieve high write throughput to a Kinesis data stream.

The KPL can incur an additional processing delay of up to RecordMaxBufferedTime within the library (user-configurable).

Larger values of RecordMaxBufferedTime result in higher packing efficiency and better performance, but they also add latency, so the KPL does not meet the requirement of ingestion without processing delays.

https://docs.aws.amazon.com/streams/latest/dev/developing-producers-with-kpl.html
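To make the buffering delay concrete, here is a toy model of a producer that holds records until the oldest one has waited a maximum time or the batch is full. It is not the KPL and does not reproduce its internals; the class name, parameters, and millisecond clock are assumptions chosen for illustration.

```python
class BufferingProducer:
    """Toy model of a producer that buffers records before sending them."""

    def __init__(self, max_buffered_ms, max_batch_size):
        self.max_buffered_ms = max_buffered_ms
        self.max_batch_size = max_batch_size
        self.buffer = []  # (arrival_ms, record) pairs waiting to be sent
        self.sent = []    # (record, delay_ms) pairs already flushed

    def put(self, now_ms, record):
        """Accept a record; flush first if older records are overdue."""
        self.tick(now_ms)
        self.buffer.append((now_ms, record))
        if len(self.buffer) >= self.max_batch_size:
            self._flush(now_ms)

    def tick(self, now_ms):
        """Flush when the oldest buffered record has waited long enough."""
        if self.buffer and now_ms - self.buffer[0][0] >= self.max_buffered_ms:
            self._flush(now_ms)

    def _flush(self, now_ms):
        for arrival_ms, record in self.buffer:
            self.sent.append((record, now_ms - arrival_ms))
        self.buffer = []


if __name__ == "__main__":
    producer = BufferingProducer(max_buffered_ms=100, max_batch_size=500)
    producer.put(0, "click-a")
    producer.put(30, "click-b")
    producer.tick(100)
    # Each entry pairs a record with how long it waited in the buffer.
    print(producer.sent)
```

With a 100 ms limit, the first record in this run waits the full 100 ms before it leaves the producer, which is the kind of delay the requirement rules out.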

Option C is incorrect - Kinesis Agent offers an easy way to collect and send log data to Kinesis Data Streams.

The agent continuously monitors a set of files and sends new data to your stream.

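The agent handles file rotation, checkpointing, and retry upon failures.

It delivers your data in a reliable, timely, and simple manner. However, it works by tailing log files and buffering records before sending them, so it does not satisfy the requirement of ingestion without processing delays.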

https://docs.aws.amazon.com/streams/latest/dev/writing-with-agents.html

Option D is correct - When the Kinesis Analytics application starts, Amazon Kinesis Data Analytics reads the Amazon S3 object and creates an in-application reference table.

You can add a reference data source to an existing application to enrich the data coming in from streaming sources.

You must store reference data as an object in your Amazon S3 bucket.

The application code can then join it with an in-application stream.

https://docs.aws.amazon.com/kinesisanalytics/latest/dev/how-it-works-input.html
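The reference data source for Option D is described to the service as a nested configuration structure. The sketch below builds such a structure for a CSV object in S3. The field names follow the ReferenceDataSource shape of the AddApplicationReferenceDataSource operation as best I recall it and should be checked against the API reference; the table name, ARNs, file key, and columns are hypothetical.

```python
def build_reference_data_source(table_name, bucket_arn, file_key, role_arn, columns):
    """Describe a CSV object in S3 as an in-application reference table.

    columns is a list of (column_name, sql_type) pairs.
    """
    return {
        # Name of the in-application table the SQL code will join against.
        "TableName": table_name,
        "S3ReferenceDataSource": {
            "BucketARN": bucket_arn,
            "FileKey": file_key,
            # Role the service assumes to read the object.
            "ReferenceRoleARN": role_arn,
        },
        "ReferenceSchema": {
            "RecordFormat": {
                "RecordFormatType": "CSV",
                "MappingParameters": {
                    "CSVMappingParameters": {
                        "RecordRowDelimiter": "\n",
                        "RecordColumnDelimiter": ",",
                    }
                },
            },
            "RecordEncoding": "UTF-8",
            "RecordColumns": [
                {"Name": name, "SqlType": sql_type} for name, sql_type in columns
            ],
        },
    }


if __name__ == "__main__":
    # All identifiers below are made up for the example.
    config = build_reference_data_source(
        table_name="CUSTOMER_PROFILE",
        bucket_arn="arn:aws:s3:::example-profile-bucket",
        file_key="profiles/customers.csv",
        role_arn="arn:aws:iam::123456789012:role/example-analytics-role",
        columns=[("customer_id", "VARCHAR(16)"), ("segment", "VARCHAR(32)")],
    )
    print(config["TableName"], len(config["ReferenceSchema"]["RecordColumns"]))
```

A structure like this would be passed as the ReferenceDataSource argument, together with the application name and its current version ID, when registering the source through the SDK.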

Option E is incorrect - A streaming source is the input from which a Kinesis Analytics application reads a data stream, and S3 is not a supported streaming source.

Amazon Kinesis Data Analytics supports the following streaming sources for your application:

A Kinesis data stream.

A Kinesis Data Firehose delivery stream.

An Amazon Kinesis Data Analytics application can receive input from a single streaming source and, optionally, use one reference data source. Objects in S3 can therefore serve only as reference data, not as a streaming source.

https://docs.aws.amazon.com/kinesisanalytics/latest/dev/how-it-works-input.html

Option F is incorrect - The KCL reads data from a stream; it does not write data into a stream.

A consumer is an application that processes data from a Kinesis data stream.

When a consumer uses enhanced fan-out, it gets its own 2 MiB/sec allotment of read throughput per shard, allowing multiple consumers to read data from the same stream in parallel, without contending for read throughput with other consumers.

https://docs.aws.amazon.com/streams/latest/dev/building-consumers.html
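The read-throughput point can be checked with simple arithmetic. The sketch below assumes the commonly documented figure of 2 MiB/sec of read throughput per shard, shared among consumers that do not use enhanced fan-out and dedicated to each consumer that does. The function name and the shard and consumer counts are illustrative.

```python
SHARD_READ_MIB_PER_SEC = 2.0  # assumed per-shard read allotment


def per_consumer_read_mib(shards, consumers, enhanced_fan_out):
    """Estimate the read throughput available to each consumer, in MiB/sec."""
    if shards < 1 or consumers < 1:
        raise ValueError("shards and consumers must both be at least 1")
    stream_total = shards * SHARD_READ_MIB_PER_SEC
    if enhanced_fan_out:
        # Each registered consumer gets its own full allotment on every shard.
        return stream_total
    # Without enhanced fan-out, consumers split the per-shard allotment.
    return stream_total / consumers


if __name__ == "__main__":
    print(per_consumer_read_mib(shards=4, consumers=4, enhanced_fan_out=False))
    print(per_consumer_read_mib(shards=4, consumers=4, enhanced_fan_out=True))
```

Either way this describes the read side of a stream, which is why a consumer library does not answer an ingestion requirement.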

The requirements of LeeBronz include capturing clickstreams, proposing real-time recommendations, and integrating the clickstream and recommendation data into Redshift and S3 buckets for future recommendations.

To achieve these requirements, LeeBronz can use Kinesis Data Streams and Kinesis Analytics. Kinesis Data Streams is a service that enables real-time data ingestion and processing. Kinesis Analytics is a service that enables real-time analysis of streaming data using SQL queries.

To achieve real-time data ingestion into Kinesis Streams without any processing delays, LeeBronz should call the Kinesis Data Streams API directly through the AWS SDK rather than use the Kinesis Producer Library (KPL). Both options publish data to a Kinesis stream, but only the direct API call sends each record without buffering it first.

The Kinesis Data Streams API is an HTTPS API, exposed through the AWS SDK as the PutRecord and PutRecords operations. Because each call sends its records immediately, this option suits applications that cannot tolerate added latency.

The Kinesis Producer Library (KPL) is a higher-level option that adds batching, aggregation of multiple user records into a single stream record, automatic retry of failed records, and rate limiting. These features raise write throughput, but records can wait in the library's buffer for up to RecordMaxBufferedTime before they are sent.

To access customer profile data loaded in S3 buckets through Kinesis Analytics, LeeBronz can configure a reference data source on the application. A reference data source is a static data set that is used as a lookup during real-time data analysis.

To use S3 as a reference data source, LeeBronz adds the S3 object to the application's configuration. When the application starts, Kinesis Analytics reads the object and creates an in-application reference table that the application code can join with the in-application stream. The S3 object can be in a format supported by Kinesis Analytics, such as CSV or JSON.
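The lookup role of reference data can be mimicked in a few lines. In Kinesis Analytics the join would be written in SQL inside the application; the Python below only imitates its semantics, loading a small CSV as the static table and enriching each clickstream event that has a matching profile. The column names and sample values are hypothetical.

```python
import csv
import io


def load_reference_table(csv_text, key_column):
    """Index the rows of a CSV document by one of its columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key_column]: row for row in reader}


def enrich(events, reference_table, key_field):
    """Yield events merged with their profile; skip events with no match."""
    for event in events:
        profile = reference_table.get(str(event[key_field]))
        if profile is not None:
            merged = dict(profile)
            merged.update(event)  # event fields win when names collide
            yield merged


if __name__ == "__main__":
    profiles = "customer_id,segment\n42,vip\n7,new\n"
    table = load_reference_table(profiles, "customer_id")
    clicks = [
        {"customer_id": 42, "page": "/watches/123"},
        {"customer_id": 99, "page": "/bags/7"},
    ]
    for row in enrich(clicks, table, "customer_id"):
        print(row)
```

This behaves like an inner join: the event for a customer without a profile row is dropped, so a left join would be the better model if unmatched clicks must still produce output.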

To integrate the clickstream and recommendation data into Redshift and S3 buckets for future recommendations, LeeBronz can use Kinesis Data Streams and Kinesis Analytics. LeeBronz can publish the clickstream data to a Kinesis stream and use Kinesis Analytics to process it in real time.

Kinesis Analytics can then send the processed data to a Kinesis Data Firehose delivery stream, which delivers it to Redshift or S3 buckets for future recommendations. S3 cannot be used as a streaming source in Kinesis Analytics; objects in S3 are available to the application only as reference data.

In summary, the correct options for the requirements listed are:

A. Kinesis Data Streams API for real-time data ingestion into Kinesis Streams without processing delays.

D. Kinesis Analytics to access data objects in S3 as a reference source.