URetail, a leading local retail chain, works with more than 200 suppliers to procure products and sell them in the market. The suppliers share the price listings of their products in CSV files. This information is evaluated, compared, and processed to finalize orders using a data pipeline that uses EMR and sends notifications to the relevant suppliers.

How can the following ingestion mechanism be established?

1. Capture of the data in files and standardization of the data into JSON before loading into Kinesis Data Streams
2. Ingestion of the data from Kinesis Data Streams into Redshift

Select 2 options.
A. Kinesis Producer Library (KPL)
B. Kinesis Client Library (KCL)
C. COPY command
D. Kinesis Connector Library
E. Kinesis Agent

Answer: D, E.
Option A is incorrect - The KPL library cannot be used to capture files.

The Kinesis Producer Library (KPL) simplifies producer application development, allowing developers to achieve high write throughput to a Kinesis data stream. The KPL can incur an additional processing delay of up to RecordMaxBufferedTime within the library (user-configurable). Larger values of RecordMaxBufferedTime result in higher packing efficiency and better performance.

https://docs.aws.amazon.com/streams/latest/dev/developing-producers-with-kpl.html
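Since the question calls for standardizing file contents into JSON before loading them into the stream, here is a minimal producer sketch in Python. It uses boto3 rather than the Java-based KPL, and the stream name, region, and CSV column names are assumptions, not values from the question.

```python
import csv
import json
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")  # region is an assumption

def publish_price_listing(csv_path: str, stream_name: str = "supplier-prices"):
    """Read a supplier CSV file, convert each row to JSON, and put it on the stream."""
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):  # column names come from the CSV header
            kinesis.put_record(
                StreamName=stream_name,
                Data=json.dumps(row).encode("utf-8"),       # standardize the row as JSON
                PartitionKey=row.get("supplier_id", "na"),  # hypothetical partition key field
            )

# publish_price_listing("/data/supplier_123.csv")  # hypothetical file path
```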
Option B is incorrect - The Kinesis Client Library (KCL) is a pre-built library that helps you easily build Amazon Kinesis applications for reading and processing data from an Amazon Kinesis data stream. It handles complex issues such as adapting to changes in stream volume, load-balancing streaming data, coordinating distributed services, and processing data with fault tolerance, enabling you to focus on business logic while building applications. However, the KCL needs the Kinesis Connector Library to integrate with other AWS services.

The Kinesis Connector Library is a pre-built library that helps you easily integrate Amazon Kinesis Data Streams with other AWS services and third-party tools. The Amazon Kinesis Client Library (KCL) is required for using this library. The current version of this library provides connectors to Amazon DynamoDB, Amazon Redshift, Amazon S3, and Elasticsearch.
https://aws.amazon.com/kinesis/data-streams/resources/
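For illustration only, the raw boto3 calls below show the consumer role that the KCL (a Java library) automates for you: shard discovery, polling, checkpointing, and load balancing. The stream name, shard ID, and region are assumptions.

```python
import json
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")  # region is an assumption

def read_one_shard(stream_name: str = "supplier-prices",
                   shard_id: str = "shardId-000000000000"):
    """Poll a single shard from the oldest record onward and print each JSON payload."""
    iterator = kinesis.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",
    )["ShardIterator"]

    while iterator:
        resp = kinesis.get_records(ShardIterator=iterator, Limit=100)
        if not resp["Records"]:
            break  # stop the sketch once the shard has been drained
        for record in resp["Records"]:
            print(json.loads(record["Data"]))  # each record was standardized to JSON upstream
        iterator = resp.get("NextShardIterator")
```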
Option C is incorrect - The COPY command is used to load data into Redshift from Amazon S3, Amazon DynamoDB, Amazon EMR, or a remote host; it does not read directly from Kinesis Data Streams.

COPY loads data into a Redshift table from data files or from an Amazon DynamoDB table. The files can be located in an Amazon Simple Storage Service (Amazon S3) bucket, an Amazon EMR cluster, or a remote host that is accessed using a Secure Shell (SSH) connection.
https://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html
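As a reference sketch (not part of the question), a COPY statement could be issued from Python through the Redshift Data API as shown below; the cluster, database, user, table, S3 path, and IAM role are all placeholders.

```python
import boto3

redshift_data = boto3.client("redshift-data", region_name="us-east-1")  # region is an assumption

# COPY loads staged files (here, JSON objects in S3) into a Redshift table;
# it does not read directly from a Kinesis data stream.
copy_sql = """
    COPY supplier_prices
    FROM 's3://example-bucket/staged/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/example-redshift-copy-role'
    FORMAT AS JSON 'auto';
"""

redshift_data.execute_statement(
    ClusterIdentifier="example-cluster",  # placeholder cluster
    Database="dev",                       # placeholder database
    DbUser="awsuser",                     # placeholder database user
    Sql=copy_sql,
)
```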
Option D is correct - The Kinesis Connector Library is a pre-built library that helps you easily integrate Amazon Kinesis Data Streams with other AWS services and third-party tools. The Amazon Kinesis Client Library (KCL) is required for using this library. The current version of this library provides connectors to Amazon DynamoDB, Amazon Redshift, Amazon S3, and Elasticsearch.
https://aws.amazon.com/kinesis/data-streams/resources/
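The Kinesis Connector Library itself is a Java library built on top of the KCL, so the Python sketch below is only a conceptual illustration of the Redshift pipeline it implements: buffer records read from the stream, stage them in S3, then load them with COPY. The bucket, key, table, cluster, and role names are placeholders.

```python
import json
import boto3

s3 = boto3.client("s3")                       # uses the default region from the environment
redshift_data = boto3.client("redshift-data")

def stage_and_copy(records, bucket="example-bucket", key="staged/batch-0001.json",
                   table="supplier_prices"):
    """Mimic the connector's transform -> buffer -> emit flow: write a batch of JSON
    records to S3, then COPY the staged object into Redshift."""
    body = "\n".join(json.dumps(r) for r in records)  # one JSON object per line
    s3.put_object(Bucket=bucket, Key=key, Body=body.encode("utf-8"))

    redshift_data.execute_statement(
        ClusterIdentifier="example-cluster",  # placeholder cluster
        Database="dev",                       # placeholder database
        DbUser="awsuser",                     # placeholder user
        Sql=f"COPY {table} FROM 's3://{bucket}/{key}' "
            "IAM_ROLE 'arn:aws:iam::123456789012:role/example-redshift-copy-role' "
            "FORMAT AS JSON 'auto';",
    )
```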
Option E is correct - Kinesis Agent is a stand-alone Java software application that offers an easy way to collect and send data to Kinesis Data Streams. The agent continuously monitors a set of files and sends new data to your stream. It handles file rotation, checkpointing, and retry upon failures, and delivers all of your data in a reliable, timely, and simple manner. It also emits Amazon CloudWatch metrics to help you better monitor and troubleshoot the streaming process. The agent can be configured to monitor multiple file directories and send data to multiple streams, and it can pre-process the records parsed from monitored files before sending them to your stream (see the configuration sketch after the link below).
https://docs.aws.amazon.com/streams/latest/dev/writing-with-agents.html#sim-writes
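As a minimal sketch, the Python snippet below writes a Kinesis Agent configuration that watches a drop directory and converts each CSV record to JSON before sending it to a stream. The file pattern, stream name, and CSV field names are assumptions about the supplier feed; CSVTOJSON is one of the agent's documented pre-processing options.

```python
import json

# Sketch of a Kinesis Agent configuration: watch the supplier drop directory,
# convert each CSV line to JSON, and send it to a Kinesis data stream.
agent_config = {
    "cloudwatch.emitMetrics": True,
    "flows": [
        {
            "filePattern": "/var/supplier-feeds/*.csv",  # hypothetical drop directory
            "kinesisStream": "supplier-prices",          # hypothetical stream name
            "partitionKeyOption": "RANDOM",
            "dataProcessingOptions": [
                {
                    "optionName": "CSVTOJSON",           # agent converts CSV rows to JSON
                    "customFieldNames": ["supplier_id", "sku", "price"],  # assumed columns
                }
            ],
        }
    ],
}

with open("/etc/aws-kinesis/agent.json", "w") as f:
    json.dump(agent_config, f, indent=2)
```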
To establish the ingestion mechanism described in the question, we need to capture the data from the CSV files shared by the suppliers, standardize it into JSON, and load it into Kinesis Data Streams; the data must then be loaded from Kinesis Data Streams into Redshift for further processing. To achieve this, we can evaluate the following options:
Option A: KPL Library. The Kinesis Producer Library (KPL) is used to publish data to Kinesis Data Streams with high write throughput, and it provides a simple API for batching and publishing records. However, the KPL cannot monitor and capture files on its own, and it does not provide any built-in functionality for standardizing the data into JSON format.
Option B: KCL Library. The Kinesis Client Library (KCL) is used to consume data from Kinesis Data Streams and process it, providing an easy-to-use worker model for reading records from the stream. On its own, however, it does not integrate with Redshift; it requires the Kinesis Connector Library for that.
Option C: COPY Command. The COPY command loads data into Redshift from Amazon S3, Amazon DynamoDB, Amazon EMR, or a remote host over SSH. It does not ingest data directly from Kinesis Data Streams, so it cannot by itself cover either part of the required mechanism.
Option D: Kinesis Connector Library. The Kinesis Connector Library is used to integrate Kinesis Data Streams with various AWS services. It provides pre-built connectors for several services, including Redshift. We can use the Kinesis Connector Library to load data from Kinesis Data Streams into Redshift.
Option E: Kinesis Agent. The Kinesis Agent is a stand-alone Java application that captures and sends data from monitored files to Kinesis Data Streams. It can be used to capture the CSV files shared by the suppliers, standardize the records into JSON format via its CSV-to-JSON pre-processing option, and send them to Kinesis Data Streams for further processing. However, it does not provide any functionality for loading data into Redshift.
Therefore, the two options that we can use to establish the ingestion mechanism described in the question are Option E (Kinesis Agent), to capture the supplier files, standardize the records into JSON, and send them to Kinesis Data Streams, and Option D (Kinesis Connector Library), to ingest the data from Kinesis Data Streams into Redshift.