An architecture is being considered which would consist of several EC2 Instances hosting a data ingestion application.
The application would receive thousands of events per second from various IoT devices.
The data from these devices needs to be streamed for real-time analytics.
Which of the following would be the ideal way to ingest the data while ensuring high ingestion throughput?
A. B. C. D.
Answer - B.
The AWS Documentation mentions the following.
The KPL can help build high-performance producers.
Consider a situation where your Amazon EC2 instances serve as a proxy for collecting 100-byte events from hundreds or thousands of low power devices and writing records into a Kinesis data stream.
These EC2 instances must each write thousands of events per second to your data stream.
To achieve the throughput needed, producers must implement complicated logic, such as batching or multithreading, in addition to retry logic and record de-aggregation at the consumer side.
The KPL performs all of these tasks for you.
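To make the aggregation and de-aggregation steps in the quoted passage concrete, here is a minimal sketch in Python. It is not the KPL and does not use the KPL's actual wire format; the function names `aggregate` and `deaggregate`, the 4-byte length prefix, and the 25,000-byte blob limit are all assumptions chosen for illustration. It only shows the idea of packing many 100-byte events into a few larger records on the producer side and splitting them again on the consumer side.

```python
import struct


def aggregate(events, max_bytes=25_000):
    """Pack small events into larger blobs, each at most max_bytes long.

    Hypothetical framing: every event is preceded by a 4-byte big-endian
    length. The real KPL uses its own format; this is only an illustration.
    """
    blobs, current = [], bytearray()
    for event in events:
        framed = struct.pack(">I", len(event)) + event
        if len(framed) > max_bytes:
            raise ValueError("event too large for one blob")
        # Start a new blob when adding this event would exceed the limit.
        if current and len(current) + len(framed) > max_bytes:
            blobs.append(bytes(current))
            current = bytearray()
        current += framed
    if current:
        blobs.append(bytes(current))
    return blobs


def deaggregate(blob):
    """Consumer side: split one blob back into the original events."""
    events, offset = [], 0
    while offset < len(blob):
        (length,) = struct.unpack_from(">I", blob, offset)
        offset += 4
        events.append(blob[offset:offset + length])
        offset += length
    return events


# 1,000 events of 100 bytes each, as in the scenario above.
events = [bytes([i % 256]) * 100 for i in range(1000)]
blobs = aggregate(events)
restored = [e for blob in blobs for e in deaggregate(blob)]
```

With these illustrative numbers, the 1,000 events are packed into 5 blobs, so the producer writes 5 records instead of 1,000, and the consumer recovers the original events in order.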
Option A is invalid since the Kinesis Producer Library would be more efficient than using the Kinesis API directly.
Option C is invalid since the Kinesis Client Library (KCL) is used for consuming records, not for producing them.
Option D is invalid since Kinesis needs to be used for real-time ingestion of data.
For more information on the Kinesis Producer Library, please refer to the below URL.
https://docs.aws.amazon.com/streams/latest/dev/developing-producers-with-kpl.html
Ingesting large amounts of data in real time requires a reliable and scalable solution. AWS offers several services for real-time data ingestion, but for this scenario, the best solution would be to use Amazon Kinesis.
Amazon Kinesis is a fully-managed service that can ingest, buffer, and process streaming data in real-time. It is designed to handle large volumes of data, making it ideal for scenarios with thousands of events per second from various IoT devices.
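In terms of the ingestion method, three Kinesis interfaces are relevant: the Kinesis Data Streams API (called through the AWS SDK), the Kinesis Producer Library (KPL), and the Kinesis Client Library (KCL). The first two write to a stream; the KCL reads from one.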
In this scenario, we need to ensure high throughput of data ingestion. Therefore, the best option would be to use the KPL. It provides an efficient and reliable way to write data into a Kinesis data stream because it aggregates many small user records into a single Kinesis record and batches records into fewer API calls. It also retries failed writes automatically, making it a more robust solution than calling the basic Kinesis API directly.
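The batching and retry behaviour described here can be sketched without any AWS dependency. The following Python is an illustration of the kind of logic a producer would otherwise have to write by hand, not the KPL's implementation. `send_with_retries`, `put_batch`, and `FlakyStream` are hypothetical names; `put_batch` stands in for a bulk write call that returns the records it rejected, and the default batch size of 500 is an assumption modelled on the per-request record limit of the Kinesis `PutRecords` API.

```python
import time


def send_with_retries(records, put_batch, batch_size=500,
                      max_attempts=5, base_delay=0.05, sleep=time.sleep):
    """Send records in batches and resend only the ones reported as failed.

    put_batch is a hypothetical callable: it takes a list of records and
    returns the sublist that was rejected (for example, due to throttling).
    Returns the records that were still undelivered after max_attempts.
    """
    undelivered = []
    for start in range(0, len(records), batch_size):
        pending = records[start:start + batch_size]
        for attempt in range(max_attempts):
            if attempt:
                # Exponential backoff before each retry.
                sleep(base_delay * 2 ** (attempt - 1))
            pending = put_batch(pending)
            if not pending:
                break
        undelivered.extend(pending)
    return undelivered


class FlakyStream:
    """Stand-in for a stream that rejects every third record on first sight."""

    def __init__(self):
        self.accepted = []
        self.seen = set()
        self.calls = 0

    def put_batch(self, batch):
        self.calls += 1
        failed = []
        for i, record in enumerate(batch):
            if i % 3 == 2 and record not in self.seen:
                self.seen.add(record)
                failed.append(record)
            else:
                self.accepted.append(record)
        return failed


stream = FlakyStream()
records = [f"event-{n}" for n in range(1200)]
# Skip real sleeping so the demonstration runs instantly.
undelivered = send_with_retries(records, stream.put_batch, sleep=lambda s: None)
```

In this simulated run, 1,200 records are delivered in 6 calls (three batches, each retried once) rather than 1,200 single-record writes, and nothing is left undelivered. The KPL handles this kind of logic internally, which is the reason the explanation prefers it over calling the API directly.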
Redshift, on the other hand, is a data warehousing service that is designed to store and analyze large amounts of structured data. While it can ingest data, it is not designed for real-time data ingestion and may not be suitable for scenarios with high volumes of data that need to be processed in real-time.
In conclusion, the ideal way to ingest the data while ensuring high throughput is for the application to use the Kinesis Producer Library (KPL) to write records to the stream.