"Troubleshooting 'ProvisionedThroughputExceededException' Errors in Kinesis Stream Consumption"

"The Underlying Issue with 'ProvisionedThroughputExceededException' Errors in KCL Application"

Question

Your development team has created separate applications that use the KPL and KCL libraries to write data to and read data from Kinesis streams.

The KPL is being used to stream information from thousands of IoT devices.

The KCL application is consuming the records and providing real-time analytics to the data science team.

The KCL application has been programmed to poll the Kinesis stream every 150 milliseconds.

During a dry run, the KCL-based application is getting a lot of “ProvisionedThroughputExceededException” errors.

Which of the following could be the underlying issue?

Answers

Explanations


A. B. C. D.

Answer - B.

The AWS Documentation mentions the following.

Because Kinesis Data Streams has a limit of 5 GetRecords calls per second, per shard, setting the idleTimeBetweenReadsInMillis property lower than 200ms may result in your application observing the ProvisionedThroughputExceededException exception.

Too many of these exceptions can result in exponential back-offs and thereby cause significant unexpected latencies in processing.

If you set this property to be at or just above 200 ms and have more than one processing application, you will experience similar throttling.
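To illustrate why repeated throttling adds latency, here is a minimal sketch of an exponential back-off schedule. The base delay of 100 ms and the cap of 5000 ms are hypothetical values chosen for illustration, not the KCL's actual settings, and backoff_delays_ms is a made-up helper name.

```python
def backoff_delays_ms(throttled_attempts, base_ms=100, cap_ms=5000):
    """Return the wait before each retry, assuming every attempt so far was throttled.

    base_ms and cap_ms are illustrative assumptions, not KCL defaults.
    """
    # The delay doubles after each throttled attempt, up to the cap.
    return [min(cap_ms, base_ms * 2 ** attempt) for attempt in range(throttled_attempts)]


delays = backoff_delays_ms(4)
total_added_latency_ms = sum(delays)
print(delays, total_added_latency_ms)
```

Under these assumptions, four consecutive throttled calls add 1500 ms of waiting before the next read, far more than the 50 ms saved by polling every 150 ms instead of every 200 ms.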

Option A is incorrect because the dry run would not work at all if the Kinesis stream was created in the wrong region.

Option C is incorrect because the polling interval is too short, not too long: 150 ms is below the 200 ms threshold.

Option D is incorrect since you should ideally use the KCL library in conjunction with the KPL library.

For more information on Kinesis low latency, please refer to the below URL.

https://docs.aws.amazon.com/streams/latest/dev/kinesis-low-latency.html

The "ProvisionedThroughputExceededException" error occurs when a Kinesis consumer application exceeds the read limits of a shard. These limits apply per shard rather than to the stream as a whole: each shard supports up to 5 GetRecords calls per second and up to 2 MB of read throughput per second, so the total read capacity of a stream is determined by its number of shards.

In this scenario, the KCL-based application is getting this error during a dry run, when it is unlikely to be processing a large amount of data. This suggests that the application is hitting the per-shard limit on the number of GetRecords calls rather than the limit on data volume.

Of the given options, the most likely underlying issue is B, i.e., the polling interval is too short. The KCL-based application polls the stream every 150 milliseconds, which works out to roughly 6.7 GetRecords calls per second against each shard, above the limit of 5 calls per second per shard. The calls over the limit are rejected with the "ProvisionedThroughputExceededException" error.
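The arithmetic behind this can be checked with a short sketch. It assumes each consuming application issues one GetRecords call per shard per polling interval and ignores the time the call itself takes, so the result is an upper bound. The function names are hypothetical; the limit of 5 calls per second per shard comes from the AWS documentation quoted above.

```python
GET_RECORDS_LIMIT_PER_SHARD = 5  # calls per second per shard, per the AWS documentation


def get_records_calls_per_second(idle_ms, consumer_apps=1):
    """Upper bound on GetRecords calls per second hitting one shard."""
    return consumer_apps * 1000 / idle_ms


def exceeds_limit(idle_ms, consumer_apps=1):
    """True if the polling rate is above the per-shard GetRecords limit."""
    return get_records_calls_per_second(idle_ms, consumer_apps) > GET_RECORDS_LIMIT_PER_SHARD


print(round(get_records_calls_per_second(150), 2), exceeds_limit(150))
print(get_records_calls_per_second(200), exceeds_limit(200))
print(get_records_calls_per_second(200, consumer_apps=2), exceeds_limit(200, consumer_apps=2))
```

A single application polling every 150 ms is over the limit, a single application at 200 ms sits exactly at it, and two applications at 200 ms each are over it again, which matches the documentation's warning about multiple processing applications.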

Option A, i.e., the Kinesis stream being created in the wrong region, is unlikely to be the issue as the KCL application is able to consume records from the stream. If the stream was created in the wrong region, the KCL application would not be able to connect to the stream in the first place.

Option C, i.e., the polling interval being too long, is unlikely to be the issue as a longer polling interval would mean that the KCL application is requesting records less frequently, which would reduce the load on the stream and decrease the likelihood of the "ProvisionedThroughputExceededException" error.

Option D, i.e., using the Kinesis API for consuming records, is not the issue here, as the KCL is a recommended way to consume data from Kinesis streams and is designed to work alongside the KPL. It handles load balancing across workers, checkpointing, and fault tolerance.

To resolve the issue, the polling interval of the KCL-based application (idleTimeBetweenReadsInMillis) should be raised to at least 200 ms, and higher still if more than one application reads from the same stream, because all consumers share the same per-shard limit. Adding shards increases the total read throughput of the stream, but it does not raise the limit of 5 GetRecords calls per second on each shard, so it does not compensate for a polling interval that is too short.
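One way to pick a polling interval is to work backwards from the limit of 5 GetRecords calls per second per shard quoted from the AWS documentation. The sketch below assumes every consuming application polls every shard at the same interval; min_idle_ms is a hypothetical helper, not part of the KCL.

```python
import math


def min_idle_ms(consumer_apps=1, limit_per_second=5):
    """Smallest polling interval (ms) that keeps the combined call rate within the limit."""
    # Each application contributes 1000 / idle_ms calls per second to a shard,
    # so the combined rate stays within the limit when idle_ms >= 1000 * apps / limit.
    return math.ceil(1000 * consumer_apps / limit_per_second)


for apps in (1, 2, 3):
    print(apps, min_idle_ms(apps))
```

The result is a floor rather than a recommendation: a value somewhat above it leaves headroom for retries and for other readers of the stream.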