Distributing Data Across Shards in Kinesis Streams

Aspect of Data Record for Shard Distribution

Question

A company is developing an application that will make use of Kinesis streams.

They are developing the producer and consumer components.

They need to ensure that data is distributed across the shards of the streams.

Which of the following aspects of the data record helps achieve this?

Answers

Explanations


A. Sequence number
B. Hash key
C. Partition key
D. String blob

Answer - C.

The AWS Documentation mentions the following.

The partition key is used by Kinesis Data Streams to distribute data across shards.

Kinesis Data Streams segregates the data records that belong to a stream into multiple shards, using the partition key associated with each data record to determine the shard to which a given data record belongs.

Option A is incorrect since this is used to uniquely identify the record in the shard.

Option B is incorrect since this is internally generated by AWS Kinesis.

Option D is incorrect since this is the data payload which is sent with the streaming data.

For more information on the PutRecord API and its partition key parameter, please refer to the URL below.

https://docs.aws.amazon.com/kinesis/latest/APIReference/API_PutRecord.html

In Amazon Kinesis streams, data is distributed across shards, and each shard has a fixed capacity in terms of data throughput. When designing an application that uses Kinesis streams, it is important to spread data evenly across all shards so that no single shard becomes a bottleneck.

To achieve this, the partition key of a data record is used to determine which shard the record will be written to. The partition key is a string that is used to group data records that belong together, such as data from the same sensor or device.
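As a minimal sketch of how a producer might attach a partition key, the snippet below builds the parameters for a PutRecord request. `StreamName`, `Data`, and `PartitionKey` are real PutRecord parameters; the helper name `build_put_record_params`, the stream name, and the device ID are hypothetical and chosen only for illustration.

```python
import json


def build_put_record_params(stream_name, device_id, reading):
    """Build keyword arguments for a PutRecord call (hypothetical helper)."""
    return {
        "StreamName": stream_name,
        # Data is the opaque payload; it plays no part in shard selection.
        "Data": json.dumps(reading).encode("utf-8"),
        # All readings from one device share a key, so they are grouped together.
        "PartitionKey": device_id,
    }


# Hypothetical stream and device names.
params = build_put_record_params("sensor-stream", "device-42", {"temp_c": 21.5})

# With boto3 installed and AWS credentials configured, the request could be sent as:
#   import boto3
#   boto3.client("kinesis").put_record(**params)
```

Using the device ID as the key is one possible design choice: it keeps each device's records together, at the cost of concentrating traffic if one device is much busier than the others.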

When writing data to a Kinesis stream, the producer component must specify a partition key for each record. The Kinesis service uses an MD5 hash function to convert the partition key into a 128-bit integer value. Each shard owns a range of these hash values, and the record is written to the shard whose range contains the computed value.
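The mapping from partition key to shard can be approximated locally. The sketch below hashes a key with MD5, reads the digest as a 128-bit integer, and looks it up in a list of hash ranges. The function names, the evenly split ranges, and the big-endian byte order are assumptions made for illustration; a real stream reports its actual hash key ranges per shard.

```python
import hashlib


def partition_key_to_hash(partition_key):
    """Return the MD5 digest of the key as a 128-bit integer (big-endian assumed)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def even_hash_ranges(shard_count):
    """Split 0 .. 2**128 - 1 into contiguous ranges, one per shard (assumed even split)."""
    total = 2 ** 128
    size = total // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * size
        # The last range absorbs any remainder so the whole key space is covered.
        end = total - 1 if i == shard_count - 1 else (i + 1) * size - 1
        ranges.append((start, end))
    return ranges


def shard_for_key(partition_key, ranges):
    """Return the index of the range that contains the key's hash value."""
    value = partition_key_to_hash(partition_key)
    for index, (start, end) in enumerate(ranges):
        if start <= value <= end:
            return index
    raise ValueError("hash value not covered by any range")


ranges = even_hash_ranges(4)
shard = shard_for_key("device-42", ranges)
```

Because the lookup depends only on the key, calling `shard_for_key` again with the same key and the same ranges always returns the same index.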

If multiple records have the same partition key, they will be written to the same shard in the order they were received. This ensures that related data is stored together and can be processed together by the consumer component.
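The effect of key choice on distribution can be illustrated with a small simulation. This is a rough sketch, not the service's implementation: it assumes four shards with evenly split hash ranges and a big-endian reading of the MD5 digest, and the name `shard_index` is hypothetical.

```python
import hashlib
from collections import Counter

SHARD_COUNT = 4  # hypothetical shard count


def shard_index(partition_key, shard_count=SHARD_COUNT):
    """Map a partition key to a shard index, assuming evenly split hash ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    value = int.from_bytes(digest, "big")
    size = 2 ** 128 // shard_count
    # Clamp so the last shard absorbs any remainder of the key space.
    return min(value // size, shard_count - 1)


# 1000 records that all share one key land on a single shard.
hot = Counter(shard_index("device-1") for _ in range(1000))

# 1000 records with distinct keys are spread over several shards.
spread = Counter(shard_index(f"device-{i}") for i in range(1000))

print("same key:     ", dict(hot))
print("distinct keys:", dict(spread))
```

A single repeated key keeps related records together but sends all of its traffic to one shard, whereas a large number of distinct keys spreads the load.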

The sequence number is a unique identifier assigned to each data record by Kinesis when it is written to a shard. It is used to track the ordering of records within a shard, but it does not affect which shard a record is written to.

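The hash key is the 128-bit value that Kinesis derives internally from the partition key. The producer does not normally supply it (PutRecord accepts an optional ExplicitHashKey only as an override), so it is not the aspect of the data record that the application uses to distribute data.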

The string blob is the data payload of the record. Kinesis does not inspect or interpret the payload, so it has no effect on which shard a record is written to.

Therefore, the correct answer to the question is C. Partition key.