Amazon Kinesis - Processing Data Across Multiple Shards



Question

You are developing an application that is going to make use of Amazon Kinesis.

Due to the high throughput, you decide to have multiple shards for the streams.

Which of the following is TRUE when it comes to processing data across multiple shards?

Answers

Explanations


A. B. C. D.

Answer - A.

Kinesis Data Streams lets you order records, and lets many Kinesis Data Streams applications read and replay those records in the same order.

To enable write ordering, Kinesis Data Streams expects you to call the PutRecord API to write serially to a shard while using the sequenceNumberForOrdering parameter.

Setting this parameter guarantees strictly increasing sequence numbers for puts from the same client and to the same partition key.
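The serial-write pattern described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: `put_in_order` is a hypothetical helper name, and `client` is assumed to be any object exposing a boto3-style `put_record` method that returns a dict containing `SequenceNumber`.

```python
def put_in_order(client, stream_name, partition_key, payloads):
    """Write payloads one at a time, chaining each put to the previous one.

    Each call after the first passes the sequence number returned by the
    previous put as SequenceNumberForOrdering.
    """
    previous_seq = None
    sequence_numbers = []
    for payload in payloads:
        kwargs = {
            "StreamName": stream_name,
            "Data": payload,
            "PartitionKey": partition_key,
        }
        # The first put has no predecessor, so the parameter is omitted.
        if previous_seq is not None:
            kwargs["SequenceNumberForOrdering"] = previous_seq
        response = client.put_record(**kwargs)
        previous_seq = response["SequenceNumber"]
        sequence_numbers.append(previous_seq)
    return sequence_numbers
```

With boto3 installed and credentials configured, `client` would typically be `boto3.client("kinesis")`. The helper waits for each response before sending the next record, which is what makes the writes serial.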

Option A is correct because Kinesis Data Streams cannot guarantee the ordering of records across multiple shards.

Options B, C, and D are incorrect because Kinesis Data Streams can order records only within a single shard.

Each data record has a sequence number that is unique within its shard.

Kinesis Data Streams assigns the sequence number after you write to the stream with client.putRecords or client.putRecord.

For more information, please refer to:

https://aws.amazon.com/blogs/database/how-to-perform-ordered-data-replication-between-applications-by-using-amazon-dynamodb-streams/

https://docs.aws.amazon.com/streams/latest/dev/key-concepts.html

The correct answer is A. You cannot guarantee the order of data across multiple shards. It's possible only within a shard.

Amazon Kinesis is a managed service that allows you to collect, process, and analyze real-time streaming data at a large scale. A Kinesis data stream is made up of one or more shards. Each shard is a fixed unit of capacity: in provisioned mode it accepts writes of up to 1 MB per second or 1,000 records per second and supports reads of up to 2 MB per second, and each data record is subject to a maximum size.
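Because each shard is a fixed unit of capacity, the number of shards a stream needs can be estimated from expected traffic. The sketch below assumes the per-shard limits AWS documents for provisioned streams with shared-throughput consumers (1 MB/s or 1,000 records/s for writes, 2 MB/s for reads); `estimate_shards` is a hypothetical helper, and the limits should be checked against the current service quotas.

```python
import math

# Assumed per-shard limits (provisioned mode, shared-throughput reads).
WRITE_MB_PER_SHARD = 1.0
WRITE_RECORDS_PER_SHARD = 1000
READ_MB_PER_SHARD = 2.0


def estimate_shards(write_mb_per_sec, write_records_per_sec, read_mb_per_sec):
    """Return the smallest shard count that covers all three rates."""
    needed = max(
        write_mb_per_sec / WRITE_MB_PER_SHARD,
        write_records_per_sec / WRITE_RECORDS_PER_SHARD,
        read_mb_per_sec / READ_MB_PER_SHARD,
    )
    # A stream always has at least one shard.
    return max(1, math.ceil(needed))
```

For example, a workload writing 3.5 MB/s is bounded by write bandwidth, while one writing many small records may be bounded by the record-count limit instead.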

When processing data across multiple shards in a Kinesis stream, it's important to keep in mind that data ordering is only guaranteed within a shard. This means that records ingested into a shard will be processed in the order they were received, and records ingested into different shards will be processed in parallel, which may result in records being processed out of order.
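To see why ordering holds only within a shard, it helps to look at how records are routed. Kinesis Data Streams maps each partition key to a 128-bit integer with MD5 and picks the shard whose hash key range contains that value. The following is a minimal simulation, assuming the shards split the hash space evenly (real streams can have uneven ranges after resharding); `hash_key`, `shard_for`, and `route` are illustrative names, not part of any AWS SDK.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def hash_key(partition_key):
    """Map a partition key to an integer in [0, 2**128)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shard_for(partition_key, shard_count):
    """Pick a shard index, assuming evenly split hash key ranges."""
    return hash_key(partition_key) * shard_count // HASH_SPACE


def route(records, shard_count):
    """Group (partition_key, payload) pairs by shard.

    Arrival order is kept inside each shard; nothing relates the order
    of records that land in different shards.
    """
    shards = {i: [] for i in range(shard_count)}
    for key, payload in records:
        shards[shard_for(key, shard_count)].append((key, payload))
    return shards
```

Records that share a partition key always land in the same list and keep their relative order. Records with different keys may land in different lists, and a consumer reading those lists in parallel has no way to recover the original interleaving from the shards alone.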

To process records in order across multiple shards, the producer needs to include its own sequence number (or timestamp) in each record, and the consumer needs to use that value to reorder the records during processing. The sequence numbers that Kinesis assigns are unique only within a shard, so they cannot be used to order records from different shards. Alternatively, you can give related records the same partition key so that they are always routed to the same shard, which preserves their relative order.
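The reordering approach can be sketched as a merge over per-shard batches. This is an illustration under stated assumptions: each record is a dict carrying a producer-assigned field named `app_seq` (a hypothetical name), and each shard's batch is already sorted by that field, which holds when a single producer writes serially.

```python
import heapq


def merge_in_order(per_shard_records):
    """Merge per-shard batches into one list ordered by 'app_seq'.

    per_shard_records is a list of lists, one per shard, each already
    sorted by the producer-assigned 'app_seq' field.
    """
    return list(heapq.merge(*per_shard_records, key=lambda r: r["app_seq"]))
```

This works for a bounded batch read from every shard. A continuously running consumer would also need to decide how long to wait for late records from a slower shard before emitting results.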

It's worth noting that Kinesis Firehose is a different service that is used to deliver data from Kinesis streams to destinations such as S3, Redshift, or Elasticsearch. Firehose does not provide ordering guarantees, and records delivered to destinations may be out of order. If ordering is important for your use case, you should read from Kinesis Data Streams directly instead of relying on Firehose.