Improving Performance of Hot Shards - AWS Certified Big Data Specialty Exam - BDS-C00

Basic Re-sharding Strategy for Hot Shards

Question

HikeHills.com (HH) is an online specialty retailer that sells clothing and outdoor recreation gear for trekking, camping, road biking, mountain biking, rock climbing, ice mountaineering, skiing, avalanche protection, snowboarding, fly fishing, kayaking, rafting, road and trail running, and more. HH runs its entire online infrastructure on Java-based web applications running on AWS.

HH captures clickstream data and uses a custom-built recommendation engine to recommend products, which ultimately improves sales and helps HH understand customer preferences. HH already uses the Kinesis Producer Library (KPL) to collect events and transaction logs and to process the stream. The HH IT team has found a number of performance issues with the Kinesis stream and, based on the captured metrics, has identified hot and cold shards. The IT team wants to improve the performance of the hot shards effectively.

There are two hot shards: SHARD 1, with a hash key range of 276...381, and SHARD 2, with a hash key range of 382...454.
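For context on why these ranges matter: Kinesis maps each record's partition key to a 128-bit hash key (an MD5 hash) and routes the record to the shard whose hash key range contains that value, so a shard runs hot when many records land in its range. A minimal Python sketch of that lookup, assuming the toy ranges from the question stand in for real 128-bit values; `Shard` and `find_shard` are hypothetical names, not part of any AWS SDK.

```python
from bisect import bisect_right
from typing import NamedTuple


class Shard(NamedTuple):
    shard_id: str
    start: int  # inclusive starting hash key
    end: int    # inclusive ending hash key


def find_shard(hash_key: int, shards: list[Shard]) -> Shard:
    """Return the shard whose inclusive [start, end] range contains hash_key."""
    ordered = sorted(shards, key=lambda s: s.start)
    starts = [s.start for s in ordered]
    # Index of the last shard whose start is <= hash_key.
    i = bisect_right(starts, hash_key) - 1
    if i < 0 or hash_key > ordered[i].end:
        raise KeyError(f"no shard covers hash key {hash_key}")
    return ordered[i]


# Toy ranges from the question (real hash key ranges span 0 .. 2**128 - 1).
hot_shards = [Shard("SHARD 1", 276, 381), Shard("SHARD 2", 382, 454)]
print(find_shard(300, hot_shards).shard_id)  # SHARD 1
print(find_shard(420, hot_shards).shard_id)  # SHARD 2
```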

What basic re-sharding strategy needs to be applied and how can it be applied?

Explanations

Answer: B.

Option A is incorrect.

Shards cannot overlap one another.

Splitting each hot shard into two shards does improve performance, but only if the child shards' hash key ranges do not overlap.

Consider the case where SHARD 1 has a hash key range of 276..381 and SHARD 2 has a hash key range of 382..454.

In this option, SHARD 1 is split into SHARD 1A with hash keys 276..332 and SHARD 1B with hash keys 332..381, and SHARD 2 is split into SHARD 2A with hash keys 382..410 and SHARD 2B with hash keys 410..454.

There is a clear overlap: hash key 332 falls in both SHARD 1A and SHARD 1B, and hash key 410 falls in both SHARD 2A and SHARD 2B.

https://docs.aws.amazon.com/streams/latest/dev/kinesis-using-sdk-java-resharding-split.html
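The overlap can be checked mechanically: two inclusive ranges share at least one key when each one starts no later than the other ends. A short sketch (the `overlapping_pairs` helper is hypothetical) that flags Option A's child ranges and passes Option B's:

```python
from itertools import combinations


def overlapping_pairs(ranges: dict[str, tuple[int, int]]) -> list[tuple[str, str]]:
    """Return every pair of named inclusive ranges that share at least one hash key."""
    pairs = []
    for (a, (a_start, a_end)), (b, (b_start, b_end)) in combinations(ranges.items(), 2):
        if a_start <= b_end and b_start <= a_end:
            pairs.append((a, b))
    return pairs


option_a = {
    "SHARD 1A": (276, 332),
    "SHARD 1B": (332, 381),
    "SHARD 2A": (382, 410),
    "SHARD 2B": (410, 454),
}
option_b = {
    "SHARD 1A": (276, 332),
    "SHARD 1B": (333, 381),
    "SHARD 2A": (382, 410),
    "SHARD 2B": (411, 454),
}
print(overlapping_pairs(option_a))  # [('SHARD 1A', 'SHARD 1B'), ('SHARD 2A', 'SHARD 2B')]
print(overlapping_pairs(option_b))  # []
```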

Option B is correct.

The performance of the hot shards improves when each hot shard is split into two shards.

In addition, the resulting shards do not overlap after the split.

SHARD 1 is split into SHARD 1A with hash keys 276..332 and SHARD 1B with hash keys 333..381, and SHARD 2 is split into SHARD 2A with hash keys 382..410 and SHARD 2B with hash keys 411..454.

The child ranges are contiguous and together cover exactly the hash key range of each parent shard, which is what a valid split requires.

https://docs.aws.amazon.com/streams/latest/dev/kinesis-using-sdk-java-resharding-split.html
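In the Kinesis `SplitShard` API, the split point is passed as `NewStartingHashKey`: that key and every higher key in the parent's range go to one child shard, and all lower keys go to the other. The sketch below models only that rule on the toy ranges; the `split_range` helper is hypothetical and makes no AWS calls. It shows that Option B corresponds to new starting hash keys of 333 and 411.

```python
def split_range(start: int, end: int, new_starting_hash_key: int) -> tuple[tuple[int, int], tuple[int, int]]:
    """Model SplitShard's rule on an inclusive parent range [start, end]."""
    # This sketch requires both children to be non-empty: start < key <= end.
    if not start < new_starting_hash_key <= end:
        raise ValueError("new starting hash key must lie above start and at or below end")
    lower = (start, new_starting_hash_key - 1)  # keys below the split point
    upper = (new_starting_hash_key, end)        # the split point and every key above it
    return lower, upper


shard_1a, shard_1b = split_range(276, 381, 333)
shard_2a, shard_2b = split_range(382, 454, 411)
print(shard_1a, shard_1b)  # (276, 332) (333, 381)
print(shard_2a, shard_2b)  # (382, 410) (411, 454)
```

Because the lower child always ends one key before the upper child starts, a split defined this way cannot produce overlapping ranges like those in Option A.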

Option C is incorrect.

Merging shards does not improve performance; it reduces costs.

This resharding strategy is suited to cold shards.

Kinesis merges only two adjacent shards at a time, so merging three shards takes two operations: SHARD 1 and SHARD 2 are merged first, and the result is then merged with SHARD 3.

This might be worth revisiting once the performance gained from splitting has been measured, but it is definitely not the first resharding strategy to consider.

https://docs.aws.amazon.com/streams/latest/dev/kinesis-using-sdk-java-resharding-split.html
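`MergeShards` takes exactly two shards, and they must be adjacent: the union of their hash key ranges has to form a contiguous set with no gaps. A sketch of the two-step merge under that rule; `merge_adjacent` is a hypothetical helper, and SHARD 3's range (455..520) is invented for illustration because the question does not give it.

```python
def merge_adjacent(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    """Merge two inclusive ranges only if one starts right after the other ends."""
    lo, hi = sorted([a, b])
    # A gap or an overlap both violate the adjacency requirement.
    if hi[0] != lo[1] + 1:
        raise ValueError(f"{a} and {b} are not adjacent")
    return (lo[0], hi[1])


shard_1 = (276, 381)
shard_2 = (382, 454)
shard_3 = (455, 520)  # hypothetical: SHARD 3's range is not given in the question

step_1 = merge_adjacent(shard_1, shard_2)  # first merge: SHARD 1 + SHARD 2
step_2 = merge_adjacent(step_1, shard_3)   # second merge: result + SHARD 3
print(step_1, step_2)  # (276, 454) (276, 520)
```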

Option D is incorrect.

Merging shards does not improve performance; it reduces costs.

This resharding strategy can be applied to cold shards.

https://docs.aws.amazon.com/streams/latest/dev/kinesis-using-sdk-java-resharding-merge.html

The correct answer is B.

Explanation:

In this scenario, the HH IT team has identified hot and cold shards in their Kinesis Stream, and they want to improve the performance of the hot shards. To do so, they need to apply a re-sharding strategy that will distribute the load more evenly across the stream.

There are two hot shards that have been identified, SHARD 1 and SHARD 2. SHARD 1 has a hash key range of 276...381, and SHARD 2 has a hash key range of 382...454. The goal is to split these shards into smaller shards to improve the performance of the hot shards.

Option A suggests splitting SHARD 1 into SHARD 1A with hash keys between 276...332 and SHARD 1B between 332...381, and splitting SHARD 2 into SHARD 2A between 382...410 and SHARD 2B between 410...454. However, the child ranges overlap: hash key 332 is assigned to both SHARD 1A and SHARD 1B, and hash key 410 to both SHARD 2A and SHARD 2B. A split must divide the parent's range into two non-overlapping ranges, so this strategy is invalid.

Option C suggests merging SHARD 1 and SHARD 2 and then splitting them into three shards. However, this strategy is not necessary since the goal is to improve the performance of the hot shards, not to restructure the entire stream.

Option D suggests merging the two hot shards into one shard. However, this strategy will not improve the performance of the hot shards, as the load will still be concentrated in one shard.
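A rough capacity calculation makes the same point. Assuming provisioned mode, where each shard accepts writes of up to 1 MB/s and 1,000 records/s, the aggregate write ceiling for the hot key ranges scales with the number of shards covering them. This is a sketch with a hypothetical `write_capacity` helper; whether real traffic actually spreads across the child shards depends on how the partition keys hash.

```python
WRITE_MB_PER_SEC_PER_SHARD = 1.0
WRITE_RECORDS_PER_SEC_PER_SHARD = 1_000


def write_capacity(num_shards: int) -> tuple[float, int]:
    """Aggregate write ceiling (MB/s, records/s) for num_shards provisioned shards."""
    return (num_shards * WRITE_MB_PER_SEC_PER_SHARD,
            num_shards * WRITE_RECORDS_PER_SEC_PER_SHARD)


scenarios = [
    ("Option D: merge the two hot shards", 1),
    ("Current: two hot shards", 2),
    ("Option B: split each hot shard", 4),
]
for label, shards in scenarios:
    mb, records = write_capacity(shards)
    print(f"{label}: {shards} shard(s), {mb} MB/s, {records} records/s")
```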

Option B suggests splitting SHARD 1 into SHARD 1A with hash keys between 276...332 and SHARD 1B between 333...381, and splitting SHARD 2 into SHARD 2A between 382...410 and SHARD 2B between 411...454. The child ranges are contiguous and do not overlap, so each hot shard's traffic is spread across two shards instead of one. This improves the performance of the hot shards.
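For comparison, the Java split example in the AWS documentation computes the new starting hash key as the average of the parent's starting and ending hash keys, which halves the key space. The sketch below applies that idea to the toy ranges and compares it with Option B's split points. `midpoint_children` and `size` are hypothetical helpers; the midpoint is rounded up (at most one key away from `(start + end) / 2`) so that even a two-key range yields two non-empty children. An even split of the key space only evens out the load if partition keys hash uniformly across it, which is an assumption here.

```python
def midpoint_children(start: int, end: int) -> tuple[tuple[int, int], tuple[int, int]]:
    """Split an inclusive range at its midpoint, rounding up so both children are non-empty."""
    if end <= start:
        raise ValueError("a range holding a single hash key cannot be split")
    new_starting_hash_key = (start + end + 1) // 2
    return (start, new_starting_hash_key - 1), (new_starting_hash_key, end)


def size(r: tuple[int, int]) -> int:
    """Number of hash keys in an inclusive range."""
    return r[1] - r[0] + 1


option_b_split_points = {(276, 381): 333, (382, 454): 411}  # parent range -> split point
for parent, b_key in option_b_split_points.items():
    even_lower, even_upper = midpoint_children(*parent)
    b_lower, b_upper = (parent[0], b_key - 1), (b_key, parent[1])
    print(f"{parent}: midpoint {size(even_lower)}/{size(even_upper)} keys, "
          f"Option B {size(b_lower)}/{size(b_upper)} keys")
```

Option B remains a valid split because its child ranges do not overlap; it simply does not halve each parent range exactly.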

In conclusion, the correct re-sharding strategy to apply in this scenario is Option B.