HikeHills.com (HH) is an online specialty retailer that sells clothing and outdoor recreation gear for trekking, camping, road biking, mountain biking, rock climbing, ice mountaineering, skiing, avalanche protection, snowboarding, fly fishing, kayaking, rafting, road and trail running, and more. HH runs its entire online infrastructure on Java-based web applications running on AWS.
HH captures clickstream data and uses a custom-built recommendation engine to recommend products, which improves sales and helps the team understand customer preferences. It already uses the Amazon Kinesis Producer Library (KPL) to collect events and transaction logs and to process the stream. The HH IT team found many performance issues with the Kinesis stream and, based on the captured metrics, identified hot and cold shards.
The IT team wants to effectively remove the unused capacity.
There are 3 shards: Shard 1 with a hash key range of 276...381, Shard 2 with a hash key range of 382...454, and Shard 3 with a hash key range of 455...510. Shard 1 and Shard 3 are cold shards, while Shard 2 is a hot shard.
What re-sharding strategy needs to be applied and how can it be applied?
Answer: D.
Merge the unused shards into one.
Shards can be merged only when they are adjacent: two shards are adjacent if the union of their hash key ranges forms a contiguous set with no gaps.
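The adjacency rule can be checked with a few lines of arithmetic. The sketch below is illustrative only: it uses the scenario's simplified hash key ranges, and the `adjacent` helper is a hypothetical name, not part of any AWS SDK.

```python
# Hash key ranges from the scenario, as inclusive (start, end) pairs.
SHARDS = {
    "Shard 1": (276, 381),
    "Shard 2": (382, 454),
    "Shard 3": (455, 510),
}

def adjacent(a, b):
    """Return True if two inclusive ranges are contiguous, with no gap or overlap."""
    lo, hi = sorted([a, b])
    return lo[1] + 1 == hi[0]

print(adjacent(SHARDS["Shard 1"], SHARDS["Shard 2"]))  # True
print(adjacent(SHARDS["Shard 2"], SHARDS["Shard 3"]))  # True
print(adjacent(SHARDS["Shard 1"], SHARDS["Shard 3"]))  # False
```

Under this rule the two cold shards, Shard 1 and Shard 3, cannot be merged with each other directly, because the hot Shard 2 sits between them.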
https://docs.aws.amazon.com/streams/latest/dev/kinesis-using-sdk-java-resharding-merge.html
Note: Reducing the number of shards can never improve performance.
In the current scenario, where performance is degrading, multiple strategies are available.
In practice, when cold shards appear while data is being loaded, you need to evaluate the workload and redesign it or adopt one of the proposed resharding strategies.
Reducing the number of shards definitely disrupts the partitioning.
The correct answer is C: merge Shard 1, Shard 2, and Shard 3 into one shard, then split that shard into two shards, SHARD123A with hash keys 276...420 and SHARD123B with hash keys 425...510.
Explanation:
Kinesis is a managed service provided by AWS that makes it easy to collect, process, and analyze real-time streaming data such as website clickstream data. Kinesis works on a sharded model, which means that the data is partitioned across multiple shards, and each shard has a certain capacity limit. In this case, HikeHills.com is using Kinesis to capture clickstream data and to process events and transaction logs. The IT team has identified performance issues with the Kinesis stream and has identified hot and cold shards based on the metrics captured. Hot shards are those that are heavily utilized, while cold shards are those that are underutilized.
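To make the sharded model concrete: Kinesis maps each record's partition key to a 128-bit integer with MD5 and routes the record to the shard whose hash key range contains that integer. The sketch below imitates that routing locally. The four evenly sized ranges, the sample partition keys, the helper names, and the big-endian reading of the digest are all assumptions made for illustration, not details from the scenario.

```python
import hashlib

MAX_HASH_KEY = 2**128 - 1  # MD5 digests are 128 bits wide

def hash_key(partition_key: str) -> int:
    """Map a partition key to a 128-bit integer via MD5 (big-endian is assumed)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")

def even_ranges(n):
    """Hypothetical stream: split the full hash key space into n inclusive ranges."""
    size = (MAX_HASH_KEY + 1) // n
    return [
        (i * size, MAX_HASH_KEY if i == n - 1 else (i + 1) * size - 1)
        for i in range(n)
    ]

def shard_for(partition_key, ranges):
    """Return the index of the range that contains the key's hash."""
    key = hash_key(partition_key)
    for index, (start, end) in enumerate(ranges):
        if start <= key <= end:
            return index
    raise ValueError("ranges do not cover the hash key space")

ranges = even_ranges(4)
counts = [0] * len(ranges)
# Invented sample traffic: one partition key dominates, so its shard runs hot.
for key in ["user-1"] * 8 + ["user-2", "user-3"]:
    counts[shard_for(key, ranges)] += 1
print(counts)
```

Because every record with the same partition key lands in the same range, a skewed key distribution produces one heavily used shard while the others stay cold.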
To effectively remove the unused capacity, the IT team needs to apply a re-sharding strategy. The objective of the re-sharding strategy is to redistribute the data across the shards in a way that optimizes the capacity utilization of the Kinesis stream. There are several re-sharding strategies available, but the most appropriate one in this case is to merge the shards and split the result into two new shards.
Here's how the re-sharding strategy would work:
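Merge Shard 1, Shard 2, and Shard 3 into a single shard, let's call it SHARD123. Because a merge operation combines two adjacent shards at a time, this takes two successive merges. This would reduce the number of shards and make it easier to manage the stream.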
Split SHARD123 into two new shards, SHARD123A and SHARD123B. The split should be based on the hash key range of the original shards. SHARD123A should have the hash key range of 276...420, and SHARD123B should have the hash key range of 425...510.
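The two steps above can be traced with plain range arithmetic. This is a local sketch under stated assumptions, not a call into the Kinesis API: `merge` and `split` are hypothetical helpers, and the split point 421 is assumed so that the first child matches 276...420. A split is defined by a single new starting hash key, so in this model the second child begins at that key and the children are contiguous.

```python
def merge(a, b):
    """Merge two adjacent inclusive ranges; reject non-adjacent ones."""
    lo, hi = sorted([a, b])
    if lo[1] + 1 != hi[0]:
        raise ValueError(f"{a} and {b} are not adjacent")
    return (lo[0], hi[1])

def split(shard, new_starting_hash_key):
    """Split one inclusive range into two children at the given starting key."""
    start, end = shard
    if not start < new_starting_hash_key <= end:
        raise ValueError("starting hash key must fall inside the range")
    return (start, new_starting_hash_key - 1), (new_starting_hash_key, end)

shard1, shard2, shard3 = (276, 381), (382, 454), (455, 510)

# Merging works pairwise, so three shards need two successive merges.
shard12 = merge(shard1, shard2)    # (276, 454)
shard123 = merge(shard12, shard3)  # (276, 510)

# 421 is an assumed split point, not a value given in the scenario.
shard123a, shard123b = split(shard123, 421)
print(shard123a, shard123b)  # (276, 420) (421, 510)
```

Calling `merge(shard1, shard3)` in this sketch raises a `ValueError`, which mirrors the adjacency requirement for merging.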
By merging the shards and splitting the result into two new shards, the IT team can effectively remove the unused capacity and optimize the capacity utilization of the Kinesis stream. This re-sharding strategy would also make it easier to manage the stream and improve performance.