Sharding Strategies for Quick Data Retrieval: A Guide for Microsoft Azure Exam DP-203

Question

There is an application that frequently needs to find all the orders delivered in a particular month.

Which of the following sharding strategies would you implement to divide the data store to enable quick data retrieval?

Answers

A. Lookup strategy
B. Range strategy
C. Hash strategy
D. Normalized strategy

Explanations

Correct Answer: B

The range strategy groups related items together in the same shard and orders them by shard key.

For an application that frequently needs to find all orders delivered in a particular month, the data can be retrieved quickly if all orders for a month are stored in date and time order in the same shard.
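The idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory `shards` dictionary stands in for real shards, and the names `shard_key`, `add_order`, and `orders_delivered_in` are hypothetical, not part of any Azure API.

```python
import bisect
from collections import defaultdict
from datetime import datetime

# Hypothetical in-memory stand-in for a sharded store: one shard per (year, month).
shards = defaultdict(list)

def shard_key(delivered_at: datetime) -> tuple[int, int]:
    """Range shard key: every order delivered in the same month maps to the same shard."""
    return (delivered_at.year, delivered_at.month)

def add_order(order_id: str, delivered_at: datetime) -> None:
    # insort keeps each shard sorted by delivery timestamp (then by order ID).
    bisect.insort(shards[shard_key(delivered_at)], (delivered_at, order_id))

def orders_delivered_in(year: int, month: int) -> list[str]:
    # A month query reads exactly one shard, already in date and time order.
    return [order_id for _, order_id in shards.get((year, month), [])]

add_order("o-3", datetime(2023, 3, 20, 9, 30))
add_order("o-1", datetime(2023, 3, 2, 14, 0))
add_order("o-2", datetime(2023, 4, 1, 8, 15))

march_orders = orders_delivered_in(2023, 3)
```

Because the shard is derived from the delivery date itself, the query for a month never has to consult any other shard.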

Option A is incorrect.

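The lookup strategy routes requests through a map from shard key to shard. It does not by itself keep a month's orders together in date order, so it is not the best sharding strategy in the given scenario.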

Option B is correct.

The range strategy puts all the related orders (i.e., the orders for a month) in the same shard, which results in quick data retrieval.

Option C is incorrect.

The hash strategy distributes the data across the shards to balance the size of each shard and the average load each shard encounters. Orders for a single month therefore end up spread over many shards.

Option D is incorrect.

There is no such sharding strategy as a normalized strategy.
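To see why hashing works against the month query, here is a rough sketch. It assumes the order ID is the shard key, a fixed `SHARD_COUNT` of 4, and a hypothetical `hash_shard` helper; real systems choose their own hash function and shard count.

```python
import hashlib
from datetime import datetime

SHARD_COUNT = 4  # hypothetical number of shards

def hash_shard(order_id: str) -> int:
    """Pick a shard from a stable hash of the order ID (the shard key here)."""
    digest = hashlib.sha256(order_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % SHARD_COUNT

# 100 hypothetical orders, all delivered in March 2023.
march_orders = {f"order-{i}": datetime(2023, 3, 1 + i % 28) for i in range(100)}

placement = {}
for order_id in march_orders:
    placement.setdefault(hash_shard(order_id), []).append(order_id)

# Shards that a "delivered in March" query would have to read.
shards_to_query = sorted(placement)
```

The orders are balanced across shards, but a query for one month has to fan out to every shard that received any of them.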

To enable quick data retrieval for finding orders delivered in a particular month, we need to shard the data store. Sharding is a technique to horizontally partition a large database into smaller and more manageable parts.

Now, let's discuss each of the given sharding strategies and see which one would be appropriate for this use case:

A. Lookup strategy: In this strategy, the sharding logic maintains a map that routes each request to the shard holding the data, using the shard key. This strategy is useful when we want explicit control over which keys live in which shard, for example placing each tenant's data in a specific shard. However, for this use case we need to retrieve every order whose delivery date falls within a month, and a lookup map does not by itself keep those orders together or in date order. So, the lookup strategy may not be the best fit for this use case.

B. Range strategy: In this strategy, we divide the data into ranges based on some criteria (e.g., date range, alphabetical range, etc.) and keep the items in each shard ordered by the shard key. This strategy works well when we need to search based on a range of values. For this use case, we can shard the data on the delivery date so that each shard holds a contiguous date range (e.g., orders delivered in January in one shard and orders delivered in February in another). This would enable us to retrieve the orders delivered in a particular month from a single shard using a range query. Therefore, the range strategy is a good fit for this use case.

C. Hash strategy: In this strategy, we use a hash function to generate a hash value from each record's shard key, which is then used to determine the shard where the record should be stored. This strategy works well when we need to distribute the data uniformly across shards, and we don't have a natural way to divide the data. However, for this use case, we have a natural way to divide the data based on the delivery date, and hashing would scatter a month's orders across shards, so the hash strategy may not be the best fit.

D. Normalized strategy: This is not a recognized sharding strategy. Normalization is a schema design technique that breaks data down into smaller tables and removes redundant data; it does not determine how data is divided across shards, so it is not relevant for this use case.
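For contrast with the range strategy, a lookup strategy can be pictured as an explicit routing map. The sketch below is illustrative only: `shard_map`, the shard names, and `lookup_shard` are hypothetical, and a real map would normally live in a dedicated store rather than in a dictionary.

```python
from datetime import date

# Hypothetical lookup map: shard key (year, month) -> shard name.
# Placement is whatever the map says, so it can be changed by editing entries.
shard_map = {
    (2023, 1): "shard-a",
    (2023, 2): "shard-b",
    (2023, 3): "shard-a",
}

def lookup_shard(delivered_on: date) -> str:
    """Route a request by looking up the shard key in the map."""
    key = (delivered_on.year, delivered_on.month)
    try:
        return shard_map[key]
    except KeyError:
        raise LookupError(f"no shard registered for {key}") from None

target = lookup_shard(date(2023, 3, 14))
```

The map gives full control over placement, but every key needs an entry that must be maintained, whereas a range strategy derives the shard directly from the delivery date.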

In summary, the most appropriate sharding strategy for this use case would be the range strategy.