You are developing an Azure Cosmos DB solution by using the Azure Cosmos DB SQL API.
The data includes millions of documents.
Each document may contain hundreds of properties.
The properties of the documents do not contain distinct values for partitioning.
Azure Cosmos DB must scale individual containers in the database to meet the performance needs of the application by spreading the workload evenly across all partitions over time.
You need to select a partition key.
Which two partition keys can you use? Each correct answer presents a complete solution.
NOTE: Each correct selection is worth one point.
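A. A single property value that does not appear frequently in the documents
B. The collection name
C. A single property value that appears frequently in the documents
D. A concatenation of multiple property values with a random suffix appended
E. A hash suffix appended to a property value
Correct answer: DE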
You can form a partition key by concatenating multiple property values into a single artificial partitionKey property.
These keys are referred to as synthetic keys.
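As a minimal sketch of the concatenation described above, the following builds a synthetic key from two properties. The property names (deviceId, date), the separator, and the helper name are assumptions chosen for illustration; they do not come from the question.

```python
def synthetic_partition_key(doc: dict, fields: list[str], sep: str = "-") -> str:
    """Join the chosen property values into one partition key string."""
    return sep.join(str(doc[f]) for f in fields)


# Hypothetical document; the property names are illustrative only.
doc = {"id": "1", "deviceId": "abc-123", "date": 2018}

# Store the synthetic value in its own property so the container can use
# /partitionKey as its partition key path.
doc["partitionKey"] = synthetic_partition_key(doc, ["deviceId", "date"])
```

The combined value has at least as many distinct values as either property alone, which is the point of the technique.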
Another possible strategy to distribute the workload more evenly is to append a random number at the end of the partition key value.
When you distribute items in this way, you can perform parallel write operations across partitions.
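The random-suffix strategy can be sketched as follows. The bucket count of 400, the "." separator, and the function name are assumptions for illustration.

```python
import random


def with_random_suffix(base_key: str, buckets: int = 400, rng=random) -> str:
    """Append a random bucket number (1..buckets) to a partition key value."""
    return f"{base_key}.{rng.randint(1, buckets)}"


# Documents that share the same base value now land on up to `buckets`
# different partition key values, so writes can proceed in parallel.
keys = [with_random_suffix("2018-08-09") for _ in range(5)]
```

One consequence of this design is that the suffix cannot be recomputed later, so reading all items for a given base value means querying every possible suffix.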
Note: It's a best practice to have a partition key with many distinct values, such as hundreds or thousands.
The goal is to distribute your data and workload evenly across the items associated with these partition key values.
If such a property doesn't exist in your data, you can construct a synthetic partition key.
Reference: https://docs.microsoft.com/en-us/azure/cosmos-db/synthetic-partition-keys
When selecting a partition key for an Azure Cosmos DB container, it is important to choose a value that distributes data evenly across all partitions, allowing for optimal performance and scalability. In this scenario, there are millions of documents, and each document may contain hundreds of properties, but the properties do not contain distinct values for partitioning.
Option A: A single property value that does not appear frequently in the documents is not a suitable partition key. The scenario states that no property has enough distinct values on its own, so any single property groups many documents under a handful of key values, which leads to data skew and uneven workload distribution.
Option B: The collection name is not a suitable partition key. Every document in a container has the same collection name, so the key has a single value and all documents fall into one logical partition.
Option C: A single property value that appears frequently in the documents is not a suitable partition key. A frequently repeated value concentrates many documents under the same key, which creates a hot partition instead of spreading the workload evenly.
Option D: Concatenating multiple property values and appending a random suffix is a valid choice. The concatenation produces a synthetic key with more distinct values than any single property, and the random suffix spreads documents that share the same property values across several partition key values, so common values no longer cause skew. The trade-off is that reading all items for a given value requires querying every suffix.
Option E: Appending a hash suffix to a property value is a valid choice. A hash function is deterministic but spreads its inputs approximately uniformly, so when the suffix is computed from another property of the document (for example, its ID), documents that share a common property value are split across many partition key values. Because the suffix can be recomputed from the document's data, a specific item can still be read without querying every suffix.
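A hash suffix can be sketched like this, assuming the suffix is derived from a second property of the document. The choice of SHA-256, the bucket count, and the function and parameter names are illustrative assumptions, not something the question prescribes.

```python
import hashlib


def with_hash_suffix(base_key: str, hashed_value: str, buckets: int = 400) -> str:
    """Append a deterministic bucket number derived from hashed_value."""
    digest = hashlib.sha256(hashed_value.encode("utf-8")).digest()
    # Map the first 8 bytes of the digest onto the range 1..buckets.
    bucket = int.from_bytes(digest[:8], "big") % buckets + 1
    return f"{base_key}.{bucket}"


# The same inputs always give the same key, so a reader that knows the
# document's hashed property can recompute the partition key for a point read.
key_on_write = with_hash_suffix("2018-08-09", "document-id-42")
key_on_read = with_hash_suffix("2018-08-09", "document-id-42")
```

Unlike a random suffix, this keeps single-item lookups cheap while still splitting a common base value across many partition key values.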
Overall, options D and E are the correct choices: both build a synthetic partition key with many distinct values, which spreads data and workload evenly across partitions. In practice, the best partition key still depends on the specific data and access patterns of the application, and may require testing and iteration to find the optimal solution.
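Testing a candidate key can start with a simple skew check on sample data. The sketch below compares a low-cardinality property with a suffixed version of it. The sample documents, the status property, and the modulo-based suffix (a stand-in for a real hash suffix) are all assumptions for illustration.

```python
from collections import Counter


def largest_partition_share(keys) -> float:
    """Return the fraction of items held by the most populated key value."""
    counts = Counter(keys)
    return max(counts.values()) / sum(counts.values())


# Hypothetical sample: 1000 documents, 90% of which share one status value.
docs = [{"id": str(i), "status": "active" if i % 10 else "closed"} for i in range(1000)]

# Candidate 1: the frequently repeated property on its own.
single = largest_partition_share(d["status"] for d in docs)

# Candidate 2: the same property with a bucket suffix derived from the id
# (50 buckets; id modulo 50 stands in for a hash here).
suffixed = largest_partition_share(f'{d["status"]}.{int(d["id"]) % 50}' for d in docs)
```

With this sample, the single property puts 90% of the documents under one key value, while the suffixed key caps the largest key value at 2%.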