You are performing a semi-annual capacity planning exercise for your flagship service.
You expect a service user growth rate of 10% month-over-month over the next six months.
Your service is fully containerized and runs on Google Cloud Platform (GCP), using a Google Kubernetes Engine (GKE) Standard regional cluster across three zones with cluster autoscaler enabled.
You currently consume about 30% of your total deployed CPU capacity, and you require resilience against the failure of a zone.
You want to ensure that your users experience minimal negative impact as a result of this growth or as a result of zone failure, while avoiding unnecessary costs.
How should you prepare to handle the predicted growth?
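A. Verify the maximum node pool size, enable a horizontal pod autoscaler, and then perform a load test to verify your expected resource needs.
B. Because the service is deployed on GKE and is using a cluster autoscaler, the cluster will scale automatically, regardless of growth rate.
C. Because current CPU usage is only 30%, significant headroom exists, and no additional capacity is needed for the expected growth rate.
D. Proactively add 60% more node capacity to account for six months of 10% growth rate, and then perform a load test to make sure you have enough capacity.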
The correct answer is D. Proactively add 60% more node capacity to account for six months of 10% growth rate, and then perform a load test to make sure you have enough capacity.
Explanation:
Capacity planning is a critical exercise for any service to ensure it can handle the expected growth in users without impacting the user experience negatively. The goal is to balance the need for capacity with the cost of provisioning resources, while ensuring resilience against potential failures.
In this scenario, the service is fully containerized and runs on Google Cloud Platform (GCP), using a Google Kubernetes Engine (GKE) Standard regional cluster across three zones with cluster autoscaler enabled. The service currently uses about 30% of its deployed CPU capacity, and the expected user growth rate is 10% month-over-month over the next six months. The service must remain resilient to the failure of a zone, and users should see minimal negative impact.
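The numbers in the scenario can be worked through with a short calculation. The sketch below is illustrative only and rests on assumptions that are not stated in the question: CPU demand scales linearly with user count, the three zones hold equal capacity, and deployed capacity stays at today's level. The function names are hypothetical.

```python
def projected_utilization(current, monthly_growth, months):
    """Utilization of today's capacity after month-over-month compounding."""
    return current * (1 + monthly_growth) ** months


def utilization_after_zone_loss(utilization, zones):
    """Utilization of the surviving capacity if one of `zones` equal zones fails."""
    return utilization * zones / (zones - 1)


# Scenario values: 30% utilization, 10% monthly growth, six months, three zones.
growth_factor = 1.10 ** 6
after_growth = projected_utilization(0.30, 0.10, 6)
after_zone_loss = utilization_after_zone_loss(after_growth, 3)

print(f"growth factor over six months: {growth_factor:.2f}")
print(f"utilization after growth: {after_growth:.1%}")
print(f"utilization after losing one zone: {after_zone_loss:.1%}")
```

Under these assumptions, six months of 10% month-over-month growth compounds to roughly 77% total growth (simply adding 10% six times would give 60%). Utilization of today's capacity rises from 30% to about 53%, and to about 80% of the remaining capacity if one of the three zones is lost.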
Option A suggests verifying the maximum node pool size, enabling a horizontal pod autoscaler, and performing a load test to verify expected resource needs. While this may help in ensuring that the service has sufficient capacity, it does not take into account the expected growth rate or resilience against zone failure.
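As a rough illustration of what verifying the maximum node pool size could involve, the sketch below estimates how many nodes the grown load would need and checks whether the autoscaler's ceiling still covers that number with one zone down. Every input value here (30 current nodes, a 65% target utilization, the per-zone maximums) is hypothetical, the helper names are invented for this example, and the maximum is treated as a per-zone limit. A load test would still be needed to confirm the real figures.

```python
import math


def nodes_needed(current_nodes, current_util, monthly_growth, months, target_util):
    """Nodes required to serve the grown load at or below the target utilization."""
    load = current_nodes * current_util * (1 + monthly_growth) ** months
    return math.ceil(load / target_util)


def max_covers_zone_loss(max_nodes_per_zone, zones, required_nodes):
    """True if the surviving zones can scale up to the required node count."""
    return max_nodes_per_zone * (zones - 1) >= required_nodes


# Hypothetical inputs: 30 nodes today at 30% CPU, 65% target utilization.
required = nodes_needed(
    current_nodes=30,
    current_util=0.30,
    monthly_growth=0.10,
    months=6,
    target_util=0.65,
)

print(f"nodes required after growth: {required}")
print(f"per-zone max of 10 is enough: {max_covers_zone_loss(10, 3, required)}")
print(f"per-zone max of 13 is enough: {max_covers_zone_loss(13, 3, required)}")
```

With these made-up inputs the grown load needs 25 nodes, so a per-zone maximum of 10 would fall short with one zone down (20 nodes available), while a maximum of 13 would suffice (26 nodes available).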
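Option B suggests that because the service is deployed on GKE and is using a cluster autoscaler, the cluster will automatically scale regardless of growth rate. This is only partially true: the cluster autoscaler adds nodes only up to each node pool's configured maximum size and within the project's resource quotas, and new nodes take time to become ready. Relying on it alone also does not account for resilience against zone failure.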
Option C suggests that because the current CPU capacity usage is at 30%, significant headroom exists, and additional capacity is not needed for the expected growth rate. While headroom does exist today, this option does not account for the capacity that is lost when a zone fails.
Option D suggests proactively adding 60% more node capacity to account for six months of 10% growth rate, and then performing a load test to ensure enough capacity. This is the correct answer as it takes into account the expected growth rate, resilience against zone failure, and the need to balance capacity with the cost of provisioning resources.
In conclusion, when performing capacity planning, it is crucial to take into account the expected growth rate, resilience against potential failures, and the need to balance capacity with the cost of provisioning resources.