Diagnosing a Performance Issue in Compute Engine: Recommendations

Solving Performance Issues in Compute Engine Instances

Question

Your operations team has asked you to help diagnose a performance issue in a production application that runs on Compute Engine.

The application is dropping requests that reach it when under heavy load.

The process list for affected instances shows a single application process that is consuming all available CPU, and autoscaling has reached the upper limit of instances.

There is no abnormal load on any other related systems, including the database.

You want to allow production traffic to be served again as quickly as possible.

Which action should you recommend?

Answers

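A. Change the autoscaling metric to agent.googleapis.com/memory/percent_used.

B. Restart the affected instances on a staggered schedule.

C. SSH to each instance and restart the application process.

D. Increase the maximum number of instances in the autoscaling group.

Explanations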

A.

https://cloud.google.com/blog/products/sap-google-cloud/best-practices-for-sap-app-server-autoscaling-on-google-cloud

In this scenario, the application drops requests under heavy load, a single application process on each affected instance is consuming all available CPU, autoscaling is already at its upper limit, and no other related system, including the database, shows abnormal load. The goal is to get production traffic served again as quickly as possible. Each option is assessed below against that goal.

Option A. Change the autoscaling metric to agent.googleapis.com/memory/percent_used.

This option is not recommended. The bottleneck is CPU, with a single application process consuming all of it, and nothing in the scenario points to memory pressure. Switching the autoscaling metric to memory usage would therefore not address the problem, and it could not add capacity in any case because the group is already at its maximum size.

Option B. Restart the affected instances on a staggered schedule.

Restarting the affected instances on a staggered schedule may relieve the problem temporarily, but it does not address the root cause. It also removes capacity while each instance restarts, so some users may experience downtime at a moment when the group is already overloaded.
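As a rough illustration of what a staggered schedule means in practice, the sketch below splits a list of instances into batches that would be restarted one batch at a time. The function name, the instance names, and the batch size are hypothetical, and the code only plans the batches; it does not call any Compute Engine API.

```python
def restart_batches(instances, batch_size):
    """Split instance names into consecutive batches for one-at-a-time restarts."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [instances[i:i + batch_size]
            for i in range(0, len(instances), batch_size)]


# Hypothetical group of five instances, restarted two at a time.
names = [f"app-{n}" for n in range(1, 6)]
for batch in restart_batches(names, 2):
    # While this batch restarts, the rest of the group carries all traffic.
    print(batch)
```

With these assumed numbers, two of the five instances are out of service during each of the first two batches, which is why this option can drop more requests while the group is already saturated.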

Option C. SSH to each instance and restart the application process.

Restarting the application process on each instance may relieve the problem temporarily. However, this option is not recommended: it requires manual intervention on every instance and does not address the root cause.

Option D. Increase the maximum number of instances in the autoscaling group.

Increasing the maximum number of instances in the autoscaling group may temporarily alleviate the issue, but it will not address the root cause of the problem, which is the single application process consuming all available CPU. Also, increasing the number of instances can lead to increased costs.
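To see why the instance ceiling matters here, the following is a minimal sketch of target-utilization scaling. It is a simplified model, not the actual Compute Engine autoscaler algorithm; the function name, the 60% CPU target, and the instance counts are all assumptions chosen for illustration.

```python
import math


def recommended_size(current_size, observed_cpu, target_cpu, min_size, max_size):
    """Simplified model: size the group so average CPU approaches the target,
    then clamp the result to the configured minimum and maximum."""
    wanted = math.ceil(current_size * observed_cpu / target_cpu)
    return max(min_size, min(max_size, wanted))


# Hypothetical group: 10 instances pinned at 100% CPU, with a 60% CPU target.
print(recommended_size(10, 1.00, 0.60, 2, 10))  # 10: capped at the current maximum
print(recommended_size(10, 1.00, 0.60, 2, 20))  # 17: a higher ceiling lets the group grow
```

In this model the group wants 17 instances but is held at 10 by the limit, so no change to the scaling metric can add capacity until the maximum itself is raised.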

None of the four options fixes the root cause, which is the single application process consuming all available CPU. The lasting fix is to monitor the application's CPU utilization and to profile and optimize the code. Raising the minimum number of instances in the autoscaling group and setting up alerts can also help prevent a recurrence. When the immediate goal is only to get traffic served again, however, Option D is the one listed action that adds serving capacity right away, so it works as a stopgap while the root cause is investigated.