Optimizing Reliability and Deployment Velocity for Cloud Services

Achieving a Balance between Reliability and Deployment Velocity

Question

You support a service with a well-defined Service Level Objective (SLO).

Over the previous 6 months, your service has consistently met its SLO and customer satisfaction has been consistently high.

Most of your service's operations tasks are automated and few repetitive tasks occur frequently.

You want to optimize the balance between reliability and deployment velocity while following site reliability engineering best practices.

What should you do? (Choose two.)

Answers

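A. Make the service's SLO more strict.

B. Increase the service's deployment velocity and/or risk.

C. Shift engineering time to other services that need more reliability.

D. Get the product team to prioritize reliability work over new features.

E. Change the implementation of your Service Level Indicators (SLIs) to increase coverage.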

BC.

https://sre.google/sre-book/service-level-objectives/

In this scenario the service has met its Service Level Objective (SLO) for six months, customer satisfaction is high, and toil is low because most operations tasks are automated. Under Site Reliability Engineering (SRE) practice, that combination means the service is already reliable enough for its users and is leaving error budget unspent. The SRE Workbook's guidance for this case (SLO met, low toil, high customer satisfaction) is to either relax release and deployment processes and increase velocity, or step back and focus engineering time on services that need more reliability. That corresponds to options B and C.

B. Increase the service's deployment velocity and/or risk: Correct. A service that consistently meets its SLO while its users are satisfied has headroom to ship faster or take on riskier changes. The SLO still bounds that risk: if the service starts missing it, velocity is reduced again.

C. Shift engineering time to other services that need more reliability: Correct. Further reliability investment in this service brings little benefit to users who are already satisfied, so engineering time is better spent on services that are missing their SLOs or carrying more toil.

A, D, and E are not the best options in this scenario.

A. Make the service's SLO more strict: Tightening the SLO is appropriate when the SLO is being met but customers are still unhappy. Here customers are satisfied, so a stricter target would raise cost and reduce deployment velocity without improving the user experience.

D. Get the product team to prioritize reliability work over new features: This is the response to a service that is missing its SLO or has exhausted its error budget. This service is meeting its SLO, so there is no reason to slow feature work.

E. Change the implementation of your Service Level Indicators (SLIs) to increase coverage: Broader SLI coverage is worthwhile when the SLIs fail to reflect what users experience, for example when the SLO is met but customers complain. High customer satisfaction alongside a met SLO suggests the current SLIs already track user experience well.
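
The balance between reliability and deployment velocity is usually reasoned about through the error budget, the amount of unreliability an SLO permits. The following is a minimal sketch, assuming a request-based availability SLI. The ErrorBudget class, the 99.9% target, and the request counts are hypothetical values chosen for illustration and are not taken from the question.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorBudget:
    """Hypothetical helper: error budget for a request-based availability SLO."""

    slo_target: float      # e.g. 0.999 means 99.9% of requests must succeed
    total_requests: int    # requests served during the SLO window
    failed_requests: int   # requests that violated the SLI in that window

    @property
    def allowed_failures(self) -> float:
        # The budget is the share of requests the SLO permits to fail.
        return (1.0 - self.slo_target) * self.total_requests

    @property
    def remaining_fraction(self) -> float:
        # 1.0 means the budget is untouched; 0.0 or below means none is left.
        if self.allowed_failures == 0:
            return 0.0  # a 100% target leaves no budget to spend
        return 1.0 - self.failed_requests / self.allowed_failures


# Illustrative numbers only (assumed, not from the scenario).
budget = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=200)
print(f"allowed failures: {budget.allowed_failures:.0f}")
print(f"budget remaining: {budget.remaining_fraction:.0%}")
```

With these illustrative numbers, 200 failures against roughly 1,000 permitted leaves about 80% of the budget unspent, which is the kind of headroom the scenario describes.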