High-Traffic Web Application SLI

Service Level Objective for User Experience

Question

You support a high-traffic web application with a microservice architecture.

The home page of the application displays multiple widgets containing content such as the current weather, stock prices, and news headlines.

The main serving thread makes a call to a dedicated microservice for each widget and then lays out the homepage for the user.

The microservices occasionally fail; when that happens, the serving thread serves the homepage with some missing content.
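A minimal Python sketch of the serving behavior described above, assuming each widget's microservice call can be modeled as a plain function that raises an exception on failure. The names `render_homepage` and `failing_stocks` and the widget keys are hypothetical. The point is that the serving thread already knows, at serving time, whether it had to degrade the page, so it can record that fact for every response.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical stand-in for a call to one widget's microservice.
WidgetFetcher = Callable[[], str]


def render_homepage(
    fetchers: Dict[str, WidgetFetcher],
) -> Tuple[Dict[str, Optional[str]], bool]:
    """Call every widget's microservice and lay out whatever content came back.

    Returns the per-widget content (None where the call failed) and a flag
    saying whether the page was served in degraded mode.
    """
    content: Dict[str, Optional[str]] = {}
    for name, fetch in fetchers.items():
        try:
            content[name] = fetch()
        except Exception:
            # Degrade gracefully: leave this widget empty instead of failing the page.
            content[name] = None
    degraded = any(value is None for value in content.values())
    return content, degraded


def failing_stocks() -> str:
    # Hypothetical failing microservice used only for the demonstration below.
    raise TimeoutError("stock service did not respond")


if __name__ == "__main__":
    page, degraded = render_homepage({
        "weather": lambda: "Sunny, 22 C",
        "stocks": failing_stocks,
        "news": lambda: "Top headline",
    })
    print(page)      # {'weather': 'Sunny, 22 C', 'stocks': None, 'news': 'Top headline'}
    print(degraded)  # True
```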

Users of the application are unhappy if this degraded mode occurs too frequently, but they would rather have some content served instead of no content at all.

You want to set a Service Level Objective (SLO) to ensure that the user experience does not degrade too much.

What Service Level Indicator (SLI) should you use to measure this?

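Answers

A. A quality SLI: the ratio of non-degraded responses to total responses.
B. An availability SLI: the ratio of healthy microservices to the total number of microservices.
C. A freshness SLI: the proportion of widgets that have been updated within the last 10 minutes.
D. A latency SLI: the ratio of microservice calls that complete in under 100 ms to the total number of microservice calls.

Explanation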


https://cloud.google.com/stackdriver/docs/solutions/slo-monitoring

The correct answer is A: a quality SLI, measured as the ratio of non-degraded responses to total responses.

In a microservice architecture, a single user request often fans out to several microservices. If one or more of them fails to respond or returns invalid data, the response the user receives is degraded.

To set a Service Level Objective (SLO) that addresses this, we need a Service Level Indicator (SLI) that counts the event users actually notice: a homepage served with missing content.

Option A suggests measuring a quality SLI, the ratio of non-degraded responses to total responses. In other words, it measures the fraction of homepage responses served with every widget's content, out of all responses, including those served with missing or incorrect content.

This SLI fits the scenario because it measures what users actually experience. The question states that users would rather see a partial homepage than no homepage, so a degraded response should not be treated as an outright failure; yet users are unhappy when degraded responses happen too often, so they cannot simply be counted as successes either. A quality SLI tracks exactly this distinction, and an SLO on it bounds how often users see missing content without requiring every widget to succeed on every request.
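A minimal sketch of computing this quality SLI and checking it against an SLO, assuming the serving thread logs, for every homepage view, how many widgets were requested and how many were rendered. The `HomepageResponse` record, the 99% target, and the sample counts are hypothetical illustrations, not values from the question.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class HomepageResponse:
    # Hypothetical record emitted by the serving thread for each homepage view.
    widgets_requested: int
    widgets_rendered: int

    @property
    def degraded(self) -> bool:
        # A response is degraded if any requested widget is missing.
        return self.widgets_rendered < self.widgets_requested


def quality_sli(responses: Iterable[HomepageResponse]) -> float:
    """Ratio of non-degraded responses to total responses."""
    total = 0
    good = 0
    for response in responses:
        total += 1
        if not response.degraded:
            good += 1
    if total == 0:
        return 1.0  # assumption: no traffic counts as meeting the objective
    return good / total


if __name__ == "__main__":
    slo_target = 0.99  # hypothetical objective: 99% of homepages served complete
    log = [HomepageResponse(3, 3)] * 995 + [HomepageResponse(3, 2)] * 5
    sli = quality_sli(log)
    print(f"SLI = {sli:.3f}, meets SLO: {sli >= slo_target}")  # SLI = 0.995, meets SLO: True
```

Degraded responses still count toward the total, so they lower the SLI without being treated as outages, which matches the "some content is better than none" requirement.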

Option B suggests measuring an availability SLI, the ratio of healthy microservices to the total number of microservices. This measures the health of the backend services rather than the user experience. It counts services, not user requests, so it cannot tell how many homepage views were actually missing content: a microservice can pass its health checks while still failing or returning invalid data on some calls, leaving users with degraded pages even though every microservice is reported as available.
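To make the contrast concrete, here is a small sketch with hypothetical data: every microservice passes its health check, but the stock service returns invalid data on every 20th homepage view. The health-check snapshot, the failure pattern, and the function names are assumptions chosen for illustration.

```python
from typing import Dict, List

# Hypothetical snapshot: every microservice passes its health check.
health_checks: Dict[str, bool] = {"weather": True, "stocks": True, "news": True}

# Hypothetical per-view outcomes: the stock widget fails on every 20th view
# even though its service's health check stays green.
page_views: List[Dict[str, bool]] = [
    {"weather": True, "stocks": i % 20 != 0, "news": True} for i in range(1000)
]


def availability_sli(checks: Dict[str, bool]) -> float:
    """Option B: ratio of healthy microservices to total microservices."""
    return sum(checks.values()) / len(checks)


def quality_sli(views: List[Dict[str, bool]]) -> float:
    """Option A: ratio of homepage views with every widget rendered."""
    return sum(all(view.values()) for view in views) / len(views)


if __name__ == "__main__":
    print(availability_sli(health_checks))  # 1.0
    print(quality_sli(page_views))          # 0.95
```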

Option C suggests measuring a freshness SLI, the proportion of widgets that have been updated within the last 10 minutes. This measures how current the data is, not whether it reached the user. Even if every widget's data is fresh, a homepage can still be served with a widget missing because the call to that widget's microservice failed.
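A similar sketch for freshness, again with hypothetical data: all widget data was updated within the last 10 minutes, yet the news widget fails to render on one homepage view in ten. The fixed clock, update times, and failure rate are assumptions for a reproducible example.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List

# Fixed clock so the example is reproducible.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)

# Hypothetical last-update times: every widget's data is under 10 minutes old.
last_updated: Dict[str, datetime] = {
    "weather": now - timedelta(minutes=2),
    "stocks": now - timedelta(minutes=1),
    "news": now - timedelta(minutes=7),
}

# Hypothetical homepage views: the news widget is missing on 1 view in 10.
page_views: List[Dict[str, bool]] = [
    {"weather": True, "stocks": True, "news": i % 10 != 0} for i in range(100)
]


def freshness_sli(
    updates: Dict[str, datetime],
    at: datetime,
    window: timedelta = timedelta(minutes=10),
) -> float:
    """Option C: proportion of widgets updated within the freshness window."""
    return sum(at - ts <= window for ts in updates.values()) / len(updates)


def quality_sli(views: List[Dict[str, bool]]) -> float:
    """Option A: ratio of homepage views with every widget rendered."""
    return sum(all(view.values()) for view in views) / len(views)


if __name__ == "__main__":
    print(freshness_sli(last_updated, now))  # 1.0
    print(quality_sli(page_views))           # 0.9
```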

Option D suggests measuring a latency SLI, the ratio of microservice calls that complete in under 100 ms to the total number of microservice calls. This measures the performance of individual microservice calls, not the completeness of the homepage. A call that fails quickly still counts as fast, so the latency SLI can look perfect while widgets are missing from the page.
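The fast-failure case can be sketched the same way. Under the hypothetical assumption that the stock service returns an error in 5 ms on every 10th homepage, every call finishes under 100 ms while 10% of pages are degraded. The `Call` record and the latencies are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Call:
    # Hypothetical record of one microservice call made while building a homepage.
    page_id: int
    widget: str
    latency_ms: float
    ok: bool


calls: List[Call] = []
for page in range(200):
    calls.append(Call(page, "weather", 40.0, True))
    # The stock service fails fast (an error in 5 ms) on every 10th page.
    failed = page % 10 == 0
    calls.append(Call(page, "stocks", 5.0 if failed else 60.0, not failed))
    calls.append(Call(page, "news", 80.0, True))


def latency_sli(records: List[Call], threshold_ms: float = 100.0) -> float:
    """Option D: ratio of microservice calls completing under the threshold."""
    return sum(c.latency_ms < threshold_ms for c in records) / len(records)


def quality_sli(records: List[Call]) -> float:
    """Option A: ratio of pages whose widget calls all succeeded."""
    pages: Dict[int, bool] = {}
    for c in records:
        pages[c.page_id] = pages.get(c.page_id, True) and c.ok
    return sum(pages.values()) / len(pages)


if __name__ == "__main__":
    print(latency_sli(calls))  # 1.0
    print(quality_sli(calls))  # 0.9
```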

Therefore, option A is correct: it directly measures how often users receive a complete homepage, which is the aspect of the user experience the SLO is meant to protect.