Question

You are responsible for the reliability of a high-volume enterprise application.

A large number of users report that an important subset of the application's functionality, a data-intensive reporting feature, is consistently failing with an HTTP 500 error.

When you investigate your application's dashboards, you notice a strong correlation between the failures and a metric that represents the size of an internal queue used for generating reports.

You trace the failures to a reporting backend that is experiencing high I/O wait times.

You quickly fix the issue by resizing the backend's persistent disk (PD).

Now you need to create an availability Service Level Indicator (SLI) for the report generation feature.

How would you define it?

Exam-Answer · Accepted Answer

As the application's report generation queue size compared to a known-good threshold
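To make the accepted answer concrete, the following is a minimal sketch of how a queue-size SLI might be computed from sampled metric values. The threshold value, the function names, and the sample data are all hypothetical and do not come from the question. Because Google's SRE guidance generally defines an availability SLI as the proportion of valid requests that are served successfully, a request-based ratio is included alongside for comparison.

```python
from typing import Sequence

# Hypothetical known-good ceiling for the report queue size (an assumption,
# not a value given in the question).
KNOWN_GOOD_QUEUE_SIZE = 100


def queue_size_sli(samples: Sequence[int],
                   threshold: int = KNOWN_GOOD_QUEUE_SIZE) -> float:
    """Fraction of metric samples where the queue was at or below the threshold."""
    if not samples:
        raise ValueError("at least one queue-size sample is required")
    good = sum(1 for size in samples if size <= threshold)
    return good / len(samples)


def request_success_sli(status_codes: Sequence[int]) -> float:
    """Fraction of report requests that did not end in a 5xx server error."""
    if not status_codes:
        raise ValueError("at least one request is required")
    good = sum(1 for code in status_codes if code < 500)
    return good / len(status_codes)


if __name__ == "__main__":
    # Illustrative data only.
    print(queue_size_sli([10, 50, 150, 200]))
    print(request_success_sli([200, 200, 500, 200]))
```

The two functions measure different things. The first tracks an internal symptom that correlated with the incident. The second tracks what users actually experienced, namely whether a report request succeeded.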

Resizing Persistent Disk for Improved Performance
