Minimizing Service Degradation: Efficient On-Call Engineer Notification | PCD Exam

Notifying On-Call Engineers about Service Degradation in Production

Question

You want to notify on-call engineers about a service degradation in production while minimizing development time.

What should you do?

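Answers

A. Use Cloud Functions to monitor resources and raise alerts.
B. Use Cloud Pub/Sub to monitor resources and raise alerts.
C. Use Stackdriver Error Reporting to capture errors and raise alerts.
D. Use Stackdriver Monitoring to monitor resources and raise alerts.

Explanations

Correct answer: D.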

When a service degradation occurs in production, it is crucial to notify on-call engineers as quickly as possible to minimize any negative impact on the user experience. There are different ways to achieve this notification process, but some of them are more efficient than others.

Option A suggests using Cloud Functions to monitor resources and raise alerts. Cloud Functions is a serverless compute service whose code is triggered by events, such as a change in a Cloud Storage bucket or a new message on a Pub/Sub topic. It could suit custom monitoring requirements or a tailored alerting system. For instance, you could write a function, invoked on a schedule, that checks the status of your services and raises an alert when it detects degradation. However, you would have to write, deploy, and maintain both the detection logic and the notification delivery yourself, which requires more development time than the managed options.
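To give a sense of the development work option A implies, here is a minimal sketch of the detection logic such a function would need. Every name in it (`ProbeResult`, `evaluate_probes`, `check_and_notify`) and both thresholds are hypothetical, chosen only for illustration. It uses no Google Cloud API, and the probing, scheduling, and paging code would still have to be written around it.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ProbeResult:
    """Outcome of one hypothetical health probe against the service."""
    ok: bool
    latency_ms: float


def evaluate_probes(
    results: List[ProbeResult],
    max_latency_ms: float = 500.0,     # assumed threshold, not from the source
    max_failure_ratio: float = 0.2,    # assumed threshold, not from the source
) -> List[str]:
    """Return the reasons the service looks degraded; an empty list means healthy."""
    if not results:
        return ["no probe results received"]
    reasons = []
    failures = sum(1 for r in results if not r.ok)
    failure_ratio = failures / len(results)
    if failure_ratio > max_failure_ratio:
        reasons.append(
            f"failure ratio {failure_ratio:.0%} exceeds {max_failure_ratio:.0%}"
        )
    slow = [r for r in results if r.ok and r.latency_ms > max_latency_ms]
    if slow:
        reasons.append(f"{len(slow)} probe(s) slower than {max_latency_ms:.0f} ms")
    return reasons


def check_and_notify(
    results: List[ProbeResult], notify: Callable[[str], None]
) -> bool:
    """Call notify() with a summary when degradation is detected."""
    reasons = evaluate_probes(results)
    if reasons:
        notify("Service degradation: " + "; ".join(reasons))
        return True
    return False


if __name__ == "__main__":
    samples = [ProbeResult(True, 120.0), ProbeResult(False, 0.0), ProbeResult(True, 900.0)]
    # print stands in for whatever paging integration you would have to build.
    check_and_notify(samples, print)
```

Even this small sketch leaves open how probes are collected, how alerts are deduplicated, and how engineers are actually paged, which is the extra development time the question asks you to avoid.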

Option B recommends using Cloud Pub/Sub to monitor resources and raise alerts. Cloud Pub/Sub is a messaging service that decouples the senders and receivers of messages. It is scalable and flexible, but it only transports messages: it does not monitor resources or detect degradation on its own. For example, you could create a Pub/Sub topic for service degradation events, but you would still need to build a publisher that detects degradation and publishes an event, plus a subscriber application that receives each message and pages the on-call engineers. Pub/Sub supplies ready-to-use messaging infrastructure, yet the monitoring and notification logic around it is custom development, so this option does not minimize development time.

Option C suggests using Stackdriver Error Reporting to capture errors and raise alerts. Stackdriver Error Reporting is a Google Cloud feature that automatically collects, groups, and analyzes application errors such as exceptions and crashes. It could be a good choice if you want notifications about specific application errors. Its scope is limited to errors, however: a degradation that shows up as rising latency, saturated CPU, or failing uptime checks produces no error for it to capture. It requires little development effort, but it does not cover service degradation in general.

Option D recommends using Stackdriver Monitoring to monitor resources and raise alerts. Stackdriver Monitoring (since renamed Cloud Monitoring) is a Google Cloud feature that provides visibility into the performance, uptime, and overall health of your applications and infrastructure. It covers many aspects of a service, such as CPU usage, network traffic, and latency. For instance, you could create an alerting policy that fires when the latency of your service exceeds a threshold for a set duration, and attach notification channels such as email, SMS, or PagerDuty so that the on-call engineers are notified. Because metric collection, alerting policies, and notification channels are all built in, this option needs configuration rather than code and minimizes development time.
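As an illustration of how little needs to be written for option D, the sketch below builds a latency alerting policy as plain data, following the field names of the Cloud Monitoring API's AlertPolicy resource as I understand them. The helper name `latency_alert_policy`, the project and channel IDs, the choice of the HTTP(S) load balancer latency metric, and the threshold and duration values are all assumptions made for the example, so check them against the current API reference before use.

```python
import json


def latency_alert_policy(project_id: str, channel_id: str, threshold_ms: float = 1000.0) -> dict:
    """Build an AlertPolicy-shaped dict for a latency threshold (illustrative only)."""
    return {
        "displayName": "High request latency",
        "combiner": "OR",
        "conditions": [
            {
                "displayName": "99th percentile latency above threshold",
                "conditionThreshold": {
                    # Assumed metric: total latency of an HTTP(S) load balancer, in ms.
                    "filter": (
                        'metric.type="loadbalancing.googleapis.com/https/total_latencies" '
                        'AND resource.type="https_lb_rule"'
                    ),
                    "comparison": "COMPARISON_GT",
                    "thresholdValue": threshold_ms,
                    # The condition must hold this long before an incident opens.
                    "duration": "300s",
                    "aggregations": [
                        {
                            "alignmentPeriod": "60s",
                            "perSeriesAligner": "ALIGN_PERCENTILE_99",
                        }
                    ],
                },
            }
        ],
        # Placeholder resource name of an existing notification channel.
        "notificationChannels": [
            f"projects/{project_id}/notificationChannels/{channel_id}"
        ],
    }


if __name__ == "__main__":
    # "my-project" and "1234567890" are placeholders, not real identifiers.
    print(json.dumps(latency_alert_policy("my-project", "1234567890"), indent=2))
```

The resulting JSON could then be submitted through the Cloud Monitoring API or, I believe, with `gcloud alpha monitoring policies create --policy-from-file=policy.json`. The same policy can also be created in the console without writing any code.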

In summary, each option has advantages and disadvantages, but only one meets both requirements in the question. Cloud Functions and Cloud Pub/Sub require you to build the monitoring and notification logic yourself, and Stackdriver Error Reporting covers application errors only. Stackdriver Monitoring (option D) detects degradation across resources and notifies on-call engineers through configurable alerting policies, so it is the most efficient choice in terms of development time.