Azure Stream Analytics: Metric-based Alert System for High Volume Inputs


Question

There is a group of sensors that send data to Azure Event Hub in real-time.

This data is then sent to the Azure Stream Analytics job for processing.

You are setting up a metric-based alert that fires when the input volume grows beyond what the Stream Analytics job's resources can handle.

Which of the following metrics should be used to set up this alert?

Answers

A. Watermark Delay
B. Out-of-Order Events
C. Early Input Events
D. Data Conversion Errors

Correct Answer: A.

Explanations

The watermark delay metric is a key indicator of job health in Stream Analytics.

It is calculated as the wall-clock time of the processing node minus the largest watermark the node has seen so far.

A small illustration of this is given in the picture below.

[Figure: Simple case: no time window, late arrival and out-of-order policies set to 10 seconds]

    SELECT *
    FROM input TIMESTAMP BY eventTime

Pipeline shown: Events -> Upload/Transmission -> Stream ingestion (Event Hubs, IoT Hub) -> Stream processing -> Output

Timeline ("wall clock") marks: 12:01:00, 12:01:05, 12:01:06

    Arrival time (EnqueuedTime): 12:01:00
    Time when the processed event is output: 12:01:06
    Output watermark delay = 12:01:06 - 12:01:00 = 6 seconds

Image source: Microsoft Documentation
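The arithmetic in the figure can be reproduced with a few lines of Python. This is only a minimal sketch of the definition given above (wall-clock time minus largest watermark). The function name `watermark_delay` and the calendar date are illustrative assumptions and are not part of any Azure SDK.

```python
from datetime import datetime, timedelta


def watermark_delay(wall_clock: datetime, largest_watermark: datetime) -> timedelta:
    """Wall-clock time of the processing node minus the largest watermark seen so far."""
    return wall_clock - largest_watermark


# Times taken from the figure; the date itself is arbitrary (assumption).
arrival_time = datetime(2024, 1, 1, 12, 1, 0)  # EnqueuedTime, used as the watermark here
output_time = datetime(2024, 1, 1, 12, 1, 6)   # when the processed event is output

delay = watermark_delay(output_time, arrival_time)
print(delay.total_seconds())  # 6.0
```

In this simple case the watermark equals the arrival time of the event, so the delay is just the six seconds between arrival and output.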

When a Stream Analytics job runs short of resources, the watermark delay rises.

An alert with a threshold on this metric will therefore send notifications when the job starts falling behind its input.

Option A is correct: Job health can be monitored through the watermark delay metric.

When the job is short of resources, the watermark delay increases.

Option B is incorrect: The Out-of-Order Events metric counts events that arrived out of order and were then dropped or had their timestamps adjusted.

Option C is incorrect: The Early Input Events metric counts events that arrived early and were then dropped or had their timestamps adjusted.

Option D is incorrect: The Data Conversion Errors metric counts output events that could not be converted to the expected output schema.

To set up an alert for when the volume of inputs exceeds the processing capability of the Azure Stream Analytics job, we need to monitor a metric that shows whether the job is keeping up with its input. One such metric is Watermark Delay.

Watermark delay is the difference between the wall-clock time of the processing node and the largest watermark the job has seen so far. It represents how far the job's processing lags behind the incoming data.

If the watermark delay is increasing continuously, it means that the Stream Analytics job is not able to process the input data at the expected rate. This could be because of an increase in the volume of data or because of a slow processing rate. In either case, it indicates that the system is under stress, and an alert should be triggered to notify the administrators.
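The "increasing continuously" condition can be sketched as a small check over recent metric samples. This is a hypothetical local illustration, not the Azure Monitor alert rule itself. The function name `should_alert`, the threshold, and the sample values are all assumptions made for the example.

```python
from typing import Sequence


def should_alert(
    delays_seconds: Sequence[float],
    threshold_seconds: float,
    min_samples: int = 3,
) -> bool:
    """Return True when the last `min_samples` watermark-delay readings are
    all above the threshold and strictly increasing (hypothetical rule)."""
    if len(delays_seconds) < min_samples:
        return False  # not enough data to judge a trend
    recent = list(delays_seconds[-min_samples:])
    above_threshold = all(d > threshold_seconds for d in recent)
    rising = all(earlier < later for earlier, later in zip(recent, recent[1:]))
    return above_threshold and rising


# Made-up samples: the delay keeps growing, so the job is falling behind.
print(should_alert([5, 20, 40, 80], threshold_seconds=10))  # True
# A short spike that recovers does not trigger the alert.
print(should_alert([5, 60, 8, 6], threshold_seconds=10))    # False
```

Requiring both a threshold and a rising trend avoids alerting on a brief spike that the job recovers from on its own. A real alert rule would typically express a similar idea with an aggregation window and a threshold on the metric.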

Out-of-Order Events, Early Input Events, and Data Conversion Errors are not suitable metrics for monitoring the processing rate of the Stream Analytics job. Out-of-Order Events and Early Input Events indicate data quality issues, whereas Data Conversion Errors indicate issues with data type conversions. These metrics may be useful for debugging the system but are not relevant to the monitoring of the processing rate.

Therefore, the correct answer is A. Watermark Delay.