Azure Stream Analytics Windowing Functions for Twitter Data | Exam DP-200

Which Windowing Function to Use for Counting Tweets in Azure Stream Analytics?

Question

You use Azure Stream Analytics to receive Twitter data from Azure Event Hubs and to output the data to an Azure Blob storage account.

You need to output the count of tweets during the last five minutes every five minutes. Each tweet must only be counted once.

Which windowing function should you use?

Answers

Explanations

A. B. C. D.

Correct Answer: C

Tumbling window functions segment a data stream into distinct, fixed-size time segments and perform a function, such as a count, against each segment. The key differentiators of a Tumbling window are that the windows repeat, do not overlap, and an event cannot belong to more than one Tumbling window.

Incorrect Answers:

D: Hopping window functions hop forward in time by a fixed period. It may be easy to think of them as Tumbling windows that can overlap, so events can belong to more than one Hopping window result set. To make a Hopping window the same as a Tumbling window, specify the hop size to be the same as the window size.
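The relationship between hop size and window size can be checked with a small Python sketch. This is not Stream Analytics code. The function name hopping_window_ends is hypothetical, and the sketch assumes that window boundaries are aligned to the Unix epoch and that each window excludes its start time and includes its end time.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # assumed alignment point for window boundaries


def hopping_window_ends(event_time, size, hop):
    """Return the end time of every hopping window that contains event_time.

    Each window is assumed to cover (end - size, end].
    """
    elapsed = event_time - EPOCH
    # Ceiling division: the first aligned window end at or after the event.
    hops = -(-elapsed // hop)
    end = EPOCH + hops * hop
    ends = []
    # Later windows still contain the event while their start precedes it.
    while end - size < event_time:
        ends.append(end)
        end += hop
    return ends


tweet_time = datetime(2024, 1, 1, 12, 7)
five = timedelta(minutes=5)
ten = timedelta(minutes=10)

# Hop equal to window size: behaves like a Tumbling window (one window).
print(hopping_window_ends(tweet_time, size=five, hop=five))
# Hop smaller than window size: the tweet lands in two overlapping windows.
print(hopping_window_ends(tweet_time, size=ten, hop=five))
```

With a hop smaller than the window size, the same tweet would be counted in more than one result, which is why a Hopping window does not meet the requirement unless the hop equals the window size.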

https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-window-functions

To output the count of tweets during the last five minutes every five minutes, you need to use a windowing function in Azure Stream Analytics. A windowing function groups events into windows defined over time and aggregates the events in each window; the resulting aggregate is output at the end of the window.

Azure Stream Analytics supports five kinds of temporal windows: Tumbling, Hopping, Sliding, Session, and Snapshot. The first four are described below.

  • A Tumbling window divides a data stream into non-overlapping, fixed-size, and consecutive windows. For example, if you have a 10-minute Tumbling window and the data stream starts at 12:00 PM, the first window will be from 12:00 PM to 12:10 PM, the second window will be from 12:10 PM to 12:20 PM, and so on.

  • A Hopping window has a fixed size, like a Tumbling window, but moves forward by a fixed hop. When the hop is smaller than the window size, consecutive windows overlap. For example, if you have a 10-minute Hopping window with a 5-minute hop size and the data stream starts at 12:00 PM, the first window will be from 12:00 PM to 12:10 PM, the second window will be from 12:05 PM to 12:15 PM, and so on.

  • A Sliding window has a fixed size but no fixed schedule and no slide size. Stream Analytics produces output only at the points in time when the content of the window changes, that is, when an event enters or leaves the window, so every window contains at least one event. For example, with a 10-minute Sliding window, an event arriving at 12:07 PM produces a result covering 11:57 AM to 12:07 PM. As with Hopping windows, an event can belong to more than one Sliding window.

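  • A Session window groups events that arrive close together and is defined by a timeout and a maximum duration. For example, if you have a Session window with a 5-minute timeout and a 10-minute maximum duration, the window starts when the first event arrives, is extended each time another event arrives within 5 minutes of the previous one, and closes when no event arrives within the timeout or when the maximum duration is reached. The next window starts when new data arrives.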
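Session grouping is the least intuitive of these, so here is a minimal Python sketch of the timeout rule only. It is a simplification, not how Stream Analytics is implemented: the function name group_into_sessions is hypothetical, the maximum duration is ignored, and a gap exactly equal to the timeout is treated as continuing the session, which is an arbitrary choice made here.

```python
from datetime import datetime, timedelta


def group_into_sessions(event_times, timeout):
    """Group timestamps into sessions separated by gaps longer than timeout.

    Simplified: no maximum session duration is enforced.
    """
    sessions = []
    for t in sorted(event_times):
        # Extend the current session if this event is close enough to the last one.
        if sessions and t - sessions[-1][-1] <= timeout:
            sessions[-1].append(t)
        else:
            # Otherwise the gap closed the previous session; start a new one.
            sessions.append([t])
    return sessions


day = datetime(2024, 1, 1)
events = [day.replace(hour=12, minute=m) for m in (0, 2, 4, 12, 13)]
for session in group_into_sessions(events, timeout=timedelta(minutes=5)):
    print(session[0].time(), "to", session[-1].time(), "events:", len(session))
```

The 8-minute gap between 12:04 and 12:12 exceeds the 5-minute timeout, so the sample events split into two sessions. Because session boundaries depend on the data, a Session window cannot guarantee output every five minutes.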

To output the count of tweets during the last five minutes every five minutes, you need to use a five-minute Tumbling window. A Tumbling window is the best option since it divides the data stream into fixed-size and non-overlapping windows, and you only need to output the count of tweets every five minutes.

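Because Tumbling windows do not overlap, each tweet falls into exactly one window, which satisfies the requirement that each tweet be counted only once. If the input stream itself can deliver the same tweet more than once, you can additionally use a DISTINCT count in the Stream Analytics query. The query might look something like this: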

    SELECT
        COUNT(DISTINCT tweet_id) AS tweet_count,
        System.Timestamp() AS output_time
    INTO blob_output
    FROM input_stream TIMESTAMP BY created_at
    GROUP BY TumblingWindow(minute, 5)

This query counts the distinct tweet IDs in a five-minute Tumbling window and outputs the result to a Blob storage account. The output also includes the output time, which is the end time of the Tumbling window.
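To see what such a query would produce, the following Python sketch simulates it on a handful of sample tweets. It only illustrates the windowing logic and is not Stream Analytics code. The function names and sample data are hypothetical, and the sketch assumes that windows are aligned to the Unix epoch and that each window excludes its start time and includes its end time.

```python
from collections import defaultdict
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # assumed alignment point for window boundaries


def tumbling_window_end(event_time, size):
    """Return the end of the single tumbling window containing event_time."""
    elapsed = event_time - EPOCH
    # Ceiling division, so an event exactly on a boundary closes that window.
    return EPOCH + -(-elapsed // size) * size


def count_tweets(tweets, size=timedelta(minutes=5)):
    """Count distinct tweet IDs per window for (tweet_id, created_at) pairs."""
    ids_per_window = defaultdict(set)
    for tweet_id, created_at in tweets:
        ids_per_window[tumbling_window_end(created_at, size)].add(tweet_id)
    # Key each result by window end time, like output_time in the query.
    return {end: len(ids) for end, ids in sorted(ids_per_window.items())}


day = datetime(2024, 1, 1)
sample = [
    ("a", day.replace(hour=12, minute=1)),
    ("b", day.replace(hour=12, minute=4)),
    ("b", day.replace(hour=12, minute=4, second=30)),  # duplicate delivery
    ("c", day.replace(hour=12, minute=5)),             # exactly on a boundary
    ("d", day.replace(hour=12, minute=6)),
]
for window_end, tweet_count in count_tweets(sample).items():
    print(window_end, tweet_count)
```

In this sample the window ending at 12:05 counts three distinct tweets (the duplicate "b" is counted once) and the window ending at 12:10 counts one. No tweet appears in both windows.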