You plan to build a structured streaming solution in Azure Databricks. The solution will count new events in five-minute intervals and report only events that arrive during the interval. The output will be sent to a Delta Lake table.
Which output mode should you use?
A. Complete
B. Update
C. Append
Correct answer: C
Append Mode: Only new rows appended in the result table since the last trigger are written to external storage. This is applicable only for the queries where existing rows in the Result Table are not expected to change.
Incorrect Answers:
A: Complete Mode: The entire updated result table is written to external storage. It is up to the storage connector to decide how to handle the writing of the entire table.
B: Update Mode: Only the rows that were updated in the result table since the last trigger are written to external storage. This is different from Complete Mode in that Update Mode outputs only the rows that have changed since the last trigger. If the query doesn't contain aggregations, it is equivalent to Append mode.
Reference: https://docs.databricks.com/getting-started/spark/streaming.html
In Azure Databricks, Structured Streaming provides a high-level API for building streaming applications. It supports reading data from various sources and writing the processed data to different output sinks. The output mode specifies how the result of a streaming query is written to the output sink.
In this scenario, the requirement is to count new events in five-minute intervals and report only events that arrive during the interval. The output needs to be written to a Delta Lake table.
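Here, the append mode should be used because it writes only the rows added to the result table since the last trigger. It adds new rows to the Delta Lake table without modifying or deleting any existing data. For a windowed count, this means each five-minute count is written once, after the interval is final (Structured Streaming requires a watermark on the event-time column for aggregations in append mode). This mode is suitable when we are interested only in new data and not in updating or deleting existing data.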
The complete mode writes the entire updated result table to the output sink after every trigger. The output includes all rows, both changed and unchanged. This mode is suitable when we need to maintain the complete result of the query, including the updated data.
The update mode writes only the rows that changed since the last trigger: rows for new data and existing rows whose values were updated. This mode is suitable when we need to maintain the current state of the query and track only the changes.
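The difference between the three modes can be made concrete with a small plain-Python model. This is only an illustrative sketch, not Spark code: the function emitted_rows, the dictionaries standing in for the result table, and the window labels are all hypothetical names chosen for the example.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, int]  # (window label, event count)


def emitted_rows(previous: Dict[str, int], current: Dict[str, int], mode: str) -> List[Row]:
    """Toy model: rows a sink would receive for one trigger under an output mode.

    `previous` and `current` are snapshots of the result table before and
    after the trigger (hypothetical stand-ins, not a Spark API).
    """
    if mode == "complete":
        # Every row of the result table, changed or not.
        return sorted(current.items())
    if mode == "update":
        # Rows that are new or whose value changed since the last trigger.
        return sorted((k, v) for k, v in current.items() if previous.get(k) != v)
    if mode == "append":
        # Only rows that did not exist before; existing rows must be unchanged.
        changed = [k for k in previous if current.get(k) != previous[k]]
        if changed:
            raise ValueError(f"append mode cannot rewrite existing rows: {changed}")
        return sorted((k, v) for k, v in current.items() if k not in previous)
    raise ValueError(f"unknown output mode: {mode}")


# One finished interval exists; the trigger adds a second finished interval.
before = {"12:00-12:05": 4}
after = {"12:00-12:05": 4, "12:05-12:10": 7}

for mode in ("complete", "update", "append"):
    print(mode, emitted_rows(before, after, mode))
```

In this model, complete re-sends both rows, while update and append send only the new 12:05-12:10 row. The two differ once an existing row changes: update would re-send it, whereas append has no way to express the change.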
Therefore, in this scenario, we should use the append mode so that each five-minute count of newly arrived events is written to the Delta Lake table as new rows.
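To see why append fits a five-minute count, the following plain-Python sketch models how a windowed count can be emitted exactly once per interval. It is a simplified illustration under stated assumptions, not Spark's implementation: the class AppendWindowCounter, its allowed_lateness parameter, and the use of epoch seconds are hypothetical, and the watermark is modeled as the largest event time seen minus the allowed lateness.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

WINDOW_SECONDS = 5 * 60  # five-minute tumbling windows


def window_start(event_time: int) -> int:
    """Floor an event time (epoch seconds) to the start of its five-minute window."""
    return event_time - event_time % WINDOW_SECONDS


class AppendWindowCounter:
    """Toy model of a windowed count written in append mode (hypothetical class).

    Counts are buffered per window and emitted exactly once, when the
    watermark reaches the end of the window.
    """

    def __init__(self, allowed_lateness: int = 0) -> None:
        self.allowed_lateness = allowed_lateness
        self.open_counts: Dict[int, int] = defaultdict(int)
        self.max_event_time = 0
        self.watermark = 0

    def process_batch(self, event_times: Iterable[int]) -> List[Tuple[int, int]]:
        """Ingest one micro-batch and return the (window_start, count) rows to append."""
        for t in event_times:
            start = window_start(t)
            if start + WINDOW_SECONDS <= self.watermark:
                continue  # window already emitted; the late event is dropped
            self.max_event_time = max(self.max_event_time, t)
            self.open_counts[start] += 1
        # The watermark only moves forward.
        self.watermark = max(self.watermark, self.max_event_time - self.allowed_lateness)
        closed = sorted(w for w in self.open_counts if w + WINDOW_SECONDS <= self.watermark)
        return [(w, self.open_counts.pop(w)) for w in closed]


counter = AppendWindowCounter()
first = counter.process_batch([10, 20, 290])   # window 0 is still open
second = counter.process_batch([310, 400])     # watermark passes 300, window 0 closes
third = counter.process_batch([50, 650])       # 50 is too late; window 300 closes
print(first, second, third)
```

Each finished interval produces one new row and no earlier row is ever rewritten, which is the property append mode relies on. Real engines differ in details such as when the watermark is advanced relative to the batch, so the timing here should be read as an approximation.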