Azure Data Factory: Incremental Data Load with Watermark - Exam DP-203 Study Guide

Incremental Data Load with Watermark

Question

Melissa is implementing the incremental data load from multiple on-premises SQL server tables to Azure SQL tables using Azure Data Factory.

She is performing the incremental data copy by delta data loading from the source database using a watermark.

What is the functionality of the watermark in this scenario?

Answers

Explanations

Click on the arrows to vote for the correct answer

A. B. C. D.

Correct Answer: B.

In the given scenario, Melissa is implementing the incremental data load from multiple on-premises SQL server tables to Azure SQL tables using Azure Data Factory. Incremental data load is the process of copying only the changed data from the source to the destination system. This process is more efficient as it reduces the amount of data transferred and improves performance.

A watermark is a technique used in incremental data loading to track the last modified or inserted record in the source system. It is a column that is used to identify the changes that have occurred since the last copy. In other words, a watermark is a pointer that represents the last processed row from the source system.

In this scenario, the watermark is used to track the changes in the source system, which is an on-premises SQL server. The watermark column can be a timestamp or an incrementing key, which represents the last updated record. When the data copy process runs, it queries the source system for records that have a timestamp or incrementing key greater than the watermark value. This identifies the new or changed records since the last copy. These records are then copied to the destination system, which is Azure SQL.

Option B, "It's a column that has the last updated timestamp or an incrementing key" is the correct answer. Options A, C, and D are incorrect as they do not accurately represent the functionality of a watermark in this scenario. Option A represents the SQL Server connection string, which is used to establish a connection between the source and destination systems. Option C represents the last SYS_CHANGE_VERSION, which is a metadata column used in Change Data Capture (CDC) in SQL Server. Option D represents the new SYS_CHANGE_VERSION, which is not relevant to this scenario.