Enable Schema Drift in Azure Data Factory: Source Settings and Validation

Question

Hugh is a Data Analyst at Woodgrove Inc.

He is working on data orchestration, using Azure Data Factory (ADF) to copy data from Azure Data Lake Storage Gen2 to Databricks and transform it.

In the pipeline, he uses ADF to build complex solutions with the data flow schema drift feature, applying reusable patterns based on flexible dataset schemas.

He needs to enable schema drift in the source settings of the Azure Data Factory data flow so that the source is treated as drifted.

Schema drift is defined as reading columns that aren't defined in the dataset schema.

He checked the “Validate schema” option in the Options field of the source settings in Azure Data Factory.

Does the solution meet the requirements of enabling “schema drift” in the Data Factory pipeline?

Answers

A. Yes

B. No

Correct Answer: B.

The correct answer is B (No): the solution does not enable "schema drift" in the Data Factory pipeline.

Here's why:

Schema drift is a feature of Azure Data Factory mapping data flows that allows the data flow to read columns that are not defined in the dataset schema. This feature is useful when dealing with changing data structures or sources that are not under your control. To enable it, select the "Allow schema drift" option in the settings of the source transformation.

In the given scenario, Hugh is working on data orchestration using Azure Data Factory for copying data from Azure Data Lake Storage Gen2 to Databricks and transformation. He is using ADF to build complex solutions with data flow's schema drift feature and applying reusable patterns based on flexible dataset schemas.

However, Hugh checked the "Validate schema" option in the Options field of the source settings in Azure Data Factory. This option validates the incoming source data against the schema defined in the dataset, and if the two do not match, the data flow fails to run. This means that if a new column is added to the source data, the run fails instead of reading the new column. Selecting this option therefore does not enable schema drift, and it works against the requirement.
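The difference between validating the schema and allowing drift can be sketched with a toy Python model. This is not the ADF API: the function `read_source` and its flags `allow_schema_drift` and `validate_schema` are hypothetical names chosen to mirror the two source options, and the matching rule (exact column-set equality) is a simplifying assumption.

```python
def read_source(rows, dataset_schema, allow_schema_drift=False, validate_schema=False):
    """Toy model of a data flow source (hypothetical, not the ADF API)."""
    defined = set(dataset_schema)
    output = []
    for row in rows:
        incoming = set(row)
        # Assumption: "validate schema" fails on any column-set mismatch.
        if validate_schema and incoming != defined:
            raise ValueError(
                f"schema mismatch: unexpected {sorted(incoming - defined)}, "
                f"missing {sorted(defined - incoming)}"
            )
        if allow_schema_drift:
            # Drift allowed: columns not in the dataset schema flow through.
            output.append(dict(row))
        else:
            # No drift: only columns defined in the dataset schema are kept.
            output.append({col: row.get(col) for col in dataset_schema})
    return output


schema = ["id", "name"]
rows = [{"id": 1, "name": "a", "region": "EU"}]  # "region" is a drifted column

print(read_source(rows, schema))                           # drifted column dropped
print(read_source(rows, schema, allow_schema_drift=True))  # drifted column kept
try:
    read_source(rows, schema, validate_schema=True)
except ValueError as err:
    print("run failed:", err)
```

In this sketch, only the drift flag lets the undefined `region` column through; the validation flag turns the same input into a failure.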

Therefore, the solution does not meet the requirements of enabling schema drift in the Data Factory pipeline. To enable schema drift, Hugh needs to select the "Allow schema drift" option in the source settings of the data flow; simply unchecking "Validate schema" is not enough. With "Allow schema drift" selected, the data flow can read columns that are not defined in the dataset schema. "Validate schema" should stay unselected if a mismatch with the dataset schema should not fail the run.