Azure Data Ingestion to Synapse Analytics | Solution Evaluation | Exam DP-200

Data Ingestion Process for Azure Synapse Analytics

Question

Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution. Determine whether the solution meets the stated goals.

You develop a data ingestion process that will import data to an enterprise data warehouse in Azure Synapse Analytics. The data to be ingested resides in Parquet files stored in an Azure Data Lake Gen 2 storage account.

You need to load the data from the Azure Data Lake Gen 2 storage account into the data warehouse.

Solution:

1. Use Azure Data Factory to convert the Parquet files to CSV files

2. Create an external data source pointing to the Azure storage account

3. Create an external file format and external table using the external data source

4. Load the data using the INSERT...SELECT statement

Does the solution meet the goal?

Answers

A. Yes

B. No

Correct Answer: B

Explanations

There is no need to convert the Parquet files to CSV files; PolyBase can read Parquet directly.

You load the data using the CREATE TABLE AS SELECT (CTAS) statement rather than INSERT...SELECT.

https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-load-from-azure-data-lake-store
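For illustration, a CTAS load from an external table over the Parquet files might look like the following sketch; the table, column, and distribution names are all hypothetical:

    -- CTAS creates and loads the internal table in one parallelized,
    -- minimally logged operation (all names below are placeholders).
    CREATE TABLE dbo.Sales
    WITH (
        DISTRIBUTION = HASH(SaleId),
        CLUSTERED COLUMNSTORE INDEX
    )
    AS
    SELECT *
    FROM dbo.SalesExternal;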

The proposed solution for loading data from an Azure Data Lake Gen 2 storage account into an Azure Synapse Analytics data warehouse involves the following steps:

  1. Use Azure Data Factory to convert the Parquet files to CSV files.
  2. Create an external data source pointing to the Azure storage account.
  3. Create an external file format and external table using the external data source.
  4. Load the data using the INSERT...SELECT statement.

Let's evaluate each step to determine if the proposed solution meets the stated goal of loading data into the enterprise data warehouse in Azure Synapse Analytics:

Step 1: Use Azure Data Factory to convert the Parquet files to CSV files. This step converts the Parquet files stored in the Azure Data Lake Gen 2 storage account to CSV format using Azure Data Factory. The conversion is unnecessary: PolyBase can load Parquet files directly into the data warehouse, and Parquet, being a compressed columnar format, typically loads faster than CSV. Therefore, this step is not required to meet the stated goal and only adds an extra processing stage to the pipeline.

Step 2: Create an external data source pointing to the Azure storage account. The external data source points PolyBase at the Azure Data Lake Gen 2 storage account where the Parquet files are stored, and it is required for the data warehouse to access that data. This step supports the stated goal; a minimal sketch follows.
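A minimal sketch of this step, assuming key-based authentication; the credential, data source, container, and account names are all placeholders:

    -- Requires an existing database master key in the data warehouse.
    CREATE DATABASE SCOPED CREDENTIAL AdlsCredential
    WITH IDENTITY = 'user',
         SECRET = '<storage-account-key>';

    -- TYPE = HADOOP directs PolyBase to the external storage endpoint.
    CREATE EXTERNAL DATA SOURCE AdlsDataSource
    WITH (
        TYPE = HADOOP,
        LOCATION = 'abfss://<container>@<account>.dfs.core.windows.net',
        CREDENTIAL = AdlsCredential
    );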

Step 3: Create an external file format and external table using the external data source. The external file format describes how the data files are encoded (here, Parquet), while the external table defines the schema through which the files can be queried from the data warehouse. Both objects are required before the data can be loaded, so this step supports the stated goal; a sketch follows this paragraph.
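Continuing the sketch, the file format declares the Parquet encoding and the external table maps a folder of files to a queryable schema; the column definitions are hypothetical:

    -- Parquet file format; Snappy is the usual compression codec.
    CREATE EXTERNAL FILE FORMAT ParquetFileFormat
    WITH (
        FORMAT_TYPE = PARQUET,
        DATA_COMPRESSION = 'org.apache.hadoop.io.compress.SnappyCodec'
    );

    -- External table over the Parquet folder; no data is moved yet.
    CREATE EXTERNAL TABLE dbo.SalesExternal (
        SaleId   INT,
        SaleDate DATE,
        Amount   DECIMAL(18, 2)
    )
    WITH (
        LOCATION = '/sales/',
        DATA_SOURCE = AdlsDataSource,
        FILE_FORMAT = ParquetFileFormat
    );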

Step 4: Load the data using the INSERT...SELECT statement. An INSERT...SELECT from the external table will move the data into an existing data warehouse table, but it is not the recommended approach: in a dedicated SQL pool, CREATE TABLE AS SELECT (CTAS) is the fully parallelized statement intended for this kind of load, creating and populating the table in a single operation. The expected loading statement here is CTAS, so this step does not match the intended solution.
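For comparison with the CTAS sketch above, the proposed INSERT...SELECT load into a pre-created table would look like this, reusing the same hypothetical names:

    -- The target table (dbo.Sales) must already exist; CTAS, by
    -- contrast, creates and loads the table in one statement.
    INSERT INTO dbo.Sales (SaleId, SaleDate, Amount)
    SELECT SaleId, SaleDate, Amount
    FROM dbo.SalesExternal;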

Based on the above evaluation, the proposed solution does not meet the stated goal as posed: converting the Parquet files to CSV is unnecessary, and the data should be loaded with CREATE TABLE AS SELECT rather than INSERT...SELECT. Therefore, the answer is B. No.