Azure Data Ingestion Process: Loading Parquet Files into Synapse Analytics | Exam DP-200 Solution

Data Ingestion Process: Loading Parquet Files into Azure Synapse Analytics

Question

Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution. Determine whether the solution meets the stated goals.

You develop a data ingestion process that will import data to an enterprise data warehouse in Azure Synapse Analytics. The data to be ingested resides in parquet files stored in an Azure Data Lake Gen 2 storage account.

You need to load the data from the Azure Data Lake Gen 2 storage account into the Data Warehouse.

Solution:

1. Use Azure Data Factory to convert the parquet files to CSV files

2. Create an external data source pointing to the Azure Data Lake Gen 2 storage account

3. Create an external file format and external table using the external data source

4. Load the data using the CREATE TABLE AS SELECT statement

Does the solution meet the goal?

Answers

A. Yes
B. No

Correct Answer: B

Explanation

It is not necessary to convert the Parquet files to CSV files; PolyBase can read Parquet directly.

You need to create an external file format and an external table using the external data source.

You then load the data using the CREATE TABLE AS SELECT statement.

https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-load-from-azure-data-lake-store
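
To illustrate the point, here is a minimal T-SQL sketch of the direct load, assuming an external data source named AzureDataLakeStore has already been created against the storage account; the object names, column list, and location path are placeholders, not part of the question:

    -- External file format for Parquet; no conversion to CSV is needed,
    -- because PolyBase reads Parquet natively.
    CREATE EXTERNAL FILE FORMAT ParquetFileFormat
    WITH (
        FORMAT_TYPE = PARQUET,
        DATA_COMPRESSION = 'org.apache.hadoop.io.compress.SnappyCodec'
    );

    -- External table over the Parquet files, using the existing
    -- external data source and the file format above.
    CREATE EXTERNAL TABLE dbo.DimProduct_external
    (
        ProductKey   INT           NOT NULL,
        ProductLabel NVARCHAR(100),
        ProductName  NVARCHAR(200)
    )
    WITH (
        LOCATION    = '/product/',
        DATA_SOURCE = AzureDataLakeStore,
        FILE_FORMAT = ParquetFileFormat
    );

    -- Load into the data warehouse with CREATE TABLE AS SELECT (CTAS).
    CREATE TABLE dbo.DimProduct
    WITH (
        DISTRIBUTION = HASH(ProductKey),
        CLUSTERED COLUMNSTORE INDEX
    )
    AS
    SELECT * FROM dbo.DimProduct_external;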

The provided solution uses Azure Data Factory to convert the Parquet files to CSV, creates an external data source, an external file format, and an external table over that data source, and then loads the data with a CREATE TABLE AS SELECT statement.

This solution does not meet the stated goal because it includes an unnecessary step: converting the data from Parquet to CSV. PolyBase in Azure Synapse Analytics can load Parquet files into the data warehouse directly, so no conversion is required.

The correct solution to meet the stated goal would be:

  1. Create a master key in Azure Synapse Analytics.
  2. Create a database scoped credential in the data warehouse.
  3. Create an external data source in the data warehouse that references the Azure Data Lake Gen 2 storage account and uses the database scoped credential to authenticate (steps 1-3 are sketched in T-SQL after this list).
  4. Create an external file format that specifies the data format and layout of the Parquet files.
  5. Create an external table in the data warehouse that references the external data source and the external file format.
  6. Load the data from the external table using a CREATE TABLE AS SELECT (CTAS) statement, or alternatively an INSERT INTO ... SELECT statement.
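
As a rough sketch of steps 1-3, assuming storage-account-key authentication (a service principal would also work) and placeholder values for the credential, password, container, and account names:

    -- 1. Master key, which protects the database scoped credential.
    CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<StrongPassword123!>';

    -- 2. Database scoped credential; with a storage account key the
    --    IDENTITY value is arbitrary and only the SECRET is used.
    CREATE DATABASE SCOPED CREDENTIAL ADLSCredential
    WITH
        IDENTITY = 'user',
        SECRET   = '<storage-account-key>';

    -- 3. External data source pointing at the Data Lake Gen 2 account,
    --    authenticated with the credential above.
    CREATE EXTERNAL DATA SOURCE AzureDataLakeStore
    WITH (
        TYPE       = HADOOP,
        LOCATION   = 'abfss://<container>@<account>.dfs.core.windows.net',
        CREDENTIAL = ADLSCredential
    );

Steps 4-6 then follow the external file format, external table, and CTAS pattern sketched earlier.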

By following the above steps, data can be loaded directly from the Parquet files in the Azure Data Lake Gen 2 storage account into the Azure Synapse Analytics data warehouse, without converting the data to CSV format. Therefore, the correct answer is B: the provided solution does not meet the stated goal.