Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution. Determine whether the solution meets the stated goals.
You develop a data ingestion process that will import data to an enterprise data warehouse in Azure Synapse Analytics. The data to be ingested resides in parquet files stored in an Azure Data Lake Gen 2 storage account.
You need to load the data from the Azure Data Lake Gen 2 storage account into the Data Warehouse.
Solution:
1. Use Azure Data Factory to convert the parquet files to CSV files
2. Create an external data source pointing to the Azure Data Lake Gen 2 storage account
3. Create an external file format and external table using the external data source
4. Load the data using the CREATE TABLE AS SELECT statement
Does the solution meet the goal?
A. Yes
B. No
It is not necessary to convert the parquet files to CSV files.
You need to create an external file format and external table using the external data source.
You load the data using the CREATE TABLE AS SELECT statement.
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-load-from-azure-data-lake-store
The provided solution suggests using Azure Data Factory to convert the parquet files to CSV format, creating an external data source, an external file format, and an external table using the external data source, and then loading the data using the CREATE TABLE AS SELECT statement.
This solution does not meet the stated goal because it involves an unnecessary step of converting the data from Parquet to CSV format. Instead of converting the data, it is recommended to use PolyBase in Azure Synapse Analytics to directly load Parquet files into the data warehouse.
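The correct solution to meet the stated goal would be:
1. Create an external data source pointing to the Azure Data Lake Gen 2 storage account
2. Create an external file format for Parquet and an external table using the external data source
3. Load the data using the CREATE TABLE AS SELECT statement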
By following the above steps, data can be loaded directly from the Parquet files in the Azure Data Lake Gen 2 storage account into the Azure Synapse Analytics data warehouse, without converting the data to CSV format. Therefore, the correct answer is B: the provided solution does not meet the stated goal.
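The direct Parquet load described above might look roughly like the following T-SQL for a dedicated SQL pool. This is only a sketch under stated assumptions: every object name (AdlsCredential, AdlsGen2Source, ParquetFormat, dbo.SalesExternal, dbo.Sales), the column list, the /sales/ folder, and the managed identity authentication are hypothetical, and the bracketed parts of the storage location are placeholders to fill in. It also assumes a database master key already exists.

```sql
-- Hypothetical credential; assumes the workspace managed identity
-- has been granted read access to the storage account.
CREATE DATABASE SCOPED CREDENTIAL AdlsCredential
WITH IDENTITY = 'Managed Service Identity';

-- External data source pointing to the Data Lake Gen 2 account.
-- <container> and <storage-account> are placeholders.
CREATE EXTERNAL DATA SOURCE AdlsGen2Source
WITH (
    TYPE = HADOOP,
    LOCATION = 'abfss://<container>@<storage-account>.dfs.core.windows.net',
    CREDENTIAL = AdlsCredential
);

-- External file format that reads Parquet directly (no CSV conversion).
CREATE EXTERNAL FILE FORMAT ParquetFormat
WITH (
    FORMAT_TYPE = PARQUET
);

-- External table over the Parquet files; the columns are illustrative
-- and must match the schema of the actual files.
CREATE EXTERNAL TABLE dbo.SalesExternal (
    SaleId     INT,
    SaleDate   DATE,
    Amount     DECIMAL(18, 2),
    CustomerId INT
)
WITH (
    LOCATION = '/sales/',
    DATA_SOURCE = AdlsGen2Source,
    FILE_FORMAT = ParquetFormat
);

-- Load into the warehouse with CREATE TABLE AS SELECT (CTAS).
CREATE TABLE dbo.Sales
WITH (
    DISTRIBUTION = ROUND_ROBIN,
    CLUSTERED COLUMNSTORE INDEX
)
AS
SELECT * FROM dbo.SalesExternal;
```

The distribution and index options in the final statement are illustrative defaults; a production table would typically choose them based on table size and query patterns.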