Azure Synapse Analytics: Data Loading from Azure Data Lake Storage (ADLS) to Dedicated SQL Pools

Question

You are loading data from Azure Data Lake Storage (ADLS) into a dedicated SQL pool in Azure Synapse Analytics. The following SQL statement was run to create the target table in the dedicated SQL pool:

CREATE TABLE [dbo].[DimProduct]
(
    [ProductKey] [int] NOT NULL,
    [ProductLabel] [nvarchar](255) NULL,
    [ProductName] [nvarchar](500) NULL
)
WITH
(
    DISTRIBUTION = HASH([ProductKey]),
    CLUSTERED COLUMNSTORE INDEX
);

A COPY statement was then run while connected to the dedicated SQL pool, and it finished copying the data from ADLS. Afterwards, you find that some of the loaded rows are not compressed into the columnstore as expected.

What will be the solution for this?

Answers

A. Create single-column statistics

B. Run “ALTER INDEX ALL ON [dbo].[DimProduct] REBUILD;”

C. Drop the target table and recreate it

D. None of the above.

Explanations

Correct Answer: B.

While loading data from Azure Data Lake Storage into dedicated SQL pools in Azure Synapse Analytics, there are two steps involved.

One is creating the target table, and the next one is copying the data from ADLS to SQL pool.
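
As a rough illustration of the second step, a COPY statement for this table might look like the sketch below. The question does not show the actual statement, so the storage account, container, and folder path are placeholders, and the CSV options and managed-identity credential are assumptions.

```sql
-- Hypothetical sketch: load CSV files from ADLS into the target table.
-- <storage-account>, <container>, and <folder> are placeholders, not values from the question.
COPY INTO [dbo].[DimProduct]
FROM 'https://<storage-account>.dfs.core.windows.net/<container>/<folder>/*.csv'
WITH
(
    FILE_TYPE = 'CSV',
    FIELDTERMINATOR = ',',                        -- assumed delimiter
    FIRSTROW = 2,                                 -- assumes each file has a header row
    CREDENTIAL = (IDENTITY = 'Managed Identity')  -- assumed authentication method
);
```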

The CREATE TABLE statement shows that the target table uses a clustered columnstore index.

After a load, some rows can remain uncompressed, held in rowstore (delta) rowgroups rather than in compressed columnstore segments.

To resolve this, rebuild the index on the table, which forces all rows into compressed columnstore rowgroups.

This generally improves query performance as well.
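
To see whether a table actually holds uncompressed rows, one option is to count its rowgroups by state before and after the rebuild. The query below is a sketch based on the dedicated SQL pool catalog views and the rowgroup physical stats DMV as commonly documented; treat the exact join path as an assumption and verify it against current documentation.

```sql
-- Sketch: summarize rowgroup states for dbo.DimProduct across all distributions.
-- Rowgroups in the OPEN or CLOSED state hold rows that are not yet compressed.
SELECT
    rg.[state_desc]      AS rowgroup_state,
    COUNT(*)             AS rowgroup_count,
    SUM(rg.[total_rows]) AS total_rows
FROM sys.[schemas] AS sm
JOIN sys.[tables] AS tb
    ON sm.[schema_id] = tb.[schema_id]
JOIN sys.[pdw_table_mappings] AS mp
    ON tb.[object_id] = mp.[object_id]
JOIN sys.[pdw_nodes_tables] AS nt
    ON nt.[name] = mp.[physical_name]
JOIN sys.[dm_pdw_nodes_db_column_store_row_group_physical_stats] AS rg
    ON  rg.[object_id]       = nt.[object_id]
    AND rg.[pdw_node_id]     = nt.[pdw_node_id]
    AND rg.[distribution_id] = nt.[distribution_id]
WHERE sm.[name] = 'dbo'
  AND tb.[name] = 'DimProduct'
GROUP BY rg.[state_desc];
```

If the rebuild worked, the same query run afterwards should report only COMPRESSED rowgroups.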

Option A is incorrect: Creating single-column statistics after loading data into the target table is a recommended optimization.

But it does not fix rows left uncompressed in the target table's columnstore.
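
For reference, creating a single-column statistic looks like the sketch below. The statistics name is hypothetical, and choosing [ProductKey] is only an example. The statement helps the query optimizer and leaves rowgroup compression untouched.

```sql
-- Sketch: single-column statistics on the distribution column.
-- The name stats_DimProduct_ProductKey is an illustrative choice, not from the question.
CREATE STATISTICS [stats_DimProduct_ProductKey]
ON [dbo].[DimProduct] ([ProductKey]);
```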

Option B is correct: This command rebuilds all indexes on the table, including the clustered columnstore index, which compresses every row into the columnstore.

That resolves the compression issue.

Option C is incorrect: There is no need to drop and recreate the table, since rebuilding the index fixes the problem in place.

The correct answer is B. Run “ALTER INDEX ALL ON [dbo].[DimProduct] REBUILD;”.

Explanation: When a table is created in an Azure Synapse Analytics dedicated SQL pool with a clustered columnstore index, the data is stored in a compressed columnar format. This compression reduces the storage space required for the data and also improves query performance. However, not every loaded row is necessarily compressed right away: for example, rows that arrive in batches too small to fill a compressed rowgroup are held in uncompressed delta rowgroups.

In this scenario, some of the loaded rows were left uncompressed. To fix this, rebuild the clustered columnstore index using the ALTER INDEX statement. The rebuild compresses all the data in the table into the columnstore.

Option A, creating single-column statistics, is not relevant to this issue as it is used to improve query performance by providing the query optimizer with information about the distribution of data values in a column. It does not affect the compression of data in the table.

Option C, dropping the target table and recreating it, is not necessary as we can simply rebuild the clustered columnstore index to correct the compression issue.

Therefore, the correct solution for this issue is to run the following statement:

ALTER INDEX ALL ON [dbo].[DimProduct] REBUILD;

This statement will rebuild the clustered columnstore index on the DimProduct table, ensuring that all the data is compressed correctly.