Monitor Invalid Schema Error

Monitor Invalid Schema Error

Question

You configure monitoring for an Azure Synapse Analytics implementation. The implementation uses PolyBase to load data from comma-separated value (CSV) files stored in Azure Data Lake Storage Gen 2 using an external table.

Files with an invalid schema cause errors to occur.

You need to monitor for an invalid schema error.

For which error should you monitor?

Answers

Explanations

Click on the arrows to vote for the correct answer

A. B. C. D.

C

Customer Scenario:

SQL Server 2016 or SQL DW connected to Azure blob storage. The CREATE EXTERNAL TABLE DDL points to a directory (and not a specific file) and the directory contains files with different schemas.

SSMS Error:

Select query on the external table gives the following error:

Msg 7320, Level 16, State 110, Line 14

Cannot execute the query "Remote Query" against OLE DB provider "SQLNCLI11" for linked server "(null)". Query aborted-- the maximum reject threshold (0 rows) was reached while reading from an external source: 1 rows rejected out of total 1 rows processed.

Possible Reason:

The reason this error happens is because each file has different schema. The PolyBase external table DDL when pointed to a directory recursively reads all the files in that directory. When a column or data type mismatch happens, this error could be seen in SSMS.

Possible Solution:

If the data for each table consists of one file, then use the filename in the LOCATION section prepended by the directory of the external files. If there are multiple files per table, put each set of files into different directories in Azure Blob Storage and then you can point LOCATION to the directory instead of a particular file.

The latter suggestion is the best practices recommended by SQLCAT even if you have one file per table.

Incorrect Answers:

A: Possible Reason: Kerberos is not enabled in Hadoop Cluster.

https://techcommunity.microsoft.com/t5/DataCAT/PolyBase-Setup-Errors-and-Possible-Solutions/ba-p/305297

The correct answer is C. Cannot execute the query "Remote Query" against OLE DB provider "SQLNCLI11" for linked server "(null)", Query aborted- the maximum reject threshold (0 rows) was reached while reading from an external source: 1 rows rejected out of total 1 rows processed.

Explanation: PolyBase is a feature in Azure Synapse Analytics that allows users to access and query data in external data sources such as Azure Data Lake Storage Gen 2, Hadoop, and SQL Server. PolyBase uses external tables to define the structure of the data in the external data source, which allows the data to be queried using SQL syntax.

When loading data from external data sources using PolyBase, it is possible to encounter errors due to data schema mismatches. This can occur when the structure of the data in the external data source does not match the structure defined in the external table. To monitor for this type of error, you should monitor for the following error message:

"Cannot execute the query "Remote Query" against OLE DB provider "SQLNCLI11" for linked server "(null)", Query aborted- the maximum reject threshold (0 rows) was reached while reading from an external source: 1 rows rejected out of total 1 rows processed."

This error message indicates that a row was rejected during the data loading process due to a schema mismatch. The reject threshold of 0 rows was reached, which caused the query to abort.

The other answer options are incorrect because they reference errors that are not related to schema mismatches. Option A references a Java exception related to Kerberos authentication. Option B references a Java exception related to the Hadoop file system. Option D references a Java exception related to PolyBase login credentials.