Apply SQL Transformation - Best Practices for ML Pipeline Pre-processing | Microsoft DP-100 Exam

Apply SQL Transformation Best Practices

Question

During the pre-processing phase in an ML pipeline, you have to make transformations on your data.

You, as an experienced SQL programmer, decide to write SQL scripts to solve the problem.

You want to use the Apply SQL Transformation module in the ML Designer.

Which practices should you follow? Select two.

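Answers

A. Avoid using RIGHT OUTER JOIN

B. Avoid using LEFT OUTER JOIN

C. Always use VIEW if you want to execute INSERT/UPDATE statements

D. Never use VIEW if you want to execute INSERT/UPDATE statements

Explanations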

Answers: A and D.

Option A is CORRECT because the SQL engine used by this module is SQLite.

One of its limitations is that while LEFT OUTER JOIN is implemented, RIGHT OUTER JOIN is not available!

Option B is incorrect because the SQL engine used by this module is SQLite.

LEFT OUTER JOIN is implemented in the SQLite engine, so you can use it.

Option C is incorrect because the SQL engine used by this module is SQLite.

One of its limitations is that views are read-only, i.e. they cannot be used to write to the underlying data.

Option D is CORRECT because while you can create views in SQLite, views are read-only, i.e. INSERT, UPDATE and DELETE operations cannot be used directly with them.
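The read-only behavior of views can be checked locally with Python's built-in sqlite3 module. This is only a sketch: the table name t1, the view name high_scores, and the sample rows are made up for illustration, and a local SQLite build is not guaranteed to match the version used inside the Designer module.

```python
import sqlite3

# In-memory database standing in for the module's SQLite engine (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (id INTEGER PRIMARY KEY, score REAL)")
conn.executemany("INSERT INTO t1 (id, score) VALUES (?, ?)", [(1, 0.5), (2, 0.9)])

# Creating a view and reading from it works.
conn.execute("CREATE VIEW high_scores AS SELECT id, score FROM t1 WHERE score > 0.8")
view_rows = conn.execute("SELECT id, score FROM high_scores").fetchall()

# Writing through the view is rejected because SQLite views are read-only.
write_error = None
try:
    conn.execute("INSERT INTO high_scores (id, score) VALUES (3, 0.95)")
except sqlite3.Error as exc:
    write_error = exc

print(view_rows)
print(write_error)
```

The SELECT succeeds, while the INSERT raises an error and leaves t1 unchanged, which is why option D is the practice to follow.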

Reference:

The Apply SQL Transformation module in Azure Machine Learning Designer allows users to apply SQL scripts to their data as part of an ML pipeline's pre-processing phase. When using this module, it's important to follow certain best practices to ensure that the transformations are executed correctly and efficiently.
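As a rough illustration of the kind of script the module runs, the sketch below uses Python's built-in sqlite3 module to imitate a simple pre-processing query. It assumes the first input dataset is exposed as a table named t1, as the module documentation describes; the columns age and income and the sample rows are hypothetical.

```python
import sqlite3

# Local stand-in for the dataset connected to the module's first input port (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (age INTEGER, income REAL)")
conn.executemany(
    "INSERT INTO t1 (age, income) VALUES (?, ?)",
    [(25, 40000.0), (None, 52000.0), (41, 78000.0)],
)

# The kind of SELECT you might paste into the module's SQL script box:
# drop rows with a missing age and add a rescaled income column.
script = """
SELECT age, income, income / 1000.0 AS income_k
FROM t1
WHERE age IS NOT NULL
ORDER BY age
"""
cleaned = conn.execute(script).fetchall()
print(cleaned)
```

The script is a plain SELECT, so it only reads from t1 and returns a new result set for downstream modules.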

The four options can be assessed as follows:

A. Avoid using RIGHT OUTER JOIN (correct): A RIGHT OUTER JOIN returns all the rows from the right-hand table and the matching rows from the left-hand table, with NULLs where no match exists. The SQLite engine used by the Apply SQL Transformation module implements LEFT OUTER JOIN but not RIGHT OUTER JOIN, so scripts that rely on it will not run. Use INNER JOIN or LEFT OUTER JOIN instead; a RIGHT OUTER JOIN can be rewritten as a LEFT OUTER JOIN by swapping the order of the two tables.

B. Avoid using LEFT OUTER JOIN (incorrect): LEFT OUTER JOIN is implemented in SQLite and works in the Apply SQL Transformation module. It returns all the rows from the left-hand table and NULLs for unmatched right-hand columns, which is the expected behavior of the join, not a defect. There is no reason to avoid it.

C. Always use VIEW if you want to execute INSERT/UPDATE statements (incorrect): Views in SQLite are read-only. INSERT, UPDATE, and DELETE statements cannot be executed directly against a view, so a view cannot be used to write to the underlying data.

D. Never use VIEW if you want to execute INSERT/UPDATE statements (correct): You can create views in SQLite, but they are read-only, so INSERT, UPDATE, and DELETE statements fail when they target a view. Use views for querying only.
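For option A, a query that would naturally be written with RIGHT OUTER JOIN can usually be expressed as a LEFT OUTER JOIN with the tables swapped. The sketch below shows the idea with Python's built-in sqlite3 module; the tables t1 and t2 mimic the module's input naming, while the columns and rows are invented for the example.

```python
import sqlite3

# Two hypothetical inputs: t1 holds customers, t2 holds orders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (customer_id INTEGER, name TEXT)")
conn.execute("CREATE TABLE t2 (order_id INTEGER, customer_id INTEGER)")
conn.executemany("INSERT INTO t1 VALUES (?, ?)", [(1, "Ana"), (2, "Ben")])
conn.executemany("INSERT INTO t2 VALUES (?, ?)", [(10, 1), (11, 3)])

# Intended logic: keep every order, even when its customer is missing, i.e.
#   ... FROM t1 RIGHT OUTER JOIN t2 ON t1.customer_id = t2.customer_id
# Equivalent form that avoids RIGHT OUTER JOIN: start from t2 and LEFT OUTER JOIN t1.
query = """
SELECT t2.order_id, t2.customer_id, t1.name
FROM t2
LEFT OUTER JOIN t1 ON t1.customer_id = t2.customer_id
ORDER BY t2.order_id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Order 11 has no matching customer, so it is kept with a NULL (None in Python) in the name column, exactly as the right-join version would have produced.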

In summary, when using the Apply SQL Transformation module in Azure Machine Learning Designer, avoid RIGHT OUTER JOIN (use INNER JOIN or LEFT OUTER JOIN instead) and do not run INSERT or UPDATE statements against a VIEW, because views in SQLite are read-only.