Importing CSV Data Files for Machine Learning Experiments | DP-100 Exam Preparation

Question

For your machine learning experiments, you need to get CSV data files from a web location and use them as a dataset in your ML workspace.

There are ten files to be imported, and each of them contains different columns of a large table.

Above the column header, each file has 6 rows containing unstructured data such as dates and separator lines.

You want to use ML Studio to complete the work.

Among other options, you need to set the following: 1 - Dataset type; 2 - Column headers; 3 - Skip rows / Skip n rows. Which combination of settings should you use?

Answers

A. 1 - File; 2 - Combine headers from all files; 3 - From all files / 6

B. 1 - Tabular; 2 - All files have the same headers; 3 - From all files / 6

C. 1 - Tabular; 2 - Combine headers from all files; 3 - From all files / 6

D. 1 - File; 2 - Combine headers from all files; 3 - From the first file / 5

Explanations

Answer: C.

Option A is incorrect because File datasets are designed for unstructured training data, such as images.

For CSV sources, the Tabular type should be selected.

Option B is incorrect because the files do not share the same headers: as stated, each file holds different columns of the whole data structure, so the column headers from all files must be combined.

Option C is CORRECT because a Tabular dataset should be defined for structured data files and, since the files contain vertical slices of a large table, the column headers from all files have to be combined.

The first relevant row (the column header) is located in row 7 of every file, so the Skip rows setting must be 6, applied to all files.

Option D is incorrect because File type datasets are used for unstructured data, and because it skips only 5 rows from the first file, whereas 6 rows must be skipped from all files.

Diagram - ML Studio: "Create dataset from web files", "Settings and preview" step. The visible settings are File format (Delimited), Delimiter (Comma), Encoding (UTF-8), and Column headers, with the options No headers, Use headers from the first file, Combine headers from all files, and All files have same headers.

The correct combination of settings to use in order to import CSV data files from a web location and use them as a dataset in a machine learning workspace in Azure ML Studio, while skipping the unstructured data above the column headers is:

C. 1 - Tabular; 2 - Combine headers from all files; 3 - From all files / 6.

Explanation:

Dataset Type: The first option asks for the type of the dataset. There are two options available: File and Tabular. Since we are working with CSV files, which hold structured data, the correct option is Tabular.
Column Headers: The second option is about the column headers. The available options include No headers, Use headers from the first file, All files have the same headers, and Combine headers from all files. Since each file contains different columns of a large table, we need to use the Combine headers from all files option.
Skip Rows: The third option is about skipping rows from the top of the files. We need to skip the first 6 rows of each file, since they contain unstructured data such as dates and separator lines. The correct option is From all files / 6.

Therefore, the correct combination of settings is: 1 - Tabular; 2 - Combine headers from all files; 3 - From all files / 6.

Option A is incorrect because it uses the wrong dataset type (File instead of Tabular). Option B is incorrect because it assumes that all files have the same headers, which is not the case. Option D is incorrect because it uses the File dataset type and skips only 5 rows from the first file, while we need to skip 6 rows from all files.
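
To make the effect of the Skip rows and Column headers settings more concrete, here is a minimal local sketch using only the Python standard library. It is not how ML Studio implements the import; the file contents, column names, and the helper functions `read_skipping_preamble` and `combine_headers` are hypothetical and exist only for illustration. It shows that skipping 6 rows in every file lands on the header in row 7, that skipping only 5 rows picks up a separator line instead, and that combining headers yields the union of the column names.

```python
import csv

PREAMBLE_ROWS = 6  # unstructured rows above the header, as stated in the question


def read_skipping_preamble(text, skip_rows=PREAMBLE_ROWS):
    """Discard the first `skip_rows` lines, then parse the rest as CSV.

    Returns (header, rows), where rows is a list of dicts keyed by header.
    """
    lines = text.splitlines()[skip_rows:]
    reader = csv.reader(lines)
    header = next(reader)
    rows = [dict(zip(header, row)) for row in reader]
    return header, rows


def combine_headers(headers):
    """Return the union of column names across files, in first-seen order."""
    combined = []
    for header in headers:
        for name in header:
            if name not in combined:
                combined.append(name)
    return combined


# Two hypothetical files, each with 6 unstructured lines above the header
# and each holding different columns of the same table.
preamble = "Report\n2021-01-01\n-----\nnote\nnote\n-----\n"
file_a = preamble + "id,age\n1,34\n2,51\n"
file_b = preamble + "id,income\n1,40000\n2,52000\n"

header_a, rows_a = read_skipping_preamble(file_a)
header_b, rows_b = read_skipping_preamble(file_b)
all_columns = combine_headers([header_a, header_b])

# Skipping only 5 rows treats the last separator line as the header.
wrong_header, _ = read_skipping_preamble(file_a, skip_rows=5)

print(all_columns)
print(wrong_header)
```

Because the preamble appears in every file, the skip has to be applied per file; applying it to the first file only would leave the other files starting with unstructured lines.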
