Importing CSV Data Files for Machine Learning Experiments | DP-100 Exam Preparation

Question

For your machine learning experiments, you need to get CSV data files from a web location and you need to use them as a dataset in your ML workspace.

There are ten files to import, and each of them contains different columns of a large table.

Above the column header, each file has 6 rows containing unstructured data such as dates and separator lines.

You want to use ML Studio to complete the work.

Among others, you need to set the following options:

  1. Dataset type
  2. Column headers
  3. Skip rows / Skip n rows

Which combination of settings should you use?

Answers

Explanations

A. B. C. D.

Answer: C.

Option A is incorrect because File datasets are designed for unstructured training data, such as images.

For CSV sources, the Tabular type should be selected.

Option B is incorrect because the column headers must be combined from all files: as stated in the question, each file holds different columns of the overall table.

Option C is CORRECT because structured data files call for a Tabular dataset and, since the files contain vertical slices of a large table, the column headers from all files have to be combined.

The first relevant row (the column header) is row 7, so the Skip rows setting is 6.
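To make the row arithmetic concrete, here is a minimal local sketch using only the Python standard library. It is not the ML Studio implementation; the function name `read_after_skipping` and the sample file content are hypothetical and serve only to show that skipping 6 rows leaves the header as the first remaining line.

```python
import csv


def read_after_skipping(text, skip_rows):
    """Discard the first `skip_rows` lines, then parse the rest as CSV.

    Returns (header, rows), where header is the first remaining line.
    """
    lines = text.splitlines()[skip_rows:]
    reader = csv.reader(lines)
    header = next(reader)
    return header, list(reader)


# Hypothetical file: 6 unstructured lines, column header on line 7.
sample = "\n".join([
    "Report generated 2021-01-01",
    "----------",
    "Source: example",
    "----------",
    "Period: Q1",
    "==========",
    "id,temperature,pressure",
    "1,20.5,101.3",
    "2,21.0,100.9",
])

header, rows = read_after_skipping(sample, skip_rows=6)
print(header)
print(rows)
```

With `skip_rows=5` the last separator line would be taken as the header, which is why the setting has to match the number of unstructured rows exactly.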

Option D is incorrect because File type datasets are used for unstructured data, and column headers from all files must be used.

Diagram - ML Studio: "Create dataset from web files" dialog, "Settings and preview" step. File format: Delimited; Delimiter: Comma; Encoding: UTF-8. Column headers options: No headers, Use headers from the first file, Combine headers from all files, All files have same headers.

To import the CSV files from a web location as a dataset in the Azure ML workspace while skipping the unstructured rows above the column headers, the correct combination of settings in ML Studio is:

C. 1 - Tabular; 2 - Combine headers from all files; 3 - From all files / 6.

Explanation:

  1. Dataset Type: The first option asks for the type of the dataset. There are two options available: File and Tabular. Since we are working with CSV files, the correct option is Tabular.

  2. Column Headers: The second option is about the column headers. The dialog offers four choices: No headers, Use headers from the first file, Combine headers from all files, and All files have same headers. Since each file contains different columns of a large table, we need to use the Combine headers from all files option.

  3. Skip Rows: The third option is about skipping rows from the top of the files. We need to skip the first 6 rows of each file, since they contain unstructured data like dates and separator lines. The correct option is From all files / 6.

Therefore, the correct combination of settings is: 1 - Tabular; 2 - Combine headers from all files; 3 - From all files / 6.

Option A is incorrect because it uses the wrong dataset type (File instead of Tabular). Option B is incorrect because it assumes that all files have the same headers, which is not the case. Option D is incorrect because it skips 5 rows only from the first file, while we need to skip 6 rows from all files.
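The three settings can be imitated locally to see how they interact. The following is a rough standard-library sketch, not ML Studio code: `load_combined` and the two sample files are hypothetical, and filling absent columns with `None` is an assumption about how a combined schema might be presented, not documented ML Studio behavior.

```python
import csv


def load_combined(files, skip_rows):
    """Parse several CSV texts into one table with a combined header.

    files: list of CSV file contents (strings).
    skip_rows: number of leading lines to discard in every file.
    Returns (combined_header, records); columns a file does not
    contain are filled with None (an assumption of this sketch).
    """
    combined = []
    records = []
    for text in files:
        lines = text.splitlines()[skip_rows:]
        reader = csv.DictReader(lines)
        # Extend the combined header with names not seen so far.
        for name in reader.fieldnames or []:
            if name not in combined:
                combined.append(name)
        records.extend(reader)
    filled = [{name: rec.get(name) for name in combined} for rec in records]
    return combined, filled


# Two hypothetical files, each with 6 unstructured lines before the header.
PREAMBLE = "\n".join(["unstructured line"] * 6)
file_a = PREAMBLE + "\nid,temperature\n1,20.5\n2,21.0"
file_b = PREAMBLE + "\nid,pressure\n1,101.3\n2,100.9"

header, rows = load_combined([file_a, file_b], skip_rows=6)
print(header)
for row in rows:
    print(row)
```

If only the header of the first file were used here, the `pressure` column from the second file would have no name to attach to, which mirrors the reasoning for rejecting the other header settings.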