You are building a machine learning environment to train models that process a large number of JPG files.
The files are stored in Azure Blob Storage.
You need to find the most effective way to access and use the files from your ML workspace.
What is the recommended way of linking and accessing data from your workspace?
A. B. C. D.
Answer: D.
Option A is incorrect because tabular datasets are meant for structured data that can be loaded directly into dataframe structures.
For jpg files, the file dataset type should be used.
In addition, downloading the files is unnecessary and resource-intensive; use mounting instead.
Option B is incorrect because tabular datasets are meant for structured data that can be loaded directly into dataframe structures.
Accessing the files via mounting is correct.
Option C is incorrect because downloading the files is unnecessary and resource-intensive. The recommended method of accessing large amounts of remote data is to mount the storage to the compute.
Using datasets of type “file” is correct.
Option D is CORRECT because a file dataset should be used for unstructured data (like images stored as files) and it is recommended to access the files by mounting them to the compute, in order to avoid unnecessary data movement from the storage.
Data needed during the training will be transferred automatically, on demand.
Diagram: Link storage account to ML workspace.
The recommended way of linking and accessing data from an ML workspace for processing a large number of jpg files stored in Azure Blob Storage is to register the data as a file dataset and access it by mounting.
When you register data as a file dataset, you are essentially creating a reference to the location of the data in Azure Blob Storage. This reference can be used to access the data from your workspace without having to download it to your local machine or copy it to a different location.
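The registration step described above can be sketched as follows. This is a minimal sketch, assuming the Azure ML Python SDK v1 (`azureml-core`) and a blob container that is already attached to the workspace as a datastore. The helper names `jpg_glob` and `register_jpg_file_dataset`, and every datastore, folder, and dataset name passed to them, are hypothetical and chosen only for illustration.

```python
def jpg_glob(folder: str) -> str:
    """Build a glob pattern matching every .jpg file under `folder`, recursively."""
    return f"{folder.strip('/')}/**/*.jpg"


def register_jpg_file_dataset(workspace, datastore_name, folder, dataset_name):
    """Register a file dataset that references (does not copy) the JPG blobs.

    Assumes `datastore_name` is a blob datastore already registered in
    `workspace`; all names are placeholders supplied by the caller.
    """
    # Imported lazily so the module loads even where azureml-core is absent.
    from azureml.core import Dataset, Datastore

    datastore = Datastore.get(workspace, datastore_name)

    # A file dataset is only a pointer to the files in storage.
    file_dataset = Dataset.File.from_files(path=(datastore, jpg_glob(folder)))

    # Registering makes the dataset retrievable by name from the workspace.
    return file_dataset.register(
        workspace=workspace,
        name=dataset_name,
        create_new_version=True,
    )
```

With a workspace object loaded through `Workspace.from_config()`, a call such as `register_jpg_file_dataset(ws, "image_store", "images", "jpg-images")` would return the registered dataset. No image data is moved at this point.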
Mounting a file dataset makes the data available on the compute as a mounted file system. The files can then be read like any other files on that file system, which supports data exploration, processing, and training of machine learning models.
Additionally, mounting avoids having to download or copy the full dataset before work can start, because files are transferred from storage on demand as they are read. It is also a more efficient way to manage large volumes of data, since the data is accessed in place without requiring additional storage for a full copy.
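A hedged sketch of how mounting might look in practice, again assuming the Azure ML Python SDK v1 (`azureml-core`). The function names, the `train.py` script, the `--data-path` argument, and the input name `images` are hypothetical. `list_jpgs` represents what a training script could do with the mount path it receives: it reads the mounted folder with ordinary file system calls.

```python
from pathlib import Path


def mounted_run_config(workspace, dataset_name, compute_name, source_dir="./src"):
    """Build a run configuration that mounts a registered file dataset.

    Assumes a dataset named `dataset_name` is registered in `workspace` and
    that `source_dir` contains a (hypothetical) train.py script.
    """
    # Imported lazily so the module loads even where azureml-core is absent.
    from azureml.core import Dataset, ScriptRunConfig

    dataset = Dataset.get_by_name(workspace, name=dataset_name)
    return ScriptRunConfig(
        source_directory=source_dir,
        script="train.py",
        # as_mount() streams files on demand; as_download() would copy them all.
        arguments=["--data-path", dataset.as_named_input("images").as_mount()],
        compute_target=compute_name,
    )


def list_jpgs(mount_path):
    """Return all JPG files under the mounted path, sorted, searched recursively."""
    return sorted(
        p for p in Path(mount_path).rglob("*")
        if p.is_file() and p.suffix.lower() in {".jpg", ".jpeg"}
    )
```

Inside the training script, the value passed after `--data-path` resolves to the mount point on the compute, so `list_jpgs(args.data_path)` would enumerate the images without any explicit download step.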
Therefore, for a large number of JPG files in Azure Blob Storage, the recommended approach is to register the data as a file dataset and access it by mounting.