Your team works for a medical research center that studies skin diseases.
The center collaborates with several medical centers where images of actual cases are taken and collected.
The image files are sent to your team weekly, and your task is to ingest them for use by machine learning algorithms.
You are using Azure ML Designer to build ML pipelines.
Which method should you use to ingest data?
Answer: A.
Option A is CORRECT because if your data is contained in multiple unstructured (not table-like) files, the recommended way to ingest it into the pipeline is to register the source folder as a dataset and then use that dataset like any other module in your pipeline.
The dataset type must be set to “File”.
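The registration step described above can also be scripted. The following is a minimal sketch, assuming the Azure ML Python SDK v1 (the azureml-core package) is installed and a workspace config.json is available; the question itself only mentions Designer, so this is an illustration rather than the required method. The datastore name, folder path, and dataset name are hypothetical placeholders.

```python
from pathlib import PurePosixPath


def image_glob(folder: str) -> str:
    """Build a recursive glob that matches every file under `folder`."""
    return str(PurePosixPath(folder.strip("/")) / "**")


def register_image_folder(datastore_name: str, folder: str, dataset_name: str):
    """Register `folder` on the given datastore as a File dataset.

    Assumption: azureml-core (SDK v1) is installed and Workspace.from_config()
    can find a config.json. All names passed in are placeholders.
    """
    # Imported lazily so image_glob() stays usable without the SDK installed.
    from azureml.core import Dataset, Datastore, Workspace

    ws = Workspace.from_config()
    datastore = Datastore.get(ws, datastore_name)

    # A File dataset references the files themselves; nothing is parsed as a table.
    file_dataset = Dataset.File.from_files(path=(datastore, image_glob(folder)))

    # create_new_version=True lets each weekly delivery become a new version
    # of the same registered dataset.
    return file_dataset.register(
        workspace=ws,
        name=dataset_name,
        create_new_version=True,
    )


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    register_image_folder("workspaceblobstore", "skin-images/weekly", "skin-disease-images")
```

Once registered, the dataset should be available to drag onto the Designer canvas like any other module.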
Option B is incorrect because the Import Data module is meant for tabular data (typically CSV files).
It is not suitable for unstructured sources, so a “File” type cannot be set.
Option C is incorrect because the “Tabular” setting is not applicable to scenarios with unstructured sources.
Use the “File” dataset type instead.
Option D is incorrect because the “Tabular” setting is not applicable to scenarios with unstructured sources.
Use the “File” dataset type instead.
Setting the column headers does not help.
Of the options provided, the most suitable is A: create a “File” dataset from the data folder and use that dataset in the pipeline. The source data arrives weekly as a collection of image files, which are unstructured rather than organized in rows and columns, so the “File” dataset type is the appropriate choice.
To use it, register the source folder as a dataset with the type set to “File”. Then drag the registered dataset onto the Designer canvas and connect it to the subsequent modules in the pipeline, as you would with any other module.
The other options do not fit this scenario. Option B (drag the Import Data module to the canvas, set it to “File” type, and link it to the source data folder) fails because the Import Data module is meant for tabular data and cannot be set to a “File” type. Creating a tabular dataset from the data folder (options C and D) would be useful only if the data were already structured in a tabular format, which image files are not.