Best Practices for Configuring Scikit-Learn Run in Azure

Question

For your machine learning experiments, you are going to use the Scikit-Learn framework.

You want to keep your Python code defining the run configuration as simple and compact as possible.

Which is the best way to achieve this goal?

Explanations

Answer: B.

Option A is incorrect because, although this approach can set the run configuration, it requires building the environment by hand. For the Scikit-Learn framework, the pre-configured SKLearn estimator achieves the same result with less code.

Option B is CORRECT because the simplest way to define the run configuration for a training script built on a given ML framework (such as Scikit-Learn) is to use the framework-specific estimator.

Option C is incorrect because, although this approach can also set the run configuration, it still requires listing the packages explicitly. For the Scikit-Learn framework, the pre-configured SKLearn estimator is the more compact solution.

Option D is incorrect because framework-specific ML packages (such as Scikit-Learn or PyTorch) are not included in the base configuration.

If you need Scikit-Learn, you have to add it to your run configuration, either via ScriptRunConfig or via an estimator.

To keep the Python code that defines the run configuration for Scikit-Learn experiments as simple and compact as possible, consider each option in turn:

A. Use CondaDependencies.create(conda_packages=['scikit-learn']...) to define the environment and use it as the environment_definition parameter of an Estimator:

This option creates a Conda environment with Scikit-Learn installed and uses it as the environment for running the experiment. The CondaDependencies class from the azureml.core.conda_dependencies module describes the packages to install, such as Scikit-Learn. That package list is attached to an Environment object, which is then passed to the Estimator class as the environment_definition parameter. The resulting environment has Scikit-Learn installed, so no separate installation step is needed, but the environment has to be built explicitly in code.
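Option A could be sketched roughly as follows, assuming the Azure ML SDK v1 is installed and that environment_definition expects an Environment object that carries the Conda dependencies. The helper name make_estimator_with_conda_env, the environment name "sklearn-env" and all argument values are hypothetical placeholders, not part of the question.

```python
def make_estimator_with_conda_env(source_directory, entry_script, compute_target):
    """Option A sketch: generic Estimator plus a hand-built environment.

    Hypothetical helper for illustration only.
    """
    # Imported inside the function so this file can be loaded on a machine
    # without the Azure ML SDK; the imports only run when the helper is called.
    from azureml.core import Environment
    from azureml.core.conda_dependencies import CondaDependencies
    from azureml.train.estimator import Estimator

    # Describe the packages the run needs.
    conda_deps = CondaDependencies.create(conda_packages=["scikit-learn"])

    # Attach the package list to an environment ("sklearn-env" is a made-up name).
    env = Environment(name="sklearn-env")
    env.python.conda_dependencies = conda_deps

    # Hand the environment to the generic estimator.
    return Estimator(
        source_directory=source_directory,  # folder holding the training code
        entry_script=entry_script,          # script to run inside that folder
        compute_target=compute_target,      # compute target object or name
        environment_definition=env,
    )
```

Three imports and an explicit environment object are needed before the estimator can be created, which is why this route is valid but not the most compact one.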

B. Import the SKLearn package and use the SKLearn pre-configured estimator to define the run configuration:

This option uses the pre-configured estimator for Scikit-Learn provided by the azureml.train.sklearn module. The estimator sets up an environment with Scikit-Learn installed, so the run configuration only needs to name the training script, its source directory, the compute target and any script parameters. We simply import the estimator and use it to define the run configuration.
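A minimal sketch of option B, assuming the Azure ML SDK v1 package that provides azureml.train.sklearn is installed. The helper name make_sklearn_estimator and the argument values mentioned in the note after the code are hypothetical placeholders, not part of the question.

```python
def make_sklearn_estimator(source_directory, entry_script, compute_target,
                           script_params=None):
    """Option B sketch: the pre-configured SKLearn estimator.

    Hypothetical helper for illustration only. No Scikit-Learn package list
    is passed, because the estimator is expected to supply the framework.
    """
    # Imported inside the function so this file can be loaded on a machine
    # without the Azure ML SDK; the import only runs when the helper is called.
    from azureml.train.sklearn import SKLearn

    return SKLearn(
        source_directory=source_directory,  # folder holding the training code
        entry_script=entry_script,          # script to run inside that folder
        compute_target=compute_target,      # compute target object or name
        script_params=script_params,        # optional command-line arguments
    )
```

A call such as make_sklearn_estimator("./src", "train.py", "cpu-cluster") would return an estimator that can be handed to Experiment.submit. The directory, script and cluster names here are invented for the example.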

C. Import the Estimator package and use Estimator with parameter conda_packages=['scikit-learn']:

This option is similar to option A, but instead of building the environment with the CondaDependencies class, we pass the conda_packages parameter directly to the Estimator class with the list of packages to install. In this case, the value is ['scikit-learn'].
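Option C might look like the sketch below, again assuming the Azure ML SDK v1 is installed. The helper name make_estimator_with_conda_packages and its argument values are hypothetical.

```python
def make_estimator_with_conda_packages(source_directory, entry_script,
                                       compute_target):
    """Option C sketch: generic Estimator with an inline package list.

    Hypothetical helper for illustration only.
    """
    # Imported inside the function so this file can be loaded on a machine
    # without the Azure ML SDK; the import only runs when the helper is called.
    from azureml.train.estimator import Estimator

    return Estimator(
        source_directory=source_directory,  # folder holding the training code
        entry_script=entry_script,          # script to run inside that folder
        compute_target=compute_target,      # compute target object or name
        conda_packages=["scikit-learn"],    # framework must be listed by hand
    )
```

This is shorter than building an environment object, but the framework package still has to be named explicitly in the call.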

D. You don't need to set anything special because the Azure ML environments are pre-configured for the Scikit-Learn framework.

This option claims that nothing needs to be configured because the Azure ML environments already include Scikit-Learn. It is not correct: the base configuration does not contain framework-specific packages, so the required dependencies must be declared in the run configuration.

In summary, the best option to achieve the goal of keeping the Python code defining the run configuration for Scikit-Learn experiments as simple and compact as possible is option B, which involves using the pre-configured estimator provided by the azureml.train.sklearn module. However, options A and C are also valid alternatives that offer more control over the environment configuration. Option D is incorrect as it assumes the environments are already pre-configured with Scikit-Learn, which is not the case.