Azure Databricks Cluster - Accessing from Azure ML Workspace

Using Script to Access Azure Databricks Cluster

Question

You are developing a machine learning model that needs to run on an Azure Databricks cluster.

You want to use the following script to access your Databricks cluster from your Azure ML workspace:

    from azureml.core import Workspace
    from azureml.core.compute import ComputeTarget, DatabricksCompute

    myws = <.......1......>()

    compute_name = 'db_cluster'
    db_workspace_name = 'db_workspace'
    db_resource_group = 'db_resource_group'
    db_access_token = '3747-bxz-xjkh-2293-40...'

    db_config = DatabricksCompute.attach_configuration(resource_group=db_resource_group,
                                                       workspace_name=db_workspace_name,
                                                       access_token=db_access_token)

    databricks_compute = <......2.....>(myws, compute_name, db_config)
    databricks_compute.wait_for_completion(True)

Which of the following instructions are missing from the above script?

Answers

Explanations

A. B. C. D.

Correct Answer: C.

Option A is incorrect because external compute such as a Databricks cluster must be linked to the workspace with the “attach” method, not provisioned with “create”.

Option B is incorrect because the “from_config” method of the Workspace must be used first, and then the “attach” method of ComputeTarget is needed.

Option C is CORRECT because, to use your own existing Azure Databricks cluster, you first load your ML workspace from its configuration file and then attach the Databricks workspace to it as a compute target.

Option D is incorrect because the two instructions are in the wrong order.

The script provided in the question is incomplete and missing some instructions. In order to use the Azure Databricks cluster in the Azure Machine Learning workspace, the following steps need to be performed:

  1. Connect to the Azure Machine Learning workspace by creating a Workspace object. Workspace.from_config() does this by reading the workspace configuration file (config.json), searching from the current directory by default, and returning a Workspace object.

  2. Define the details of the existing Azure Databricks workspace: the name the compute target will have in Azure ML, the Databricks workspace name, its resource group, and an access token.

  3. Build the attach configuration with DatabricksCompute.attach_configuration(), which takes the resource group, the workspace name, and the access token and returns a configuration object. Nothing is attached at this point.

  4. Attach the Databricks workspace as a compute target with ComputeTarget.attach(), which takes the workspace object, the compute target name, and the attach configuration, and returns a ComputeTarget object (a DatabricksCompute in this case). ComputeTarget.create() is not appropriate here, because it provisions new compute managed by Azure ML rather than linking existing external compute.

  5. Wait for the attach operation to finish by calling wait_for_completion() on the returned object. Its boolean argument (show_output) controls whether progress output is printed, not whether the call waits.

Based on the above steps, the correct answer is option C. The missing instructions from the script are Workspace.from_config() and ComputeTarget.attach().

Here's the corrected script with the missing instructions:

    from azureml.core import Workspace
    from azureml.core.compute import ComputeTarget, DatabricksCompute

    # Step 1: Load the Azure ML workspace from its configuration file
    myws = Workspace.from_config()

    # Step 2: Describe the existing Databricks workspace to attach
    compute_name = 'db_cluster'
    db_workspace_name = 'db_workspace'
    db_resource_group = 'db_resource_group'
    db_access_token = '3747-bxz-xjkh-2293-40...'  # truncated placeholder from the question
    db_config = DatabricksCompute.attach_configuration(resource_group=db_resource_group,
                                                       workspace_name=db_workspace_name,
                                                       access_token=db_access_token)

    # Step 3: Attach the Databricks workspace; the call returns the compute target
    databricks_compute = ComputeTarget.attach(myws, compute_name, db_config)

    # Step 4: Wait for the attach operation to finish, printing progress
    databricks_compute.wait_for_completion(True)

Note that ComputeTarget.attach() already returns the compute target object, so no separate DatabricksCompute(...) construction is needed. The db_access_token value shown in the question is a truncated placeholder; it must be replaced with a valid Databricks access token before the script can be run.
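
Since the access token has to come from somewhere and an attach only needs to happen once per compute name, one possible variation is sketched below. It reads the token from an environment variable and reuses the compute target if it is already attached. The environment variable names (DATABRICKS_ACCESS_TOKEN, DATABRICKS_RESOURCE_GROUP, DATABRICKS_WORKSPACE_NAME) and both function names are assumptions made for illustration; they are not part of the question or of the Azure ML SDK. The Azure ML imports sit inside the second function so that the settings helper can be used without the SDK installed.

```python
import os


def databricks_attach_settings(env=os.environ):
    """Collect the keyword arguments for DatabricksCompute.attach_configuration.

    The variable names read from `env` are hypothetical; adapt them to your setup.
    """
    token = env.get("DATABRICKS_ACCESS_TOKEN")
    if not token:
        raise RuntimeError("DATABRICKS_ACCESS_TOKEN is not set")
    return {
        "resource_group": env.get("DATABRICKS_RESOURCE_GROUP", "db_resource_group"),
        "workspace_name": env.get("DATABRICKS_WORKSPACE_NAME", "db_workspace"),
        "access_token": token,
    }


def get_or_attach_databricks(compute_name="db_cluster"):
    """Return the named compute target, attaching it first if it does not exist."""
    # Imported here so the helper above works even without azureml-core installed.
    from azureml.core import Workspace
    from azureml.core.compute import ComputeTarget, DatabricksCompute
    from azureml.exceptions import ComputeTargetException

    ws = Workspace.from_config()
    try:
        # Succeeds only if a compute target with this name is already attached.
        return ComputeTarget(workspace=ws, name=compute_name)
    except ComputeTargetException:
        config = DatabricksCompute.attach_configuration(**databricks_attach_settings())
        target = ComputeTarget.attach(ws, compute_name, config)
        target.wait_for_completion(True)
        return target
```

Calling get_or_attach_databricks() requires a workspace configuration file and valid Azure credentials, so it is not something that can be tried offline; databricks_attach_settings() can be checked on its own by passing it a plain dictionary.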