Azure Data Science Solution: Preventing Unauthorized Access to Source Data | Exam DP-100

Preventing Unauthorized Access to Source Data

Question

You are designing your ML work environment.

Your data resides in an Azure storage account, in a blob storage container.

You want to prevent unauthorized access to your source data and don't want to risk exposing access credentials.

Which options should you use to fulfil the above requirements?

Answers

A. Describe the connection data in the training script.
B. Register the blob storage as a Datastore.
C. Register the blob storage as a Dataset.
D. Register the blob storage using an Estimator.

Explanations

Answer: B.

Option A is incorrect because embedding any sensitive information (IDs, keys, tokens, etc.) in code must be avoided at all costs.

Option B is CORRECT because in the Azure ML environment, datastores are designed to store connection information like subscription IDs, access keys etc.

By using datastores, all of this information is stored securely and accessed by referencing the datastore, which keeps the sensitive data hidden from scripts and applications.

Option C is incorrect because Datasets are references to your data.

They use the connection information stored in the datastore to access the data from the location where it actually resides.

They are used in connection with datastores.

Option D is incorrect because an Estimator is a coding construct: an object that combines a run configuration and a script configuration into a single object for simpler use.

Estimators have nothing to do with accessing data or storing connection information.

Reference:

To fulfill the requirement of preventing unauthorized access to source data and avoiding exposure of access credentials, we need to register the Azure storage account and blob storage container as a datastore or dataset in Azure Machine Learning.

Answer B: Register the blob storage as a Datastore

When we register a datastore, we can use the connection string or a shared access signature (SAS) token to connect to the Azure storage account. The datastore abstracts the details of the connection string or SAS token from the user, preventing exposure of access credentials. With the registered datastore, we can mount the blob storage container as a file system and access the data directly from the workspace. The datastore can be shared across multiple experiments, scripts, and compute targets.
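As an illustration, a registration with the Azure ML SDK for Python (v1) might look like the following sketch; the workspace, datastore, container, and account names are placeholders, and a real account key or SAS token would be supplied once at registration time:

```python
from azureml.core import Workspace, Datastore

# Connect to the workspace (reads the config.json created via the portal or CLI).
ws = Workspace.from_config()

# Register the blob container as a datastore. The credential is supplied once
# here and stored securely in the workspace; training scripts never see it --
# they only reference the datastore by name.
blob_datastore = Datastore.register_azure_blob_container(
    workspace=ws,
    datastore_name="my_blob_datastore",   # placeholder name
    container_name="source-data",         # placeholder container
    account_name="mystorageaccount",      # placeholder storage account
    account_key="<storage-account-key>",  # or sas_token="<sas-token>"
)

# Later, any script retrieves the datastore without handling credentials:
datastore = Datastore.get(ws, "my_blob_datastore")
```

This sketch requires an Azure ML workspace and valid storage credentials to run, so treat it as a template rather than a copy-paste solution.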

Answer C: Register the blob storage as a Dataset

When we register a dataset, we specify the path to the data in the blob storage container and the datastore where the data resides. The dataset definition contains metadata such as the file format, schema, and data distribution. With the registered dataset, we can use it as an input to a training script or pipeline. The dataset can be versioned, tagged, and tracked, providing reproducibility and auditability of the data used for the ML experiment.
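A dataset built on top of an already-registered datastore could be sketched as follows (again with the Azure ML SDK v1; the datastore name, dataset name, and data path are placeholders):

```python
from azureml.core import Workspace, Datastore, Dataset

ws = Workspace.from_config()
datastore = Datastore.get(ws, "my_blob_datastore")  # placeholder datastore name

# Reference files in the datastore; no credentials appear here -- access goes
# through the connection information stored with the datastore.
dataset = Dataset.File.from_files(path=(datastore, "training-data/*.csv"))

# Register the dataset so experiments can consume it by name, with versioning.
dataset = dataset.register(
    workspace=ws,
    name="training_files",        # placeholder dataset name
    create_new_version=True,
)
```

Note how the dataset only points at the datastore; the credentials stay with the workspace, which is exactly why option B is the answer and option C merely builds on it.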

Answer A: Describe the connection data in the training script

This option is not recommended because it exposes the connection string or SAS token in the training script, where it can be read by anyone who has access to the script. This violates the principle of least privilege and increases the risk of unauthorized access or data leakage.

Answer D: Register the blob storage using an Estimator

This option is not applicable because an estimator is used to define the training run, not the data source. It specifies the training script, environment, compute target, and hyperparameters for model training, but not the input data.