Azure ML SDK: Validating Machine Learning Models on Azure

Which Parameters Are Invalid for Specifying Validation Methods?

Question

While setting up your machine learning experiments, you need to ensure that the trained models will be appropriately scored with validation data.

The Azure ML SDK provides several ways to specify, in your scripts, which data is used for validation.

Which of the following are not valid sets of parameters for specifying the validation method? Select two.

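A. Primary metric; training data
B. Primary metric; training data; validation data
C. Primary metric; validation data; number of cross-validations
D. Primary metric; training data; validation set size
E. Primary metric; training data; validation data; validation set size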

Answers: C and E.

Option A is incorrect because it is a valid set: it provides the primary metric and the training data, which must be given in all cases.

When only training data is provided, the default splitting rules will be applied.

Option B is incorrect because it is a valid set: it includes the primary metric and the training data, which are always required.

When both training and validation data are provided, they are used as given, since no automatic splitting is needed.

Option C is CORRECT because it omits the training data, and the primary metric and training data are always required.

Validation data can either be provided explicitly or be generated automatically from the training data (by default or explicit splitting rules).

Option D is incorrect because it is a valid set: when only training data is provided together with a manually set validation set size, that value is used to split the data into training and validation subsets.
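To make the splitting step concrete, here is a minimal Python sketch of holding out a fraction of the training rows for validation. It assumes a plain seeded random shuffle; `split_by_validation_size` is a hypothetical helper, not the SDK's implementation, and the split AutoML actually performs may differ. In the Azure ML Python SDK (v1), the corresponding `AutoMLConfig` argument is `validation_size`, a fraction between 0.0 and 1.0.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_by_validation_size(
    rows: Sequence[T], validation_size: float, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Hypothetical helper (not Azure ML's implementation): hold out a
    fraction of the rows for validation and keep the rest for training."""
    if not 0.0 < validation_size < 1.0:
        raise ValueError("validation_size must be strictly between 0.0 and 1.0")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # reproducible random order
    n_valid = round(len(shuffled) * validation_size)
    # The first n_valid shuffled rows form the validation subset; the rest are for training.
    return shuffled[n_valid:], shuffled[:n_valid]


train_rows, valid_rows = split_by_validation_size(list(range(100)), validation_size=0.2)
print(len(train_rows), len(valid_rows))  # 80 20
```

The fixed seed only makes the sketch reproducible; the point is that a single fraction is enough to derive both subsets from the training data.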

Option E is CORRECT because it is redundant: you can provide either validation data or a validation set size for splitting the training data, but not both.

Setting both validation data and validation set size is invalid.
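The two rules from these explanations can be checked mechanically. The following Python sketch is illustrative only: `ValidationParams` and `validation_error` are hypothetical names, placeholder strings stand in for the metric and datasets, and only the two rules stated here are modeled (primary metric and training data are always required; validation data and validation set size are mutually exclusive), not every check the SDK performs. The field names are chosen to resemble `AutoMLConfig` arguments.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ValidationParams:
    """Hypothetical stand-in for the parameters in each answer option.
    Placeholder strings stand in for the metric and the datasets."""
    primary_metric: Optional[str] = None
    training_data: Optional[str] = None
    validation_data: Optional[str] = None
    n_cross_validations: Optional[int] = None
    validation_size: Optional[float] = None


def validation_error(p: ValidationParams) -> Optional[str]:
    """Return why p is invalid under the two rules above, or None if it passes.
    Only these two rules are modeled, not the SDK's full set of checks."""
    if p.primary_metric is None or p.training_data is None:
        return "primary metric and training data are always required"
    if p.validation_data is not None and p.validation_size is not None:
        return "set either validation data or validation set size, not both"
    return None


# The five answer options, encoded with placeholder values.
options = {
    "A": ValidationParams("accuracy", "train"),
    "B": ValidationParams("accuracy", "train", validation_data="valid"),
    "C": ValidationParams("accuracy", validation_data="valid", n_cross_validations=5),
    "D": ValidationParams("accuracy", "train", validation_size=0.2),
    "E": ValidationParams("accuracy", "train", validation_data="valid", validation_size=0.2),
}

invalid = [name for name, p in options.items() if validation_error(p) is not None]
print(invalid)  # ['C', 'E']
```

Applied to the five options, the checker flags C (no training data) and E (both validation data and validation set size).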

Reference:

The Azure ML SDK offers several ways to specify which data is used for validation in machine learning experiments. The candidate parameter sets are as follows:

A. Primary metric; training data: This parameter set is valid. It specifies the required primary metric and training data; because no validation data is given, the default splitting rules are applied.

B. Primary metric; training data; validation data: This parameter set is valid because it specifies the primary metric, the training data, and the validation data.

C. Primary metric; validation data; number of cross-validations: This parameter set is invalid because it omits the training data, which is always required.

D. Primary metric; training data; validation set size: This parameter set is valid because it specifies the primary metric, the training data, and the size of the validation set used to split the training data.

E. Primary metric; training data; validation data; validation set size: This parameter set is invalid because it specifies both validation data and a validation set size, and only one of the two may be set.

Therefore, the two invalid parameter sets for specifying the validation method are C (primary metric; validation data; number of cross-validations) and E (primary metric; training data; validation data; validation set size). Set C lacks the required training data, while set E specifies both validation data and a validation set size, which is redundant.