Selecting the Best Scoring Technique for Cross Validation in Model Evaluation

Scikit-learn Model Selection: Choosing the Right Scoring Technique

Question

You work as a machine learning specialist for a retail marketing firm.

You are responsible for the machine learning models used for product marketing.

Your latest assignment has you building a model to predict whether or not a particular marketing campaign will benefit from social media advertising.

You have gathered your social media and product marketing data and selected your model algorithm.

You are now in the process of evaluating your model.

You are using the scikit-learn model_selection package in a pipeline with cross-validation to select the best-performing model.

When setting the scoring parameter for your cross-validation runs, which scoring technique should you use?

Answers

Explanations

A. f1

B. adjusted_mutual_info_score

C. rand_score

D. completeness_score

Correct Answer: A.

Option A is correct.

The f1 scoring metric is used for binary targets.

Your target is binary: predict whether or not a particular marketing campaign will benefit from social media advertising.
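The setup described in the question can be sketched as follows. This is a minimal, illustrative example: the synthetic dataset from make_classification, the StandardScaler/LogisticRegression pipeline, and the fold count are assumptions standing in for the firm's actual marketing data and chosen algorithm; the key point is passing scoring="f1" to cross_val_score.

```python
# Sketch: scoring a pipeline with cross-validation using the f1 metric.
# The dataset and estimator below are placeholders, not the scenario's real ones.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary-target data (the real target would be: benefits from
# social media advertising, yes/no).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

pipe = make_pipeline(StandardScaler(), LogisticRegression())

# scoring="f1" evaluates each fold with the f1 metric, which is
# appropriate because the target y is binary.
scores = cross_val_score(pipe, X, y, cv=5, scoring="f1")
print(scores.mean())
```

cross_val_score returns one f1 value per fold; comparing the mean across candidate pipelines is how the best-performing model would be selected.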

Option B is incorrect.

The adjusted_mutual_info_score metric is used in clustering problems.

You are not solving a clustering problem.

Option C is incorrect.

The rand_score metric is used in clustering problems.

You are not solving a clustering problem.

Option D is incorrect.

The completeness_score metric is used in clustering problems.

You are not solving a clustering problem.

Reference:

Please see:

- the Kaggle article titled Cross-Validation (https://www.kaggle.com/alexisbcook/cross-validation)
- the Scikit-learn page titled 3.3 Metrics and scoring: quantifying the quality of predictions (https://scikit-learn.org/stable/modules/model_evaluation.html)
- the Scikit-learn page titled sklearn.metrics.adjusted_mutual_info_score (https://scikit-learn.org/stable/modules/generated/sklearn.metrics.adjusted_mutual_info_score.html#sklearn.metrics.adjusted_mutual_info_score)
- the Scikit-learn page titled sklearn.metrics.rand_score (https://scikit-learn.org/stable/modules/generated/sklearn.metrics.rand_score.html#sklearn.metrics.rand_score)
- the Scikit-learn page titled sklearn.metrics.completeness_score (https://scikit-learn.org/stable/modules/generated/sklearn.metrics.completeness_score.html#sklearn.metrics.completeness_score)
- the Scikit-learn page titled sklearn.metrics.f1_score (https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html#sklearn.metrics.f1_score)

When evaluating a machine learning model using cross-validation, it's important to select an appropriate scoring technique that can measure the performance of the model effectively.

Among the given options, "f1" is the most appropriate scoring technique to use. The f1 score balances precision and recall (it is their harmonic mean), so it accounts for both false positives and false negatives, and it can be used for binary and multi-class classification problems. A higher f1 score indicates better model performance.

Adjusted mutual information score, rand score, and completeness score are clustering evaluation metrics, which are used to evaluate the performance of unsupervised learning models. These metrics are not relevant in the context of evaluating a supervised machine learning model for classification.
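The difference is easy to demonstrate: clustering metrics compare two partitions of the same samples and ignore which label names are used, so they answer a different question than a classification metric does. The tiny label vectors below are illustrative:

```python
# Clustering metrics measure partition agreement, not classification correctness.
from sklearn.metrics import (
    adjusted_mutual_info_score,
    completeness_score,
    rand_score,
)

labels_true = [0, 0, 1, 1]
labels_pred = [1, 1, 0, 0]  # same grouping, but every label is "wrong"

# As a classification, labels_pred gets every sample wrong; yet each
# clustering metric reports perfect agreement, because the grouping matches.
print(adjusted_mutual_info_score(labels_true, labels_pred))  # 1.0
print(rand_score(labels_true, labels_pred))                  # 1.0
print(completeness_score(labels_true, labels_pred))          # 1.0
```

This is why none of options B, C, or D can score a supervised binary classifier.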

In conclusion, the best scoring technique to use in this scenario would be the "f1" score.