House Price Prediction Model Evaluation Metrics for Regression Problem | MLS-C01 Exam

Choose the Right Regression Evaluation Metric for Your House Price Prediction Model

Question

You work for a real estate e-commerce company.

Your machine learning team is building a house price prediction model to be used on your company's site.

This model will serve as a guide for users, giving an unbiased, objective estimate of a given house's value.

Your company has gathered an enormous dataset of house observations from across the United States.

The observations in the dataset are categorized by region of the country.

The house prices are mainly clustered by region across the dataset.

However, each region has several houses whose prices are outliers.

Since you have defined the house price prediction work as a regression problem, you have selected the SageMaker built-in XGBoost algorithm as the basis for your model.

You are now ready to do your hyperparameter tuning.

So you need a good regression evaluation metric.

Which of the following evaluation metrics best fits your problem?

Answers

Explanations

A. B. C. D.

Answer: D.

Option A is incorrect.

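The MSE metric is a valid choice for evaluating regression models.

However, because it squares each error, a few large errors on outlier-priced houses can dominate the score, so it is less robust to outliers than the MAE metric.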

Your dataset has several outliers per region.
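
To make the outlier sensitivity concrete, here is a minimal sketch in plain Python. The prices (in thousands of dollars) and predictions are made-up illustrative values, not taken from any real dataset, and the two helper functions are hypothetical names defined only for this example.

```python
def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical house prices in thousands of dollars for one region.
# Each of the four typical houses is mispredicted by 5.
typical_actual = [200, 210, 190, 205]
typical_predicted = [205, 205, 195, 200]

# Same region with one outlier-priced house added, mispredicted by 1100.
with_outlier_actual = typical_actual + [1500]
with_outlier_predicted = typical_predicted + [400]

mae_typical = mean_absolute_error(typical_actual, typical_predicted)
mse_typical = mean_squared_error(typical_actual, typical_predicted)
mae_outlier = mean_absolute_error(with_outlier_actual, with_outlier_predicted)
mse_outlier = mean_squared_error(with_outlier_actual, with_outlier_predicted)

print("MAE without outlier:", mae_typical)   # 5.0
print("MSE without outlier:", mse_typical)   # 25.0
print("MAE with outlier:   ", mae_outlier)   # 224.0
print("MSE with outlier:   ", mse_outlier)   # 242020.0
```

In this toy data, adding one outlier raises MAE by a factor of about 45 but raises MSE by a factor of nearly 10,000, which illustrates why MSE is so much more sensitive to outliers.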

Option B is incorrect.

The AUC metric is best suited to classification algorithms.

You are using a regression algorithm.

Option C is incorrect.

The AUC metric is best suited to classification algorithms.

You are using a regression algorithm.

Option D is correct.

MAE is the appropriate regression metric when outliers would otherwise significantly influence the evaluation, because it weights each error in proportion to its size rather than its square.

Your dataset contains several outliers per region.
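
As a rough sketch of how this choice could feed into hyperparameter tuning, the dictionary below follows the shape of the `HyperParameterTuningJobConfig` argument accepted by the SageMaker `CreateHyperParameterTuningJob` API, with `validation:mae` as the objective metric. The strategy, job limits, and parameter ranges are illustrative assumptions, not values from the question, and no AWS call is made here.

```python
# Illustrative tuning configuration only; all numeric values are assumptions.
tuning_job_config = {
    "Strategy": "Bayesian",
    "HyperParameterTuningJobObjective": {
        # MAE is an error metric, so lower is better.
        "Type": "Minimize",
        "MetricName": "validation:mae",
    },
    "ResourceLimits": {
        "MaxNumberOfTrainingJobs": 20,   # hypothetical budget
        "MaxParallelTrainingJobs": 2,    # hypothetical concurrency
    },
    "ParameterRanges": {
        # The API expects range bounds as strings.
        "ContinuousParameterRanges": [
            {"Name": "eta", "MinValue": "0.05", "MaxValue": "0.5"},
        ],
        "IntegerParameterRanges": [
            {"Name": "max_depth", "MinValue": "3", "MaxValue": "10"},
        ],
        "CategoricalParameterRanges": [],
    },
}

objective = tuning_job_config["HyperParameterTuningJobObjective"]
print(objective["Type"], objective["MetricName"])
```

The training job definition, not shown here, would likely also need XGBoost's `eval_metric` hyperparameter set to `mae` so that the metric is emitted during training; check this against the SageMaker XGBoost tuning guide before relying on it.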

Reference:

Please see the article titled "20 Popular Machine Learning Metrics. Part 1: Classification & Regression Evaluation Metrics", the Amazon SageMaker developer guide titled "XGBoost Algorithm", and the Amazon SageMaker developer guide titled "Tune an XGBoost Model".

For a regression problem such as house price prediction, Mean Squared Error (MSE) is a common default evaluation metric, but it is not the best fit here.

MSE measures the average of the squared differences between the predicted and actual values. Squaring puts more emphasis on larger errors, so a few outlier-priced houses can dominate the metric.

In this specific case, the house prices are mainly clustered by region, with several outliers in each region. A metric dominated by those outliers would steer hyperparameter tuning toward fitting atypical houses at the expense of the typical houses in each cluster.

AUC and the ROC curve are typically used for binary classification problems and are not well suited to regression problems such as house price prediction.

Mean Absolute Error (MAE) measures the average of the absolute differences between the predicted and actual values. Because each error contributes in proportion to its size rather than its square, outliers have far less influence on MAE than on MSE.

Therefore, the best evaluation metric for this problem is MAE.