You work as a machine learning specialist for a real estate company.
Your company wants you to develop a model that predicts whether a given property is in a “high value” neighborhood (one with a median household value at or above $180,000).
Your real estate agents will use this model to prioritize their sales work based on potential commission for any given property in their list of potential sales leads.
Which option is the best approach to solve this problem?
A. B. C. D.
Correct Answer: B.
Option A is incorrect.
We are solving a classification problem: predict if a given property is in a “high value” neighborhood.
This is a discrete objective, which is suited for a classification solution, not a regression solution.
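This option pairs continuous objectives (mean square error, cross-entropy loss, or absolute error) with predicting the median household value for each district, which is a regression task rather than the classification the problem asks for.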
Option B is correct.
We are solving a classification problem: predict if a given property is in a “high value” neighborhood.
Therefore, we will want to optimize using discrete objectives such as F1, precision, recall, or accuracy.
Option C is incorrect.
This option describes using continuous objectives (mean square error, cross-entropy loss, or absolute error) to solve a classification problem: predict whether or not a district is "high value."
Option D is incorrect.
This option describes using discrete objectives (F1, precision, recall, or accuracy) to solve a regression problem: predict the median household value for each district.
References:
Please see the Kaggle challenge titled Ethical ML: California Housing Classification (https://www.kaggle.com/c/ethicalml-cahousing/overview) and
the Amazon SageMaker developer guide titled Linear Learner Algorithm (https://docs.aws.amazon.com/sagemaker/latest/dg/linear-learner.html).
Option B is the best approach to solve this problem.
Explanation: The problem statement requires the machine learning model to predict if a given property is in a “high value” neighborhood, which is defined as properties with a median household value at or above $180,000.
Since the output variable is binary, either "high value" or "not high value", this is a classification problem. Therefore, it is best to use a model that is optimized for a discrete objective suited for classification, such as F1, precision, recall, or accuracy. These metrics measure the model's ability to correctly identify positive and negative instances and are well suited for binary classification problems.
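To make these metrics concrete, the following minimal Python sketch turns median household values into binary labels using the $180,000 threshold from the problem statement and then computes precision, recall, F1, and accuracy by hand. The sample values, the predictions, and the helper names (to_label, classification_metrics) are hypothetical and chosen only for illustration; they are not part of the original question or of any SageMaker API.

```python
# Threshold taken from the problem statement: "high value" means
# a median household value at or above $180,000.
HIGH_VALUE_THRESHOLD = 180_000


def to_label(median_value):
    """Return 1 for a "high value" neighborhood, otherwise 0."""
    return 1 if median_value >= HIGH_VALUE_THRESHOLD else 0


def classification_metrics(y_true, y_pred):
    """Compute precision, recall, F1, and accuracy for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Hypothetical median household values and hypothetical model predictions.
median_values = [250_000, 180_000, 120_000, 95_000, 310_000, 179_999]
y_true = [to_label(v) for v in median_values]
y_pred = [1, 0, 0, 1, 1, 0]

metrics = classification_metrics(y_true, y_pred)
print(y_true)
print(metrics)
```

Note that a value of exactly $180,000 is labeled "high value" because the definition says "at or above", while $179,999 is not.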
Option A suggests using SageMaker Linear Learner optimizing for a continuous objective, such as mean square error, cross-entropy loss, or absolute error to predict the median household value for each district. This is not an optimal solution for the problem statement because the problem is not asking to predict the exact median household value, but to classify whether a property is in a "high value" neighborhood or not.
Option C suggests using SageMaker Linear Learner optimizing for a continuous objective to predict whether or not a district is "high value". However, as explained earlier, the output variable is binary, so it is best to use a model optimized for a discrete objective suited for classification.
Option D suggests using SageMaker Linear Learner optimizing for a discrete objective, such as F1, precision, recall, or accuracy, to predict the median household value for each district. This is also not an optimal solution for the problem statement because the problem is not asking to predict the exact median household value, but to classify whether a property is in a "high value" neighborhood or not. In addition, discrete objectives are suited to classification, not to predicting a continuous value.
Therefore, option B is the best approach to solve this problem because it suggests using SageMaker Linear Learner optimizing for a discrete objective suited for classification, such as F1, precision, recall, or accuracy to predict whether or not a district is "high value".
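As a rough illustration of what option B might look like in practice, the sketch below builds a hyperparameter dictionary for a Linear Learner training job. The hyperparameter names predictor_type and binary_classifier_model_selection_criteria, and the listed criterion values, reflect the Linear Learner hyperparameters as best understood here and should be checked against the developer guide cited above; the helper build_hyperparameters is hypothetical, and no training job is actually launched.

```python
# Selection criteria believed to be accepted by Linear Learner for binary
# classification (assumption: verify against the current developer guide).
ASSUMED_SELECTION_CRITERIA = {
    "accuracy",
    "f_beta",
    "precision_at_target_recall",
    "recall_at_target_precision",
    "loss_function",
}


def build_hyperparameters(selection_criterion="f_beta"):
    """Hypothetical helper: hyperparameters for a binary "high value" classifier."""
    if selection_criterion not in ASSUMED_SELECTION_CRITERIA:
        raise ValueError(f"Unknown selection criterion: {selection_criterion}")
    return {
        # Classification, not regression: the target is "high value" or not.
        "predictor_type": "binary_classifier",
        # Discrete objective used to select the final model.
        "binary_classifier_model_selection_criteria": selection_criterion,
    }


hyperparameters = build_hyperparameters("f_beta")
print(hyperparameters)
```

A dictionary like this would typically be passed to the training job configuration; options A and D would instead correspond to a regression predictor type, which does not match the binary target described in the question.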