Predicting Car Sales: ML Model Training for City-Specific Relationships

Features and Feature Crosses for Car Type and Sales

Question

You are an ML engineer at a global car manufacturer.

You need to build an ML model to predict car sales in different cities around the world.

Which features or feature crosses should you use to train city-specific relationships between car type and number of sales?

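Answers

A. Three individual features: binned latitude, binned longitude, and one-hot encoded car type.

B. One feature obtained as an element-wise product of latitude, longitude, and car type.

C. One feature obtained as an element-wise product of binned latitude, binned longitude, and one-hot encoded car type.

D. Two feature crosses, each an element-wise product: one of binned latitude and one-hot encoded car type, the other of binned longitude and one-hot encoded car type.

Correct answer: C.

Explanations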

Learning a city-specific relationship between car type and sales means the model must be able to learn a different effect for each car type in each city. A city is identified by its latitude and longitude together, so the options differ in whether, and how, they combine location with car type.

Option A uses three separate features: binned latitude, binned longitude, and one-hot encoded car type. Binning turns each coordinate into a coarse categorical feature, and one-hot encoding gives each car type its own indicator. With no cross, however, a linear model learns only the independent effect of each feature on sales: one weight per latitude bin, one per longitude bin, and one per car type. It cannot express that a given car type sells better in one city than in another.
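To make the encoding in option A concrete, here is a minimal sketch in plain Python. The bin edges, the car-type vocabulary, and the function names are assumptions chosen for illustration; none of them come from the question.

```python
from bisect import bisect_right

# Hypothetical bin edges and vocabulary, chosen only for illustration.
LAT_EDGES = [-60.0, -30.0, 0.0, 30.0, 60.0]    # 5 edges -> 6 latitude bins
LON_EDGES = [-120.0, -60.0, 0.0, 60.0, 120.0]  # 5 edges -> 6 longitude bins
CAR_TYPES = ["sedan", "suv", "truck"]


def one_hot(index, size):
    """Return a list of `size` zeros with a single 1 at position `index`."""
    vec = [0] * size
    vec[index] = 1
    return vec


def bin_index(value, edges):
    """Index of the bin that `value` falls into, given sorted bin edges."""
    return bisect_right(edges, value)


def option_a_features(lat, lon, car_type):
    """Concatenate three independent one-hot blocks (no feature cross)."""
    lat_vec = one_hot(bin_index(lat, LAT_EDGES), len(LAT_EDGES) + 1)
    lon_vec = one_hot(bin_index(lon, LON_EDGES), len(LON_EDGES) + 1)
    car_vec = one_hot(CAR_TYPES.index(car_type), len(CAR_TYPES))
    return lat_vec + lon_vec + car_vec


# Coordinates roughly matching Paris, used here as sample input.
features = option_a_features(48.85, 2.35, "suv")
```

The resulting vector has 6 + 6 + 3 = 15 entries with exactly three of them set, one per block. Each block gets its own weights, which is why nothing in this representation links a car type to a particular location.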

Option B multiplies raw latitude, raw longitude, and car type into a single feature. The three inputs are combined, but the result is one number with no clear meaning: very different locations can produce the same product, car type is categorical rather than numeric, and a single weight on that number cannot represent a separate effect for each city.
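A small sketch shows why a product of raw values is hard to learn from. It assumes car type has been mapped to an integer id, which the question does not specify, and the coordinates are made up.

```python
# Hypothetical integer ids for car types; any numeric encoding of a
# categorical value runs into the same kind of collision.
CAR_TYPE_ID = {"sedan": 1, "suv": 2, "truck": 3}


def option_b_feature(lat, lon, car_type):
    """Single scalar: raw latitude * raw longitude * numeric car-type id."""
    return lat * lon * CAR_TYPE_ID[car_type]


# Two made-up locations far apart, selling different car types.
x1 = option_b_feature(10.0, 40.0, "sedan")  # 10 * 40 * 1
x2 = option_b_feature(20.0, 10.0, "suv")    # 20 * 10 * 2
collide = x1 == x2
```

Both calls produce the same value, so a model that sees only this feature cannot tell the two situations apart.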

Option C crosses binned latitude, binned longitude, and one-hot encoded car type into a single feature. Crossing the two binned coordinates identifies a grid cell, which at a suitable bin size corresponds roughly to a city, and crossing that cell with car type gives the model a separate weight for every combination of cell and car type. This is what lets it learn a city-specific relationship between car type and sales.
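The cross in option C can be sketched as the flattened product of three one-hot vectors. As before, the bin edges, vocabulary, and function names are illustrative assumptions, and the coarse bins here are far larger than a real city.

```python
from bisect import bisect_right

# Hypothetical bin edges and vocabulary, chosen only for illustration.
LAT_EDGES = [-60.0, -30.0, 0.0, 30.0, 60.0]    # 6 latitude bins
LON_EDGES = [-120.0, -60.0, 0.0, 60.0, 120.0]  # 6 longitude bins
CAR_TYPES = ["sedan", "suv", "truck"]


def one_hot(index, size):
    """Return a list of `size` zeros with a single 1 at position `index`."""
    vec = [0] * size
    vec[index] = 1
    return vec


def crossed_feature(lat, lon, car_type):
    """One-hot vector over every (lat bin, lon bin, car type) combination."""
    lat_vec = one_hot(bisect_right(LAT_EDGES, lat), len(LAT_EDGES) + 1)
    lon_vec = one_hot(bisect_right(LON_EDGES, lon), len(LON_EDGES) + 1)
    car_vec = one_hot(CAR_TYPES.index(car_type), len(CAR_TYPES))
    # Product of every triple of entries: exactly one triple is 1 * 1 * 1.
    return [a * b * c for a in lat_vec for b in lon_vec for c in car_vec]


# Sample coordinates roughly matching Paris and Tokyo.
paris_suv = crossed_feature(48.85, 2.35, "suv")
tokyo_suv = crossed_feature(35.68, 139.69, "suv")
```

The crossed vector has 6 * 6 * 3 = 108 entries with a single 1. The two sample cities fall in the same latitude bin but different longitude bins, so the same car type activates a different entry, and therefore a different weight, in each of them.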

Option D creates two feature crosses: binned latitude with one-hot encoded car type, and binned longitude with one-hot encoded car type. The model can then learn how car-type preferences vary along latitude and along longitude separately. A single latitude band or longitude band spans many cities, though, so neither cross isolates one city, and in a linear model the two effects are simply added together.
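The limitation of option D can be illustrated on a tiny hypothetical 2 x 2 grid of location bins. The sketch below assumes a linear model and uses bin indices directly; all names are illustrative.

```python
# Tiny hypothetical grid: 2 latitude bins, 2 longitude bins, 2 car types.
N_LAT, N_LON, N_CAR = 2, 2, 2


def one_hot(index, size):
    """Return a list of `size` zeros with a single 1 at position `index`."""
    vec = [0] * size
    vec[index] = 1
    return vec


def cross(a, b):
    """Flattened product of two one-hot vectors."""
    return [x * y for x in a for y in b]


def option_d_features(lat_bin, lon_bin, car):
    """Concatenate (lat bin x car type) and (lon bin x car type) crosses."""
    car_vec = one_hot(car, N_CAR)
    lat_cross = cross(one_hot(lat_bin, N_LAT), car_vec)
    lon_cross = cross(one_hot(lon_bin, N_LON), car_vec)
    return lat_cross + lon_cross


def add(u, v):
    """Element-wise sum of two equal-length vectors."""
    return [a + b for a, b in zip(u, v)]


car = 0
# Feature vectors of the four grid cells, summed along each diagonal.
diagonal = add(option_d_features(0, 0, car), option_d_features(1, 1, car))
anti_diagonal = add(option_d_features(0, 1, car), option_d_features(1, 0, car))
```

The two sums are identical, so any linear model over these features is forced to predict the same total sales for the two diagonal cells as for the two off-diagonal cells. It has no weight that belongs to one cell alone, which is what a city-specific effect would require.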

In conclusion, option C is the best choice. Options A and D keep latitude and longitude apart, so they capture at most broad regional trends, and option B collapses everything into a single number that is hard to interpret. Only the single cross of binned latitude, binned longitude, and one-hot encoded car type ties each car type to a specific location. The cost is a wider, sparser feature, so the bin size should be chosen so that each cell still contains enough sales data.