Taxi Fare Prediction Model: Feature Selection for AI-900 Exam

Feature Selection for Training a Taxi Fare Prediction Model

Question

You have a dataset that contains information about taxi journeys that occurred during a given period.

You need to train a model to predict the fare of a taxi journey.

What should you use as a feature?

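Answers

A. The number of taxi journeys in the dataset
B. The trip distance of individual taxi journeys
C. The fare of individual taxi journeys
D. The trip ID of individual taxi journeys

Correct answer: B

Explanation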

The label is the column you want to predict. The features are the inputs you give the model to predict the label.

Example:

The provided data set contains the following columns:

vendor_id: The ID of the taxi vendor is a feature.

rate_code: The rate type of the taxi trip is a feature.

passenger_count: The number of passengers on the trip is a feature.

trip_time_in_secs: The amount of time the trip took. You want to predict the fare of the trip before the trip is completed. At that moment, you don't know how long the trip will take. Thus, the trip time is not a feature and you'll exclude this column from the model.

trip_distance: The distance of the trip is a feature.

payment_type: The payment method (cash or credit card) is a feature.

fare_amount: The total taxi fare paid is the label.
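The column roles above can be sketched in a few lines of plain Python. This is a minimal illustration, not the tutorial's own code: the column names come from the list, while the row values and the helper split_features_label are made up for the example.

```python
# Hypothetical rows using the column names listed above; all values are made up.
rows = [
    {"vendor_id": "VTS", "rate_code": 1, "passenger_count": 1,
     "trip_time_in_secs": 1140, "trip_distance": 3.75,
     "payment_type": "CRD", "fare_amount": 15.5},
    {"vendor_id": "CMT", "rate_code": 1, "passenger_count": 2,
     "trip_time_in_secs": 480, "trip_distance": 1.2,
     "payment_type": "CSH", "fare_amount": 7.0},
]

LABEL = "fare_amount"             # the column the model should predict
EXCLUDED = {"trip_time_in_secs"}  # not known before the trip is completed

def split_features_label(row):
    """Return (features, label) for one row, dropping excluded columns."""
    features = {name: value for name, value in row.items()
                if name != LABEL and name not in EXCLUDED}
    return features, row[LABEL]

features, label = split_features_label(rows[0])
feature_names = sorted(features)
```

Here feature_names holds the five feature columns, and label holds the fare for the first row.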

https://docs.microsoft.com/en-us/dotnet/machine-learning/tutorials/predict-prices

To train a model to predict the fare of a taxi journey, we need to identify the relevant features that may help in predicting the fare accurately. Features are the input variables that the model will use to make its predictions.

Out of the given options, the most relevant feature for predicting the fare of a taxi journey is the trip distance of individual taxi journeys (Option B).

Trip distance has a direct effect on the fare: longer trips generally cost more and shorter trips cost less. Including trip distance as a feature therefore gives the model information that is strongly related to the label and will likely improve the accuracy of fare predictions.
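To make the link between distance and fare concrete, here is a small sketch that fits a straight line to fares using trip distance as the only feature. The numbers are invented for illustration and do not reflect real tariffs; fit_line and predict_fare are hypothetical helper names, and a real model would normally use more features and a proper ML library.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Made-up training data: trip_distance is the feature, fare_amount is the label.
trip_distance = [1.0, 2.0, 3.0, 4.0, 5.0]
fare_amount = [4.5, 6.5, 8.5, 10.5, 12.5]

slope, intercept = fit_line(trip_distance, fare_amount)

def predict_fare(distance):
    """Predict the fare for a new trip from its distance alone."""
    return intercept + slope * distance
```

With this toy data the fitted line is a base charge plus a per-unit-distance rate, which is why a longer trip yields a higher predicted fare.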

The number of taxi journeys in the dataset (Option A) is not a relevant feature for predicting the fare of a specific taxi journey. The number of taxi journeys is an overall statistic and does not provide information on the specific attributes of a given journey.

The fare of individual taxi journeys (Option C) is the label, the value the model is being trained to predict, so it cannot also be supplied as a feature.

The trip ID of individual taxi journeys (Option D) is also not a relevant feature for predicting the fare of a taxi journey. The trip ID is a unique identifier for each trip and does not provide any information that could be useful in predicting the fare of the taxi journey.
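The reasoning for rejecting Options A, C, and D can be expressed as a rough column screen. This is only a sketch under stated assumptions: the function rejection_reason, the column journey_count, and all values are hypothetical, and the checks are simple heuristics rather than a complete feature selection method.

```python
def rejection_reason(name, values, label_name):
    """Return why a column is unsuitable as a feature, or None if it passes.

    Rough screening heuristics for illustration only.
    """
    if name == label_name:
        return "is the label"
    if len(set(values)) == 1:
        return "constant across rows"
    # Distinct on every row and not a continuous measurement:
    # most likely an identifier rather than a property of the trip.
    all_distinct = len(set(values)) == len(values)
    if all_distinct and not any(isinstance(v, float) for v in values):
        return "unique identifier"
    return None

# Made-up columns mirroring the four answer options.
columns = {
    "journey_count": [4, 4, 4, 4],          # Option A: one number for the whole dataset
    "trip_distance": [1.2, 3.4, 0.8, 5.1],  # Option B
    "fare_amount": [6.0, 11.5, 5.0, 16.0],  # Option C
    "trip_id": [1001, 1002, 1003, 1004],    # Option D
}

usable = [name for name, values in columns.items()
          if rejection_reason(name, values, "fare_amount") is None]
```

Only trip_distance passes the screen, which matches the answer to the question.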

Therefore, the most relevant feature for predicting the fare of a taxi journey is the trip distance of individual taxi journeys.