You work as a machine learning specialist for a video game software company.
You have been asked to produce a machine learning model that predicts whether a newly released game will eventually become a successful product that earns the company profits.
The data for your model consists of product information and product ratings from social media.
Your management team would like to use your model's results to help decide whether a new game is worth additional marketing dollars to promote it further.
Which model and objective best match your model requirements?
Correct Answer: B.
Option A is incorrect.
XGBoost is a good choice for your algorithm, but the multi:softmax objective is used for multiclass classification.
You are trying to predict whether your newly released game will eventually make your company money or not, which is a binary classification problem.
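In XGBoost, multi:softmax requires a num_class setting and returns one predicted class index per row rather than a probability. The minimal pure-Python sketch below mirrors that computation (softmax over per-class scores, then argmax); it illustrates the math, not XGBoost's internal code, and the three classes and their scores are hypothetical.

```python
import math


def softmax(scores):
    """Convert raw per-class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_softmax_predict(scores):
    """Return the index of the most probable class, as multi:softmax does."""
    probs = softmax(scores)
    return max(range(len(probs)), key=probs.__getitem__)


# Hypothetical scores for one game over three hypothetical classes:
# 0 = loss, 1 = break-even, 2 = profit
scores = [0.2, 1.5, -0.3]
print(softmax(scores))
print(multi_softmax_predict(scores))  # 1 (a class index, not a probability)
```

Management wants to know how likely a game is to succeed, which a bare class index does not convey.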
Option B is correct.
XGBoost is a good choice for your algorithm, and binary:logistic is the correct objective because it is designed for binary classification problems.
Predicting whether your newly released game will eventually make your company money or not has exactly two outcomes, so it is a binary classification problem.
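For binary:logistic, XGBoost expects training labels between 0 and 1, which for a yes/no outcome means 0 or 1. The sketch below assumes, hypothetically, that a game is labeled successful when its profit is positive. It also shows illustrative hyperparameters in the style of the SageMaker built-in XGBoost algorithm; only the objective comes from this answer, and the other values are placeholders, not tuned settings.

```python
def success_label(profit: float) -> int:
    """Hypothetical rule: 1 = successful (profitable), 0 = not successful."""
    return 1 if profit > 0 else 0


# Illustrative hyperparameters; only "objective" is prescribed by the answer.
hyperparameters = {
    "objective": "binary:logistic",  # binary classification, outputs probabilities
    "eval_metric": "auc",            # placeholder choice of evaluation metric
    "num_round": "100",              # placeholder number of boosting rounds
    "max_depth": "5",                # placeholder tree depth
    "eta": "0.2",                    # placeholder learning rate
}

profits = [1_250_000.0, -40_000.0, 0.0, 310_500.0]  # hypothetical profit figures
labels = [success_label(p) for p in profits]
print(labels)  # [1, 0, 0, 1]
```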
Option C is incorrect.
DeepAR is not the correct algorithm for this problem.
DeepAR is a forecasting algorithm for time-series data.
Your data is product information and product ratings from social media, not a time series.
Also, DeepAR has no reg:logistic objective; reg:logistic is an XGBoost objective.
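To make the data-shape mismatch concrete: DeepAR trains on JSON Lines records that pair a start timestamp with a target series, and it is configured with forecasting hyperparameters such as time_freq, context_length, and prediction_length rather than an objective. The sketch contrasts such a record with a static product row; all field values, the product row's field names, and the looks_like_deepar_series helper are hypothetical.

```python
import json

# A DeepAR-style training record: a start timestamp plus a target series.
deepar_record = {"start": "2023-01-01 00:00:00", "target": [120.0, 135.0, 128.0, 150.0]}

# A static row like the data in this question (hypothetical field names and values).
game_row = {"genre": "Action", "platform": "PS4", "critic_score": 82, "user_score": 7.9}


def looks_like_deepar_series(record: dict) -> bool:
    """Hypothetical check: does the record have a start time and a target series?"""
    return isinstance(record.get("start"), str) and isinstance(record.get("target"), list)


# Illustrative DeepAR hyperparameters: forecasting settings, no "objective" key.
deepar_hyperparameters = {
    "time_freq": "D",
    "context_length": "30",
    "prediction_length": "7",
    "epochs": "20",
}

print(json.dumps(deepar_record))
print(looks_like_deepar_series(deepar_record), looks_like_deepar_series(game_row))  # True False
```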
Option D is incorrect.
The Random Cut Forest algorithm is an unsupervised algorithm used to detect anomalous data points in a data set.
Because it learns from unlabeled data and outputs anomaly scores, it does not fit a supervised binary classification problem like predicting whether your newly released game will eventually make your company money or not.
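The difference in inputs and outputs can be sketched with a simple unsupervised detector. This is deliberately NOT Random Cut Forest: it is a z-score stand-in, assumed here only to show that an unsupervised detector takes features without labels and returns a score for how unusual each point is, not a probability of commercial success. The ratings are hypothetical.

```python
import statistics


def anomaly_scores(values):
    """Stand-in detector (NOT Random Cut Forest): absolute z-score per point.
    No labels are used; a higher score means a more unusual point."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    return [abs(v - mean) / stdev if stdev else 0.0 for v in values]


ratings = [7.8, 8.1, 7.5, 8.0, 2.1]  # hypothetical average user ratings
scores = anomaly_scores(ratings)
most_unusual = max(range(len(scores)), key=scores.__getitem__)
print(most_unusual)  # 4
```

The output flags the outlier rating, but says nothing about whether that game, or any other, will turn a profit.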
References:
AWS Amazon SageMaker Examples Jupyter notebook, "Predicting Product Success When Review Data Is Available" (https://github.com/aws/amazon-sagemaker-examples/blob/master/introduction_to_applying_machine_learning/video_game_sales/video-game-sales-xgboost.ipynb)
Amazon SageMaker Developer Guide, "XGBoost Hyperparameters" (https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost_hyperparameters.html)
Amazon SageMaker Developer Guide, "Random Cut Forest (RCF) Algorithm" (https://docs.aws.amazon.com/sagemaker/latest/dg/randomcutforest.html)
Amazon SageMaker Developer Guide, "DeepAR Forecasting Algorithm" (https://docs.aws.amazon.com/sagemaker/latest/dg/deepar.html)
XGBoost (dmlc/xgboost) GitHub repository, "XGBoost Parameters" (https://github.com/dmlc/xgboost/blob/master/doc/parameter.rst#learning-task-parameters)
Wikipedia, "Logistic regression" (https://en.wikipedia.org/wiki/Logistic_regression#:~:text=Logistic%20regression%20is%20a%20statistical,a%20form%20of%20binary%20regression)
Option B, XGBoost with the binary:logistic objective, is the best model and objective combination for this scenario.
XGBoost is a popular gradient-boosted decision tree algorithm known for its accuracy and training speed. It handles both classification and regression problems and works well on tabular data with many features, such as product attributes and ratings.
The binary:logistic objective in XGBoost is designed for binary classification. Here the goal is to predict whether a newly released game will eventually become a successful product or not, which has two outcomes. With binary:logistic, the model outputs the probability that a game succeeds, and a threshold can then convert each probability into a yes/no decision about marketing spend.
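A minimal sketch of that last step, assuming hypothetical raw model scores (margins) for three games: binary:logistic maps a margin to a probability with the logistic function, and the threshold, an assumed business parameter, turns probabilities into invest/don't-invest decisions.

```python
import math


def sigmoid(margin: float) -> float:
    """Logistic function, written to avoid overflow for large negative margins."""
    if margin >= 0:
        return 1.0 / (1.0 + math.exp(-margin))
    z = math.exp(margin)
    return z / (1.0 + z)


def invest(probability: float, threshold: float = 0.5) -> bool:
    """Hypothetical business rule: promote the game if P(success) >= threshold."""
    return probability >= threshold


margins = [-2.0, 0.0, 1.5]  # hypothetical raw scores for three new games
probs = [sigmoid(m) for m in margins]
at_050 = [invest(p) for p in probs]
at_060 = [invest(p, threshold=0.6) for p in probs]
print([round(p, 3) for p in probs])  # [0.119, 0.5, 0.818]
print(at_050)  # [False, True, True]
print(at_060)  # [False, False, True]
```

Raising the threshold from 0.5 to 0.6 flips the middle game from invest to don't invest; where to set it is a business trade-off between missing a future hit and spending on a game that does not succeed.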
DeepAR is a neural-network-based algorithm designed for time-series forecasting. It could forecast a quantity such as a product's sales over time, but it is not the best choice for this scenario because the data used is not a time series.
Random Cut Forest is an unsupervised learning algorithm used for anomaly detection. It is not suitable for this scenario since the objective is to predict whether a game will be successful or not, which is a supervised learning problem.
The multi:softmax objective is suitable for multiclass classification problems. This scenario has only two classes (success or failure), and multi:softmax returns a predicted class label rather than a probability, so it is not the best choice.
In summary, option B, XGBoost with the binary:logistic objective, is the best choice for this scenario: XGBoost is an accurate and fast algorithm, and binary:logistic is the objective designed for binary classification problems.
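A short derivation shows why a two-class softmax adds nothing over the logistic objective. Writing z_0 and z_1 for the per-class scores of failure and success, the softmax probability of success reduces to a logistic function of the score difference:

```latex
P(\text{success}) = \frac{e^{z_1}}{e^{z_0} + e^{z_1}}
                  = \frac{1}{1 + e^{-(z_1 - z_0)}}
                  = \sigma(z_1 - z_0),
\qquad \sigma(t) = \frac{1}{1 + e^{-t}}
```

The second step divides numerator and denominator by e^{z_1}. So binary:logistic models the same two-class probability with a single score and returns that probability directly.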