You work for a manufacturing company that produces retail apparel, such as shoes, dresses, blouses, etc.
Your head of manufacturing has asked you to use your data science skills to determine which product, among a list of potential next products, your company should invest its resources to produce.
You decide that you need to predict the sales levels of each of the potential next products and select the one with the highest predicted purchase rate.
Which type of machine learning approach should you use?
Click on the arrows to vote for the correct answer
A. B. C. D.Answer: C.
Option A is incorrect.
This is not a multiple classification problem that you are trying to solve for more than two outcomes.
So, a multinomial logistic regression would be the wrong choice for your machine learning model.
Option B is incorrect.
You are trying to solve a numeric result: the number of purchases customers will make for each next potential product.
From the Amazon SageMaker developer guide titled How RCF Works: “Amazon SageMaker Random Cut Forest (RCF) is an unsupervised algorithm for detecting anomalous data points within a dataset.”
Option C is correct.
You are trying to solve a numeric result: the number of purchases customers will make for each next potential product.
This numeric result case calls for the use of a regression model such as the linear regression model.
Option D is incorrect.
This is not a classification problem where you're solving a binary (yes or no) result.
So, a logistic regression model would be the wrong choice for your machine learning.
Reference:
Please see this AWS overview of machine learning concepts: https://docs.aws.amazon.com/machine-learning/latest/dg/machine-learning-concepts.html, and the Amazon Machine Learning developer guide titled: Types of ML Models.
The task at hand involves predicting the sales levels of potential next products and selecting the one with the highest predicted purchase rate. The question is which machine learning approach is suitable for this task.
Option A suggests using multinomial logistic regression to solve a multiclass classification problem. However, this approach is not suitable because the problem is not a classification problem, and the output is not a categorical variable.
Option B suggests using the random cut forest model, which is a machine learning algorithm used for anomaly detection and regression tasks. However, this approach is not suitable for the current problem as it is not an anomaly detection problem, and the task is to predict the purchase rate of potential next products, which is a regression task.
Option C suggests using a linear regression model to solve a regression problem, which is suitable for the current problem. Linear regression is a supervised learning algorithm used to model the relationship between a dependent variable (in this case, the purchase rate of potential next products) and one or more independent variables (in this case, features of the potential next products).
Option D suggests using a logistic regression model, which is a binary classification algorithm, to solve the problem. However, this approach is not suitable because the output is not a binary variable.
Therefore, the correct answer is option C: You are trying to solve for the greatest number of sales across the potential next products. Therefore, you are solving a regression problem, and you should use a linear regression model.