PMG Group Malaysia is a Chinese group of companies best known for its book retailing and online retailing services as well as being involved in the printing, publishing, and supply of books and library services in China and Taiwan.
PMG Bookstores currently has 65 outlets in China and 6 in Taiwan. The management team has been strengthened to improve its customer service and its range of books.
Steps have been taken to upgrade the computer system to improve the efficiency of PMG Bookstores' inventory control and customer service delivery.
PMG Bookstores continues to seek choice locations for new outlets in China.
PMG Group hosts its own web application to sell books and increase online sales.
The application is built on AWS, running on EC2 and RDS. PMG Group has a large base of existing customers.
It has launched a campaign to sell new products based on customers' interests, with the aim of growing the business.
Management wants to know the top 50 books that will be sold each day.
Please advise.
Select 2 options.
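A. Logistic Regression algorithm addressing Binary Classification
B. Logistic Regression algorithm addressing Multi-class classification
C. Linear regression algorithm addressing Regression
D. Area under the Curve (AUC) provides accuracy of the model
E. Cross-validation for evaluating ML models to detect overfitting
Answer: B, E.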
Option A is incorrect.
ML models for binary classification problems predict a binary outcome.
https://docs.aws.amazon.com/machine-learning/latest/dg/types-of-ml-models.html
Option B is correct.
ML models for multiclass classification problems allow you to generate predictions for multiple classes (predict one of more than two outcomes).
https://docs.aws.amazon.com/machine-learning/latest/dg/types-of-ml-models.html
Option C is incorrect.
ML models for regression problems predict a numeric value.
https://docs.aws.amazon.com/machine-learning/latest/dg/types-of-ml-models.html
Option D is incorrect.
Amazon ML provides an industry-standard accuracy metric for binary classification models called Area Under the (Receiver Operating Characteristic) Curve (AUC).
https://docs.aws.amazon.com/machine-learning/latest/dg/binary-model-insights.html
Option E is correct.
Cross-validation is a technique for evaluating ML models by training several ML models on subsets of the available input data and evaluating them on the complementary subset of the data.
Use cross-validation to detect overfitting.
https://docs.aws.amazon.com/machine-learning/latest/dg/cross-validation.html
To determine the top 50 books that will be sold every day, we need to predict the sales volume for each book. This can be achieved through machine learning algorithms that can learn from historical sales data and identify patterns and trends.
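Once a model has produced a score for every book, picking the top 50 is a simple ranking step. The snippet below is a minimal sketch of that step only. The function name `top_n_books`, the book IDs, and the predicted values are hypothetical and are not part of the question.

```python
import heapq

def top_n_books(predicted_sales, n=50):
    """Return the n book IDs with the highest predicted daily sales."""
    # heapq.nlargest avoids sorting the whole catalogue when n is small.
    return heapq.nlargest(n, predicted_sales, key=predicted_sales.get)

# Hypothetical model output: book ID -> predicted units sold tomorrow.
predicted_sales = {"book-a": 12.0, "book-b": 40.5, "book-c": 3.2, "book-d": 27.1}
top_two = top_n_books(predicted_sales, n=2)
```

With a real catalogue, the same call with `n=50` would return the 50 highest-scoring book IDs, best first.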
Of the given options, the most suitable choice for this use case is option B - Logistic Regression algorithm addressing Multi-class classification. Multi-class classification is a type of machine learning problem in which the model predicts the probability of an instance belonging to each of several classes. In this case, it can be used to predict the probability of each book being sold in the top 50.
Logistic Regression is a linear algorithm used for classification problems. It models the probability of the output variable (in this case, the probability of a book being sold in the top 50) as a function of one or more input variables (such as historical sales data and customer data).
A Logistic Regression model can be trained on historical sales data, customer data, product data, and other relevant factors that influence the sales of a book. Once trained, the model can be used to predict the probability of each book being sold in the top 50 on a given day.
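To make the prediction step concrete, here is a rough sketch of how a multinomial logistic regression model turns input features into class probabilities: one linear score per class, followed by a softmax. The weights, biases, and feature values are hand-picked for illustration and are assumptions, not trained parameters or anything Amazon ML exposes.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(weights, biases, features):
    """Multinomial logistic regression: one linear score per class, then softmax."""
    scores = [
        sum(w * x for w, x in zip(class_weights, features)) + b
        for class_weights, b in zip(weights, biases)
    ]
    return softmax(scores)

# Hypothetical parameters for three classes and two input features.
weights = [[0.8, -0.2], [0.1, 0.4], [-0.5, 0.3]]
biases = [0.0, 0.1, -0.1]
probs = predict_proba(weights, biases, features=[1.0, 2.0])

# The predicted class is the one with the highest probability.
predicted_class = max(range(len(probs)), key=probs.__getitem__)
```

In practice the weights would be learned from the historical data described above rather than written by hand.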
Option A - Logistic Regression algorithm addressing Binary Classification is not the most suitable choice for this use case, as it predicts a binary output variable (yes/no, true/false, etc.), while here the prediction must distinguish among more than two outcomes.
Option C - Linear regression algorithm addressing Regression is not the best choice for this use case, as it predicts a continuous numeric value, while here the outcome being predicted (whether a book is among the top 50) is treated as a discrete class.
Option D - Area under the Curve (AUC) provides accuracy of the model: AUC is not an algorithm but a performance metric used to evaluate a binary classifier. It measures the area under the receiver operating characteristic (ROC) curve, which plots the true positive rate against the false positive rate at different classification thresholds. This option is not suitable for this use case, as predicting the probability of each book being sold in the top 50 is treated here as a multi-class classification problem.
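For reference, AUC for a binary classifier can be computed without drawing the ROC curve at all: it equals the fraction of (positive, negative) example pairs in which the positive example receives the higher score. The sketch below uses that pairwise definition. The labels and scores are made-up illustration data.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))

# Made-up binary labels (1 = positive) and model scores.
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.4, 0.35, 0.2, 0.8]
value = auc(labels, scores)  # 5 of the 6 positive/negative pairs are ordered correctly
```

The definition only makes sense with exactly two classes, which is why the metric is tied to binary models.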
Option E - Cross-validation for evaluating ML models to detect overfitting: cross-validation evaluates machine learning models by dividing the data into subsets, training a model on some of the subsets, testing it on the complementary held-out subset, and repeating with a different held-out subset each time. This option is not directly related to algorithm selection, but using cross-validation to evaluate the selected algorithm is good practice because it helps detect overfitting.
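The splitting logic behind k-fold cross-validation can be sketched in a few lines. The helper `k_fold_indices` below is a hypothetical name, and the example assumes the rows have already been shuffled. It only produces the train/validation index sets and leaves model training and scoring to the caller.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

# Ten rows split into four folds: every row is held out exactly once.
folds = list(k_fold_indices(n_samples=10, k=4))
```

A model would be trained once per fold on `train` and scored on `validation`. A large gap between training and validation scores across folds is the usual sign of overfitting.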
In conclusion, the most suitable options for this use case are B - Logistic Regression algorithm addressing Multi-class classification and E - Cross-validation for evaluating ML models to detect overfitting.