PMG Group Malaysia | AWS ML for Analyzing Book Reviews

PMG Bookstores and AWS ML for Analyzing Book Reviews

Question

PMG Group Malaysia is a Chinese group of companies best known for its book retailing and online retailing services. It is also involved in the printing, publishing and supply of books and library services in China and Taiwan.

PMG Bookstores currently has 65 outlets in China and 6 in Taiwan. The management team has been strengthened to improve its customer service and its range of books.

Steps have been taken to upgrade the computer system to improve the efficiency of PMG Bookstores' inventory control and customer service delivery.

PMG Bookstores continues to seek choice locations for new outlets in China.

PMG Group hosts its own web application to sell books and grow its online sales.

The application is built on AWS and runs on EC2 and RDS. PMG Group has recently noticed an avalanche of negative reviews about some of the books released in the market and wants to know whether the reviews were written by customers or by bots.

PMG has identified AWS ML as a way to get a quick turnaround.

Please advise.

Select 3 options.

Answers

Explanations


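A. Amazon ML uses the logistic regression algorithm through binary classification to solve the business problem
B. Amazon ML uses the multinomial logistic regression algorithm through multiclass classification to solve the business problem
C. Amazon ML uses the linear regression algorithm through a regression model to solve the business problem
D. Amazon ML uses Area Under the (Receiver Operating Characteristic) Curve (AUC) to provide accuracy of the model
E. Cross-validation is a technique for evaluating ML models by training several ML models on subsets of the available input data to detect overfitting
F. Amazon ML uses macro-average F1 score to provide accuracy of the model
G. Amazon ML uses the standard root mean square error (RMSE) metric to provide accuracy of the model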

Answer: A, D, E.

Option A is correct - ML models for binary classification problems predict a binary outcome (one of two possible classes).

https://docs.aws.amazon.com/machine-learning/latest/dg/types-of-ml-models.html

Option B is incorrect - ML models for multiclass classification problems allow you to generate predictions for multiple classes (predict one of more than two outcomes).

https://docs.aws.amazon.com/machine-learning/latest/dg/types-of-ml-models.html

Option C is incorrect - ML models for regression problems predict a numeric value.

https://docs.aws.amazon.com/machine-learning/latest/dg/types-of-ml-models.html

Option D is correct - Amazon ML provides an industry-standard accuracy metric for binary classification models called Area Under the (Receiver Operating Characteristic) Curve (AUC).

https://docs.aws.amazon.com/machine-learning/latest/dg/binary-model-insights.html

Option E is correct - Cross-validation is a technique for evaluating ML models by training several ML models on subsets of the available input data and evaluating them on the complementary subset of the data.

Use cross-validation to detect overfitting.

https://docs.aws.amazon.com/machine-learning/latest/dg/cross-validation.html

Option F is incorrect - The macro-average F1 score is used to evaluate the predictive accuracy of a multiclass model.

https://docs.aws.amazon.com/machine-learning/latest/dg/multiclass-model-insights.html

Option G is incorrect - For linear regression tasks, Amazon ML uses the industry-standard root mean square error (RMSE) metric.

https://docs.aws.amazon.com/machine-learning/latest/dg/regression-model-insights.html

Based on the scenario provided, PMG Group wants to identify whether the negative book reviews were written by customers or by bots. To achieve this, PMG Group has decided to use the Amazon Machine Learning (Amazon ML) service.

Option A: Amazon ML uses the logistic regression algorithm through binary classification to solve the business problem

Logistic regression is a type of supervised learning algorithm used for binary classification problems. It tries to predict the probability of an event occurring based on the input features. In this case, the event is whether the review is provided by a customer or bot. Therefore, Option A seems to be a relevant choice.
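To make the idea concrete, here is a minimal sketch of how a logistic regression model turns input features into a probability and then into one of two labels. The feature names, weights and bias are made-up values for illustration only; they are not learned from real review data and do not reflect how Amazon ML represents a model internally.

```python
import math


def sigmoid(z: float) -> float:
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_bot_probability(features, weights, bias):
    """Logistic regression: sigmoid of the weighted sum of the features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def classify(probability, threshold=0.5):
    """Turn a probability into one of the two classes."""
    return "bot" if probability >= threshold else "customer"


# Hypothetical features:
#   [reviews posted by the account in the last hour, 1 if verified purchase else 0]
WEIGHTS = [0.8, -2.0]  # assumed values, for illustration only
BIAS = -1.0            # assumed value, for illustration only

burst_unverified = predict_bot_probability([5, 0], WEIGHTS, BIAS)
single_verified = predict_bot_probability([1, 1], WEIGHTS, BIAS)

print(classify(burst_unverified), round(burst_unverified, 3))
print(classify(single_verified), round(single_verified, 3))
```

Raising or lowering the threshold trades false positives against false negatives, which is why a threshold-independent metric such as AUC is useful for comparing binary models.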

Option B: Amazon ML uses the multinomial logistic regression algorithm through multiclass classification to solve the business problem

Multinomial logistic regression is also a supervised learning algorithm, used for multiclass classification problems. It predicts a probability for each of three or more classes and picks the most likely one. However, in this scenario there are only two possible outcomes (customer or bot), so the business problem is a binary classification problem. Therefore, Option B does not seem to be relevant.
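As a rough illustration of the difference, multinomial logistic regression typically applies a softmax over one score per class. The sketch below uses made-up scores and a generic three-class label set (both are assumptions), and also checks that with only two classes the softmax collapses to the ordinary logistic (sigmoid) function, which is why a binary model is sufficient here.

```python
import math


def softmax(scores):
    """Convert one raw score per class into probabilities that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical three-class problem with assumed raw scores.
CLASSES = ["class_a", "class_b", "class_c"]
probs = softmax([2.0, 1.0, 0.1])
predicted = CLASSES[probs.index(max(probs))]
print(predicted, [round(p, 3) for p in probs])

# With two classes, softmax over [z, 0] equals sigmoid(z).
z = 1.5
two_class = softmax([z, 0.0])[0]
print(round(two_class, 6), round(sigmoid(z), 6))
```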

Option C: Amazon ML uses the linear regression algorithm through a regression model to solve the business problem

Linear regression is a type of supervised learning algorithm used for regression problems where the output is a continuous variable. In this scenario, the business problem is a binary classification problem. Therefore, Option C does not seem to be relevant.

Option D: Amazon ML uses Area Under the (Receiver Operating Characteristic) Curve (AUC) to provide accuracy of the model

AUC is a popular metric used to evaluate binary classification models. It measures the area under the ROC curve, which is a plot of the true positive rate against the false positive rate for different classification thresholds. A higher AUC value indicates better model performance. Therefore, Option D seems to be a relevant choice for evaluating the model's accuracy.
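AUC can also be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The following sketch computes it that way on a tiny, made-up set of labels and scores (1 = bot, 0 = customer is an assumed encoding); it is meant to illustrate the metric, not to reproduce Amazon ML's implementation.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up evaluation data: 1 = bot, 0 = customer (assumed encoding).
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

auc = roc_auc(labels, scores)
print(round(auc, 3))  # 8 of the 9 positive/negative pairs are ordered correctly
```

A value near 1.0 means the model ranks bots above customers almost always, while a value near 0.5 is no better than random guessing.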

Option E: Cross-validation is a technique for evaluating ML models by training several ML models on subsets of the available input data to detect overfitting, that is, a model that fails to generalize the pattern

Cross-validation is a technique used to evaluate the performance of machine learning models. It involves splitting the data into training and evaluation subsets multiple times, training a model on each training subset and evaluating it on the complementary subset. It helps to detect overfitting, which occurs when a model performs well on the training data but poorly on data it has not seen. Although cross-validation is not specific to bot detection, it is a valid way to check that the binary model generalizes to new reviews, so Option E is a correct statement for this scenario.
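The splitting step can be sketched as follows. This is a plain k-fold index generator written for illustration; the function name and the choice of contiguous folds are assumptions, and in practice the data would normally be shuffled before splitting.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, eval_indices) for each of k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        eval_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, eval_idx
        start += size


folds = list(k_fold_indices(n_samples=10, k=4))
for train_idx, eval_idx in folds:
    # A model would be trained on train_idx and scored on eval_idx here.
    print("train:", train_idx, "eval:", eval_idx)
```

Each record is used for evaluation exactly once and for training in the other folds. A large gap between training and evaluation scores across folds is the signal of overfitting.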

Option F: Amazon ML uses macro-average F1 score to provide accuracy of the model

The F1 score is the harmonic mean of precision and recall, so it measures the balance between the two. The macro-average F1 score calculates the F1 score for each class separately and takes the unweighted average, so every class counts equally regardless of how many examples it has. In Amazon ML this is the metric reported for multiclass models, whereas binary models are evaluated with AUC. Because the business problem here is a binary classification problem, Option F does not seem to be a relevant choice.
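For reference, a minimal sketch of the macro-average F1 calculation on a made-up three-class example is shown below. The class names and predictions are placeholders chosen for illustration.

```python
def f1_for_class(y_true, y_pred, cls):
    """F1 score of one class, treating it as 'positive' and all others as 'negative'."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


# Placeholder labels for a three-class problem.
y_true = ["a", "a", "b", "b", "c", "c"]
y_pred = ["a", "b", "b", "b", "c", "a"]

per_class = {c: round(f1_for_class(y_true, y_pred, c), 3) for c in "abc"}
print(per_class)                           # per-class F1: a=0.5, b=0.8, c=0.667
print(round(macro_f1(y_true, y_pred), 3))  # 0.656
```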

Option G: Amazon ML uses standard root mean square error (RMSE) metric to provide accuracy of the model.

RMSE is a popular metric used to evaluate regression models. It is the square root of the mean of the squared differences between the predicted and actual values, so larger errors are penalized more heavily. In this scenario, the business problem is a binary classification problem, not a regression problem. Therefore, Option G does not seem to be a relevant choice.
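A short sketch of the RMSE calculation, using made-up numeric targets and predictions, shows why it only makes sense when the target is a number rather than a class label:

```python
import math


def rmse(actual, predicted):
    """Root mean square error: sqrt of the mean squared difference."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up numeric targets and predictions.
actual = [3.0, 5.0, 2.0, 4.0]
predicted = [2.0, 5.0, 4.0, 4.0]

error = rmse(actual, predicted)
print(round(error, 3))  # sqrt((1 + 0 + 4 + 0) / 4) = sqrt(1.25)
```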

In conclusion, the relevant options for solving the business problem described in the scenario are A, D, and E. Option A suggests using logistic regression algorithm through binary classification to solve the business problem, Option D suggests using AUC to evaluate the model's accuracy, and Option E explains cross-validation as a technique for evaluating ML models.