Machine Learning Analysis for Social Media Posts | AWS Certified Big Data Specialty Exam

Next Step for Accurate Results

Question

Your company decides to use the Amazon Machine Learning service to classify social media posts that mention the company into two categories: posts that require a response and posts that do not.

The training dataset of 10,000 posts contains the details of each post, including the timestamp, author, and full text of the post.

You are missing the target labels that are required for training.

What should be the next step to ensure you get the right results from the Machine Learning analysis?

Answers

A. Use the Regression Model of the Machine Learning service to classify the media posts

B. Use the Binary Model of the Machine Learning service to classify the media posts into two categories: those that require a response and those that do not

C. Ensure a team is assigned to review each post and provide the label.

D. Using the a priori probability distribution of the two classes, use Monte-Carlo simulation to generate the labels.

Explanations

The correct answer is C. Ensure a team is assigned to review each post and provide the label.

Explanation:

In this scenario, the training dataset does not contain the target labels required for training. In other words, it is missing the information about whether a post requires a response or not.

To address this issue, we need to obtain the target labels so that we can train a supervised learning model. The most reliable way to obtain the target labels is to manually label each post in the training dataset.
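In practice, a labeling team often has more than one reviewer look at each post so that disagreements surface before training. The Python sketch below assumes a hypothetical setup in which each reviewer's vote (1 = requires a response, 0 = does not) is collected per post ID. It resolves the votes by majority and writes a CSV with a binary target column, here given the hypothetical name `requires_response`. Majority voting is one common way to combine reviewers, not something the question prescribes; ties and unreviewed posts are set aside for another pass rather than guessed.

```python
import csv
import io
from collections import Counter


def majority_label(votes):
    """Return the majority 0/1 label, or None when there are no votes or a tie."""
    if not votes:
        return None
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie: send the post back for another review
    return ranked[0][0]


def build_training_csv(posts, reviews):
    """Attach a binary target column to each post.

    posts:   list of dicts with hypothetical keys post_id, timestamp, author, text
    reviews: dict mapping post_id -> list of reviewer votes (1 = requires a response)
    Returns the CSV text and the IDs of posts that still need a decision.
    """
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["post_id", "timestamp", "author", "text", "requires_response"])
    unresolved = []
    for post in posts:
        label = majority_label(reviews.get(post["post_id"], []))
        if label is None:
            unresolved.append(post["post_id"])
            continue
        writer.writerow([post["post_id"], post["timestamp"], post["author"], post["text"], label])
    return buf.getvalue(), unresolved


if __name__ == "__main__":
    posts = [
        {"post_id": "p1", "timestamp": "2023-01-05T10:00:00Z", "author": "alice", "text": "Your app keeps crashing"},
        {"post_id": "p2", "timestamp": "2023-01-05T11:30:00Z", "author": "bob", "text": "Great launch event today"},
        {"post_id": "p3", "timestamp": "2023-01-06T09:15:00Z", "author": "carol", "text": "Is the store open on Sunday?"},
    ]
    reviews = {"p1": [1, 1, 0], "p2": [0, 0, 0], "p3": [1, 0]}
    csv_text, unresolved = build_training_csv(posts, reviews)
    print(csv_text)
    print("needs another review:", unresolved)
```

Posts returned in `unresolved` go back to the review team, so only posts with a clear majority label enter the training set.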

Option A, using a regression model, is not appropriate in this scenario because it is designed for predicting continuous values, not for classifying data into discrete categories.

Option B names the right type of model, since the task is to sort posts into two categories, but it does not address the actual problem: a binary classification model cannot be trained without labeled examples. Choosing the model type is therefore not the next step; obtaining the labels is.

Option D is not appropriate because Monte-Carlo simulation draws random values from a probability distribution. Even if the a priori distribution of the two classes were known, labels generated this way would be unrelated to the content of each post, so a model trained on them would learn nothing about which posts require a response.
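To see why option D fails even when the class prior is known, the following Python sketch assumes a hypothetical prior of 30% of posts requiring a response and a hypothetical ground truth for 10,000 posts. Labels drawn by Monte-Carlo sampling from that prior agree with the truth only at chance level, p^2 + (1 - p)^2, which for p = 0.3 is 0.58. That is worse than the 0.70 obtained by labeling every post "no response", because the sampled labels carry no information about the posts themselves.

```python
import random


def monte_carlo_labels(n, p_positive, seed=0):
    """Draw n labels from the class prior alone; the content of each post is never used."""
    rng = random.Random(seed)
    return [1 if rng.random() < p_positive else 0 for _ in range(n)]


def agreement(a, b):
    """Fraction of positions where two equal-length label lists match."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical ground truth: 10,000 posts, 3,000 of which require a response.
n, n_positive = 10_000, 3_000
p = n_positive / n
true_labels = [1] * n_positive + [0] * (n - n_positive)

simulated = monte_carlo_labels(n, p, seed=42)
chance = p ** 2 + (1 - p) ** 2  # expected agreement of two independent labelings

print(f"agreement with the true labels: {agreement(true_labels, simulated):.3f}")
print(f"chance-level agreement:         {chance:.3f}")
print(f"always answering 'no response': {agreement(true_labels, [0] * n):.3f}")
```

Any seed gives roughly the same result: the simulated labels match the class proportions but not the individual posts, which is exactly what a classifier needs to learn.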

Therefore, the best approach to ensure the right results from the Machine Learning analysis is to assign a team to review each post and provide the label. Human review produces labels that reflect the actual content of each post, which gives the model accurate examples to learn from and leads to better model performance.