You are a machine learning specialist working for the social media software development division of your company.
The social media features of your web applications allow users to post text messages and pictures about their experiences with your company's products.
You need to be able to quickly block posts that contain inappropriate words.
You have defined a vocabulary of words deemed inappropriate for your site. Which algorithm is best suited to your task?
A. Multinomial Naive Bayes
B. Bernoulli Naive Bayes
C. Gaussian Naive Bayes
D. Polychoric Naive Bayes
Answer: B.
Option A is incorrect.
The Multinomial Naive Bayes algorithm is best suited for document classification tasks where you wish to know the frequency of a given word from your vocabulary in your observed text.
You only need to know whether a word from your vocabulary appears in the given post text, not how often it appears.
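To make the distinction concrete, here is a small sketch of the two feature representations. The vocabulary and the post are hypothetical placeholders, not part of the question.

```python
from collections import Counter

# Hypothetical vocabulary and post, for illustration only.
VOCABULARY = ["spam", "scam", "junk"]
post = "this junk is junk and a total scam"

tokens = post.lower().split()
counts = Counter(tokens)

# Multinomial-style features: how many times each vocabulary word occurs.
count_features = [counts[word] for word in VOCABULARY]

# Bernoulli-style features: 1 if the word occurs at all, otherwise 0.
presence_features = [1 if counts[word] > 0 else 0 for word in VOCABULARY]

print(count_features)     # [0, 1, 2]
print(presence_features)  # [0, 1, 1]
```

The count vector records that "junk" occurs twice, while the presence vector records only that it occurs, which is all the blocking task asks for.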
Option B is correct.
The Bernoulli Naive Bayes algorithm is used in document classification tasks where you wish to know whether a word from your vocabulary appears in your observed text or not.
This is exactly what you are trying to accomplish.
You need to know whether a word from your vocabulary of inappropriate words appears in the given post text or not.
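A minimal sketch of a Bernoulli Naive Bayes classifier in plain Python may help show how word presence drives the decision. The vocabulary, training posts, labels, and function names (`to_presence`, `train`, `predict`) are all assumptions made for illustration, and the add-one (Laplace) smoothing is one common choice, not something the question specifies.

```python
import math

# Hypothetical vocabulary of inappropriate words (illustration only).
VOCABULARY = ["spam", "scam", "junk"]


def to_presence(text):
    """Binary feature vector: 1 if the vocabulary word appears, else 0."""
    tokens = set(text.lower().split())
    return [1 if word in tokens else 0 for word in VOCABULARY]


def train(posts, labels, alpha=1.0):
    """Estimate log priors and smoothed per-word presence probabilities."""
    model = {}
    for label in sorted(set(labels)):
        rows = [to_presence(p) for p, y in zip(posts, labels) if y == label]
        n = len(rows)
        log_prior = math.log(n / len(posts))
        # P(word present | class), smoothed so it is never exactly 0 or 1.
        probs = [
            (sum(row[i] for row in rows) + alpha) / (n + 2 * alpha)
            for i in range(len(VOCABULARY))
        ]
        model[label] = (log_prior, probs)
    return model


def predict(model, text):
    """Return the class with the highest log posterior score."""
    x = to_presence(text)
    scores = {}
    for label, (log_prior, probs) in model.items():
        score = log_prior
        for present, p in zip(x, probs):
            # Bernoulli likelihood: absent words contribute log(1 - p).
            score += math.log(p) if present else math.log(1.0 - p)
        scores[label] = score
    return max(scores, key=scores.get)


# Tiny made-up training set.
posts = [
    "this is a scam", "total scam and junk", "spam spam spam",
    "junk product total scam", "great product love it",
    "works as described", "fast shipping great value", "love it",
]
labels = ["block"] * 4 + ["ok"] * 4

model = train(posts, labels)
print(predict(model, "what a scam"))  # block
print(predict(model, "great value"))  # ok
```

Note that "spam spam spam" contributes the same evidence as a single "spam": repeated occurrences are ignored, and the absence of a vocabulary word counts as evidence too.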
Option C is incorrect.
The Gaussian Naive Bayes algorithm works with continuous values in your observations, not discrete values.
Your classification problem uses discrete data: whether a word occurs in the post or not.
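The difference can be seen in the per-feature likelihoods the two variants assume. The symbols below ($x_i$ for feature $i$, $y$ for the class, $\mu_{iy}$ and $\sigma_{iy}^2$ for the class-conditional mean and variance, $p_{iy}$ for the probability that word $i$ appears in class $y$) are notation introduced here for illustration.

```latex
% Gaussian Naive Bayes: x_i is a real number
P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma_{iy}^2}}
                \exp\!\left(-\frac{(x_i - \mu_{iy})^2}{2\sigma_{iy}^2}\right)

% Bernoulli Naive Bayes: x_i \in \{0, 1\}
P(x_i \mid y) = p_{iy}^{\,x_i}\,(1 - p_{iy})^{\,1 - x_i}
```

A word-presence feature only ever takes the values 0 and 1, so the Bernoulli form matches the data directly, whereas fitting a mean and variance to it would model values that can never occur.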
Option D is incorrect.
There is no Polychoric Naive Bayes algorithm.
Reference:
Please see the following pages:
DatumBox, Machine Learning Blog & Software Development News (http://blog.datumbox.com/machine-learning-tutorial-the-naive-bayes-text-classifier/)
SebastianRaschka, Naive Bayes and Text Classification - Introduction and Theory (http://sebastianraschka.com/Articles/2014_naive_bayes_1.html#3_3_multivariate)
Medium, Naive Bayes Classifier (https://towardsdatascience.com/naive-bayes-classifier-81d512f50a7c)
Packt, Machine Learning Algorithms: Implementing Naive Bayes with Spark MLlib (https://hub.packtpub.com/machine-learning-algorithms-naive-bayes-with-spark-mllib/)
Wikipedia, Naive Bayes classifier (https://en.wikipedia.org/wiki/Naive_Bayes_classifier)
The algorithm best suited to quickly blocking posts that contain inappropriate words, based on the defined vocabulary, is the Bernoulli Naive Bayes algorithm.
Naive Bayes is a probabilistic machine learning algorithm used for classification tasks. It applies Bayes' theorem, together with the assumption that the input variables are conditionally independent given the class, to calculate the probability of each outcome given a set of input variables.
The Bernoulli Naive Bayes algorithm is a variation of Naive Bayes that is used when the input data consists of binary features, such as whether a given word is present in a text or not. By contrast, the Multinomial Naive Bayes algorithm is used when the input data consists of word counts or frequency distributions.
In the context of the given problem, the vocabulary of words deemed inappropriate for the site can be used to create the set of features for the algorithm. Each post can be represented as a binary vector with one entry per vocabulary word: 1 if the word appears in the post and 0 if it does not. The Bernoulli Naive Bayes algorithm can then be trained on a set of labeled data (i.e., posts that are either inappropriate or not) to learn the probability that each word appears given the class label.
Once the algorithm is trained, it can be used to classify new posts as either inappropriate or not based on which vocabulary words appear in the post. If the probability of the post belonging to the inappropriate class is higher than a predefined threshold, the post can be flagged as inappropriate and blocked.
In summary, the Bernoulli Naive Bayes algorithm is best suited to the given problem because the task depends only on whether inappropriate words are present, not on how often they occur, and binary presence features are exactly what this algorithm models.