Naive Bayes Explained
Multinomial Naïve Bayes
Let us see how Naïve Bayes works by performing a movie review sentiment analysis task.
Let’s say we have a small movie review dataset that looks like this:
| Movie Review | Sentiment |
|---|---|
| Review 1 | Good |
| Review 2 | Bad |
| Review 3 | Good |
| Review 4 | Good |
| Review 5 | Bad |
Now we want to use the Naïve Bayes algorithm to predict the sentiment of the movie review “Amazing Movie”.
1) Find Prior Probability
First, we look at the small movie review dataset and count the frequency of good and bad reviews.
| Sentiment | Frequency |
|---|---|
| Good | 3 |
| Bad | 2 |
Probability of good reviews:
P(Good) = Number of good reviews / Total reviews = 3 / 5 = 0.6
Probability of bad reviews:
P(Bad) = Number of bad reviews / Total reviews = 2 / 5 = 0.4
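This step can be sketched in a few lines of Python (the labels are copied straight from the dataset table above):

```python
from collections import Counter

# Sentiment labels from the five-review dataset above
labels = ["Good", "Bad", "Good", "Good", "Bad"]

counts = Counter(labels)  # Counter({'Good': 3, 'Bad': 2})
total = len(labels)

# Prior probability of each class = class frequency / total reviews
priors = {label: count / total for label, count in counts.items()}
print(priors)  # {'Good': 0.6, 'Bad': 0.4}
```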
2) Conditional Probability
Now we count the frequency of each word in the movie review dataset, separately for good and bad reviews, and put the counts into tables.
Good Reviews

| Word | Frequency |
|---|---|
| Amazing | 8 |
| Romance | 3 |
| Horrible | 1 |
| Movie | 3 |

Bad Reviews

| Word | Frequency |
|---|---|
| Amazing | 2 |
| Romance | 0 |
| Horrible | 6 |
| Movie | 2 |
We then compute the conditional probability for each word.
For example:
P(Amazing | Good) = Frequency of the word “Amazing” in good reviews / Total number of words in good reviews
| Word | Frequency | P(Word \| Good) |
|---|---|---|
| Amazing | 8 | 8 / 15 = 0.5333 |
| Romance | 3 | 3 / 15 = 0.2 |
| Horrible | 1 | 1 / 15 = 0.0667 |
| Movie | 3 | 3 / 15 = 0.2 |
We do the same for bad reviews as well
| Word | Frequency | P(Word \| Bad) |
|---|---|---|
| Amazing | 2 | 2 / 10 = 0.2 |
| Romance | 0 | 0 / 10 = 0 |
| Horrible | 6 | 6 / 10 = 0.6 |
| Movie | 2 | 2 / 10 = 0.2 |
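Both conditional-probability tables can be reproduced from the raw word counts; a minimal Python sketch:

```python
# Word frequencies per class, copied from the count tables above
word_counts = {
    "Good": {"Amazing": 8, "Romance": 3, "Horrible": 1, "Movie": 3},
    "Bad":  {"Amazing": 2, "Romance": 0, "Horrible": 6, "Movie": 2},
}

# P(word | class) = word count / total words in that class
cond_prob = {}
for sentiment, counts in word_counts.items():
    total = sum(counts.values())  # 15 for Good, 10 for Bad
    cond_prob[sentiment] = {word: c / total for word, c in counts.items()}

print(round(cond_prob["Good"]["Amazing"], 4))  # 0.5333
print(cond_prob["Bad"]["Romance"])             # 0.0
```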
3) Now we can predict the sentiment of the review “Amazing Movie”
P(Good | “Amazing Movie”) = P(Good) X P(Amazing | Good) X P(Movie | Good) = 0.6 X 0.5333 X 0.2 = 0.064
P(Bad | “Amazing Movie”) = P(Bad) X P(Amazing | Bad) X P(Movie | Bad) = 0.4 X 0.2 X 0.2 = 0.016
(Strictly speaking, these products are only proportional to the posterior probabilities: we drop the common denominator P(“Amazing Movie”) because it does not affect the comparison.)
Since P(Good | “Amazing Movie”) > P(Bad | “Amazing Movie”), the Naïve Bayes algorithm predicts the review “Amazing Movie” as positive sentiment.
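The comparison above can be sketched directly in Python, with the priors and conditional probabilities hard-coded from the tables:

```python
priors = {"Good": 0.6, "Bad": 0.4}
cond_prob = {
    "Good": {"Amazing": 8/15, "Romance": 3/15, "Horrible": 1/15, "Movie": 3/15},
    "Bad":  {"Amazing": 2/10, "Romance": 0/10, "Horrible": 6/10, "Movie": 2/10},
}

def score(review_words, sentiment):
    """Unnormalized posterior: prior times each word's conditional probability."""
    p = priors[sentiment]
    for word in review_words:
        p *= cond_prob[sentiment][word]
    return p

review = ["Amazing", "Movie"]
print(round(score(review, "Good"), 3))  # 0.064
print(round(score(review, "Bad"), 3))   # 0.016
```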
Laplace Smoothing
What about the review “Horrible Romance Movie”? Let’s analyze its sentiment using the Naïve Bayes algorithm.
P(Good | “Horrible Romance Movie”)
= P(Good) X P(Horrible | Good) X P(Romance | Good) X P(Movie | Good)
= 0.6 X 0.0667 X 0.2 X 0.2
= 0.0016
P(Bad | “Horrible Romance Movie”)
= P(Bad) X P(Horrible | Bad) X P(Romance | Bad) X P(Movie | Bad)
= 0.4 X 0.6 X 0 X 0.2
= 0
So P(Good | “Horrible Romance Movie”) > P(Bad | “Horrible Romance Movie”), hence the review “Horrible Romance Movie” has positive sentiment?!
Hmm… something is not quite right.
The problem is that the word “Romance” never appears in any bad review, so P(Romance | Bad) = 0, and any product containing this factor will always be 0 no matter how large the other probabilities are.
We can address this problem by using a technique called Laplace smoothing (also known as add-one smoothing).
We increase every word frequency by 1 to get rid of the zeros, then calculate the new conditional probabilities. Note that the denominators grow as well: each class total increases by the vocabulary size (4 words), so the good total becomes 15 + 4 = 19 and the bad total becomes 10 + 4 = 14.
Good Reviews

| Word | Frequency | P(Word \| Good) |
|---|---|---|
| Amazing | 8 + 1 = 9 | 9 / 19 = 0.4737 |
| Romance | 3 + 1 = 4 | 4 / 19 = 0.2105 |
| Horrible | 1 + 1 = 2 | 2 / 19 = 0.1053 |
| Movie | 3 + 1 = 4 | 4 / 19 = 0.2105 |
Bad Reviews

| Word | Frequency | P(Word \| Bad) |
|---|---|---|
| Amazing | 2 + 1 = 3 | 3 / 14 = 0.2143 |
| Romance | 0 + 1 = 1 | 1 / 14 = 0.0714 |
| Horrible | 6 + 1 = 7 | 7 / 14 = 0.5 |
| Movie | 2 + 1 = 3 | 3 / 14 = 0.2143 |
Let’s find the sentiment of “Horrible Romance Movie” again to see if we get a different result after Laplace Smoothing
P(Good | “Horrible Romance Movie”)
= P(Good) X P(Horrible | Good) X P(Romance | Good) X P(Movie | Good)
= 0.6 X 0.1053 X 0.2105 X 0.2105
= 0.0028
P(Bad | “Horrible Romance Movie”)
= P(Bad) X P(Horrible | Bad) X P(Romance | Bad) X P(Movie | Bad)
= 0.4 X 0.5 X 0.0714 X 0.2143
= 0.0031
Since P(Bad | “Horrible Romance Movie”) > P(Good | “Horrible Romance Movie”), the review “Horrible Romance Movie” is classified as negative, which is correct.
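The whole smoothed classifier fits in a short Python sketch; add-one smoothing grows each denominator by the vocabulary size (4 here):

```python
# Raw word frequencies per class, copied from the count tables above
word_counts = {
    "Good": {"Amazing": 8, "Romance": 3, "Horrible": 1, "Movie": 3},
    "Bad":  {"Amazing": 2, "Romance": 0, "Horrible": 6, "Movie": 2},
}
priors = {"Good": 0.6, "Bad": 0.4}

def smoothed_probs(counts, alpha=1):
    # Add alpha to every count; the denominator grows by alpha * vocabulary size
    total = sum(counts.values()) + alpha * len(counts)
    return {word: (c + alpha) / total for word, c in counts.items()}

cond_prob = {s: smoothed_probs(c) for s, c in word_counts.items()}

def classify(review_words):
    scores = {}
    for sentiment in priors:
        p = priors[sentiment]
        for word in review_words:
            p *= cond_prob[sentiment][word]
        scores[sentiment] = p
    return max(scores, key=scores.get), scores

label, scores = classify(["Horrible", "Romance", "Movie"])
print(label)  # Bad
```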
Gaussian Naïve Bayes
We use Gaussian Naïve Bayes to find the conditional probability of continuous features (e.g. weight, height).
We assume that these features are normally distributed (i.e. they follow a Gaussian distribution, hence the name).
Let’s use Gaussian Naïve Bayes to classify whether a person is male or female based on their height and weight. Suppose we are given a dataset of six people.
| Height (cm) | Weight (kg) | Gender |
|---|---|---|
| 171 | 65 | Male |
| 175 | 75 | Male |
| 180 | 83 | Male |
| 165 | 50 | Female |
| 171 | 55 | Female |
| 163 | 52 | Female |
Now we want to predict whether a person with height 173 cm and weight 80 kg is male or female.
1) Find prior probability
P(Male) = Number of males / Total number of people = 3 / 6 = 0.5
P(Female) = Number of females / Total number of people = 3 / 6 = 0.5
2) Find the mean and standard deviation of each feature for each class
Mean of height (male) = (171 + 175 + 180) / 3 = 175.33
Standard deviation of height (male) = 4.5093
Mean of height (female) = (165 + 171 + 163) / 3 = 166.33
Standard deviation of height (female) = 4.1633
Mean of weight (male) = (65 + 75 + 83) / 3 = 74.33
Standard deviation of weight (male) = 9.0185
Mean of weight (female) = (50 + 55 + 52) / 3 = 52.33
Standard deviation of weight (female) = 2.5166
(Here we use the sample standard deviation, which divides by n − 1.)
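These statistics can be checked with Python’s statistics module; statistics.stdev computes the sample standard deviation (dividing by n − 1), which matches the values above up to rounding:

```python
import statistics

# Per-class feature values from the six-person dataset above
features = {
    "height (male)":   [171, 175, 180],
    "height (female)": [165, 171, 163],
    "weight (male)":   [65, 75, 83],
    "weight (female)": [50, 55, 52],
}

for name, values in features.items():
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    print(f"{name}: mean = {mean:.2f}, std dev = {sd:.3f}")
```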
3) Compute the likelihood of the observed height and weight for each class
Since height and weight are continuous, we use the normal (Gaussian) probability density with the class’s mean μ and standard deviation σ:
P(x | class) = 1 / (σ√(2π)) × e^(−(x − μ)² / (2σ²))
Plugging in the means and standard deviations from step 2:
P(height = 173 | Male) ≈ 0.0774
P(weight = 80 | Male) ≈ 0.0363
P(height = 173 | Female) ≈ 0.0266
P(weight = 80 | Female) ≈ 8.9 × 10⁻²⁸
P(Male | height = 173, weight = 80)
= P(Male) X P(height = 173 | Male) X P(weight = 80 | Male)
= 0.5 X 0.0774 X 0.0363
≈ 0.0014
P(Female | height = 173, weight = 80)
= P(Female) X P(height = 173 | Female) X P(weight = 80 | Female)
= 0.5 X 0.0266 X 8.9 × 10⁻²⁸
≈ 1.2 × 10⁻²⁹
Since P(Male | height = 173, weight = 80) is far larger than P(Female | height = 173, weight = 80), Gaussian Naïve Bayes predicts that the person is male.
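Putting the whole Gaussian Naïve Bayes computation together, a minimal Python sketch (means and standard deviations are computed from the six-person dataset above, and the normal density is evaluated directly from its formula):

```python
import math
import statistics

# Training data from the table above: (height, weight) pairs per class
data = {
    "Male":   [(171, 65), (175, 75), (180, 83)],
    "Female": [(165, 50), (171, 55), (163, 52)],
}

def gaussian_pdf(x, mu, sigma):
    """Normal probability density N(x; mu, sigma)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Per-class mean and sample standard deviation of each feature
params = {}
for gender, rows in data.items():
    heights = [h for h, w in rows]
    weights = [w for h, w in rows]
    params[gender] = (
        statistics.mean(heights), statistics.stdev(heights),
        statistics.mean(weights), statistics.stdev(weights),
    )

def score(height, weight, gender):
    """Unnormalized posterior: prior times the Gaussian density of each feature."""
    prior = len(data[gender]) / sum(len(rows) for rows in data.values())  # 0.5 each
    h_mu, h_sd, w_mu, w_sd = params[gender]
    return prior * gaussian_pdf(height, h_mu, h_sd) * gaussian_pdf(weight, w_mu, w_sd)

print(score(173, 80, "Male") > score(173, 80, "Female"))  # True -> predict Male
```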