Naive Bayes Explained

Multinomial Naïve Bayes

Let us see how Naïve Bayes works by walking through a movie review sentiment analysis task.

Let’s say we have a small movie review dataset which looks like this:

Movie Review  Sentiment
Review 1      Good
Review 2      Bad
Review 3      Good
Review 4      Good
Review 5      Bad

Now we want to use the Naïve Bayes algorithm to predict the sentiment of the movie review “Amazing Movie”.

1) Find Prior Probability

First we look at the small movie review dataset and count the number of good and bad reviews.

Sentiment  Frequency
Good       3
Bad        2

Probability of good reviews:

P(Good) = Number of good reviews / Total reviews = 3 / 5 = 0.6

Probability of bad reviews:

P(Bad) = Number of bad reviews / Total reviews = 2 / 5 = 0.4
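The same counts are easy to reproduce in code. Here is a minimal Python sketch (the variable names are just illustrative; only the sentiment labels are needed for the priors):

```python
from collections import Counter

# Sentiment labels of the five training reviews from the table above
labels = ["Good", "Bad", "Good", "Good", "Bad"]

counts = Counter(labels)              # Counter({'Good': 3, 'Bad': 2})
total = len(labels)                   # 5

p_good = counts["Good"] / total       # 3 / 5 = 0.6
p_bad = counts["Bad"] / total         # 2 / 5 = 0.4
print(p_good, p_bad)                  # 0.6 0.4
```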

2) Conditional Probability

Now we count the frequency of each word in the movie review dataset and put the counts into a table.

Good Reviews

Word      Frequency
Amazing   8
Romance   3
Horrible  1
Movie     3

Bad Reviews

Word      Frequency
Amazing   2
Romance   0
Horrible  6
Movie     2

We then compute the conditional probability of each word given the class. The good reviews contain 8 + 3 + 1 + 3 = 15 words in total and the bad reviews contain 2 + 0 + 6 + 2 = 10 words.

For example:

P(Amazing | Good) = Frequency of the word “Amazing” in good reviews / Total number of words in good reviews = 8 / 15

Word      Frequency  P(Word | Good)
Amazing   8          8 / 15 = 0.5333
Romance   3          3 / 15 = 0.2
Horrible  1          1 / 15 = 0.0667
Movie     3          3 / 15 = 0.2

We do the same for bad reviews as well

Word      Frequency  P(Word | Bad)
Amazing   2          2 / 10 = 0.2
Romance   0          0 / 10 = 0
Horrible  6          6 / 10 = 0.6
Movie     2          2 / 10 = 0.2
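These per-word probabilities follow directly from the frequency tables. A minimal Python sketch (the dictionaries simply transcribe the counts above):

```python
# Word frequencies per class, transcribed from the tables above
good_counts = {"Amazing": 8, "Romance": 3, "Horrible": 1, "Movie": 3}
bad_counts = {"Amazing": 2, "Romance": 0, "Horrible": 6, "Movie": 2}

def word_probs(counts):
    """P(word | class) = count of the word / total words in that class."""
    total = sum(counts.values())      # 15 for good reviews, 10 for bad reviews
    return {word: count / total for word, count in counts.items()}

p_word_given_good = word_probs(good_counts)   # e.g. 'Amazing': 0.5333
p_word_given_bad = word_probs(bad_counts)     # e.g. 'Romance': 0.0
print(p_word_given_good["Amazing"], p_word_given_bad["Amazing"])   # 0.5333... 0.2
```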

3) Now we can predict the sentiment of the review

“Amazing Movie”

P(Good | “Amazing Movie”) = P(Good) X P(Amazing | Good) X P(Movie | Good) = 0.6 X 0.5333 X 0.2 = 0.064

P(Bad | “Amazing Movie”) = P(Bad) X P(Amazing | Bad) X P(Movie | Bad) = 0.4 X 0.2 X 0.2 = 0.016

We can clearly see that P(Good | “Amazing Movie”) > P(Bad | “Amazing Movie”), so the Naïve Bayes algorithm predicts that the review “Amazing Movie” has positive sentiment. (Strictly speaking, these products are unnormalized scores: we have dropped the common denominator P(“Amazing Movie”) from Bayes’ theorem, which is fine because we only need to compare the two classes.)
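In code, scoring a review is just the prior multiplied by the probability of each of its words. A minimal sketch, reusing the dictionaries from the previous snippet:

```python
def nb_score(words, prior, word_probs):
    """Unnormalized Naive Bayes score: prior * product of P(word | class)."""
    score = prior
    for word in words:
        score *= word_probs[word]
    return score

review = ["Amazing", "Movie"]
score_good = nb_score(review, 0.6, p_word_given_good)   # 0.6 * 0.5333 * 0.2 = 0.064
score_bad = nb_score(review, 0.4, p_word_given_bad)     # 0.4 * 0.2 * 0.2 = 0.016
print("Good" if score_good > score_bad else "Bad")      # Good
```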

Laplace Smoothing

What about the review “Horrible Romance Movie”? Let’s analyze its sentiment using the Naïve Bayes algorithm.

P(Good | “Horrible Romance Movie”)

= P(Good) X P(Horrible | Good) X P(Romance | Good) X P(Movie | Good)

= 0.6 X 0.0667 X 0.2 X 0.2

= 0.0016

P(Bad | “Horrible Romance Movie”)

= P(Bad) X P(Horrible | Bad) X P(Romance | Bad) X P(Movie | Bad)

= 0.4 X 0.6 X 0 X 0.2

= 0

So P(Good | “Horrible Romance Movie”) > P(Bad | “Horrible Romance Movie”), which would make “Horrible Romance Movie” a positive review?!

Hmm… something is not quite right.

The problem is that the word “Romance” never appears in any bad review, so P(Romance | Bad) = 0, and that single zero forces the whole product to 0 regardless of the other words.

We can address this problem with a technique called Laplace smoothing.

We increase every word count by 1 to get rid of the zeros. Since each of the 4 words in the vocabulary gains one extra count, the totals in the denominators grow by 4 as well (15 + 4 = 19 for good reviews, 10 + 4 = 14 for bad reviews). We then recalculate the conditional probabilities.
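In code, Laplace smoothing is a one-line change to the probability calculation: add 1 to every count and add the vocabulary size to the denominator. A minimal sketch, reusing good_counts and bad_counts from the earlier snippet:

```python
def smoothed_word_probs(counts, alpha=1):
    """Add-one (Laplace) smoothing: (count + alpha) / (total + alpha * vocabulary size)."""
    vocab_size = len(counts)                              # 4 words in this toy vocabulary
    total = sum(counts.values()) + alpha * vocab_size     # 15 + 4 = 19, 10 + 4 = 14
    return {word: (count + alpha) / total for word, count in counts.items()}

p_good_smooth = smoothed_word_probs(good_counts)   # e.g. 'Amazing': 9/19 = 0.4737
p_bad_smooth = smoothed_word_probs(bad_counts)     # e.g. 'Romance': 1/14 = 0.0714
```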

Good Reviews

Word      Frequency   P(Word | Good)
Amazing   8 + 1 = 9   9 / 19 = 0.4737
Romance   3 + 1 = 4   4 / 19 = 0.2105
Horrible  1 + 1 = 2   2 / 19 = 0.1053
Movie     3 + 1 = 4   4 / 19 = 0.2105

Bad Reviews

Word      Frequency   P(Word | Bad)
Amazing   2 + 1 = 3   3 / 14 = 0.2143
Romance   0 + 1 = 1   1 / 14 = 0.0714
Horrible  6 + 1 = 7   7 / 14 = 0.5
Movie     2 + 1 = 3   3 / 14 = 0.2143

Let’s find the sentiment of “Horrible Romance Movie” again to see if we get a different result after Laplace Smoothing

P(Good | “Horrible Romance Movie”)

= P(Good) X P(Horrible | Good) X P(Romance | Good) X P(Movie | Good)

= 0.6 X 0.1053 X 0.2105 X 0.2105

= 0.0028

P(Bad | “Horrible Romance Movie”)

= P(Bad) X P(Horrible | Bad) X P(Romance | Bad) X P(Movie | Bad)

= 0.4 X 0.5 X 0.0714 X 0.2143

= 0.0031

Since P(Bad | “Horrible Romance Movie”) > P(Good | “Horrible Romance Movie”), the review “Horrible Romance Movie” is now classified as negative, which is correct.
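With the smoothed probabilities, the same scoring function now gives the correct answer. In practice, implementations usually sum log-probabilities instead of multiplying raw probabilities, because a product of many small numbers can underflow to zero on long documents. A minimal sketch, reusing p_good_smooth and p_bad_smooth from above:

```python
import math

def nb_log_score(words, prior, word_probs):
    """Sum of logs avoids numerical underflow and preserves the same ranking."""
    score = math.log(prior)
    for word in words:
        score += math.log(word_probs[word])
    return score

review = ["Horrible", "Romance", "Movie"]
log_good = nb_log_score(review, 0.6, p_good_smooth)   # log(0.0028) ≈ -5.88
log_bad = nb_log_score(review, 0.4, p_bad_smooth)     # log(0.0031) ≈ -5.78
print("Good" if log_good > log_bad else "Bad")        # Bad
```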

Gaussian Naïve Bayes

We use Gaussian Naïve Bayes to compute the conditional probability of continuous features (e.g. weight, height).

We assume that, within each class, these features follow a normal distribution (also known as a Gaussian distribution, hence the name).

Let’s use Gaussian Naïve Bayes to classify whether a person is male or female based on their height and weight. Suppose we are given a dataset of 6 people.

Height (cm)  Weight (kg)  Gender
171          65           Male
175          75           Male
180          83           Male
165          50           Female
171          55           Female
163          52           Female

Now we want to predict whether a person with height 173 cm and weight 80 kg is male or female.

1) Find prior probability

P(Male) = Number of males / Total number of people = 3 / 6 = 0.5

P(Female) = Number of females / Total number of people = 3 / 6 = 0.5

2) Find the mean and standard deviation of each feature within each class

Mean of height (male) = (171 + 175 + 180) / 3 = 175.33
Standard deviation of height (male) = 4.5093

Mean of height (female) = (165 + 171 + 163) / 3 = 166.33
Standard deviation of height (female) = 4.1633

Mean of weight (male) = (65 + 75 + 83) / 3 = 74.33
Standard deviation of weight (male) = 9.0185

Mean of weight (female) = (50 + 55 + 52) / 3 = 52.33
Standard deviation of weight (female) = 2.5166
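The numbers above are sample means and sample standard deviations (dividing by n − 1), which Python’s statistics module computes directly. A minimal sketch, assuming the data has already been split by gender (note that some implementations use the population variance instead, which shifts the figures slightly):

```python
import statistics

male_heights, male_weights = [171, 175, 180], [65, 75, 83]
female_heights, female_weights = [165, 171, 163], [50, 55, 52]

# statistics.stdev divides by n - 1 (sample standard deviation)
print(statistics.mean(male_heights), statistics.stdev(male_heights))       # 175.33, 4.5093
print(statistics.mean(female_heights), statistics.stdev(female_heights))   # 166.33, 4.1633
print(statistics.mean(male_weights), statistics.stdev(male_weights))       # 74.33, 9.0185
print(statistics.mean(female_weights), statistics.stdev(female_weights))   # 52.33, 2.5166
```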

3) Find the conditional probability of each feature using the Gaussian density

For a continuous feature we use the normal density in place of a word frequency:

P(x | class) = 1 / (σ√(2π)) · exp(−(x − μ)² / (2σ²))

where μ and σ are the mean and standard deviation of that feature within the class.

P(Male | height = 173, weight = 80)

= P(Male) X P(height = 173 | Male) X P(weight = 80 | Male)

= 0.5 X 0.0774 X 0.0363

≈ 0.0014

P(Female | height = 173, weight = 80)

= P(Female) X P(height = 173 | Female) X P(weight = 80 | Female)

= 0.5 X 0.0266 X 0.0000 (80 kg is roughly 11 standard deviations above the female mean weight, so this density is essentially 0)

≈ 0

Since P(Male | height = 173, weight = 80) > P(Female | height = 173, weight = 80), Gaussian Naïve Bayes predicts that this person is male.
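The same calculation in Python, plugging the class means and standard deviations into the normal density (a minimal sketch of the arithmetic above):

```python
import math

def gaussian(x, mean, std):
    """Normal (Gaussian) probability density of x given the class mean and std."""
    return math.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (std * math.sqrt(2 * math.pi))

# (mean, standard deviation) of each feature per class, from step 2
male = {"height": (175.33, 4.5093), "weight": (74.33, 9.0185), "prior": 0.5}
female = {"height": (166.33, 4.1633), "weight": (52.33, 2.5166), "prior": 0.5}

def gnb_score(person, stats):
    return (stats["prior"]
            * gaussian(person["height"], *stats["height"])
            * gaussian(person["weight"], *stats["weight"]))

person = {"height": 173, "weight": 80}
print(gnb_score(person, male))     # ≈ 0.0014
print(gnb_score(person, female))   # ≈ 1e-29, effectively 0, so we predict Male
```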