CS460/660 - Machine Learning

Naive Bayes Classifier | October 2020

Documented by:
Pratyush Kumar Das | 5th Yr. Int. MSc., School of Physical Sciences, NISER  | pratyushk.das@niser.ac.in |
Instructor: Dr. Subhankar Mishra | Assistant Professor | School of Computer Sciences (CSS), NISER |


What is Naive Bayes?

Naive Bayes is a classification technique based on Bayes' theorem with an assumption of independence among predictors. The name comprises two parts: Naive and Bayes. In simple terms, the model assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature, even if those features actually depend on each other. Each feature contributes independently to the probability, and this "naive" independence assumption is why we call it Naive Bayes.

Bayes' Theorem and its use

In probability theory and statistics, Bayes' theorem is alternatively known as Bayes' law or Bayes' rule. It describes the probability of an event based on prior knowledge of conditions that might be related to the event. Bayes' theorem is a way to find a conditional probability: the probability of an event happening, given that it has some relationship with one or more other events. For example, the probability of getting a parking space is connected to the time of day you park, where you park, and a few other things. In a nutshell, the theorem gives you the actual probability of a hypothesis given information from tests or observations.

Definition: Given a hypothesis H and evidence E, Bayes' theorem states that the relationship between the probability of the hypothesis before getting the evidence, P(H), and the probability of the hypothesis after getting the evidence, P(H|E), is:

P(H|E) = P(E|H) · P(H) / P(E)

Example: Suppose I have a deck of cards and a single card is drawn. The probability that the card is a king is simply 4/52: if drawing a king is the event, then P(king) = 4/52 = 1/13. Now suppose evidence is provided, for instance someone looks at the card and tells us that the single drawn card is a face card. The probability that the card is a king, using Bayes' theorem, comes out to be,

P(king|Face) = P(Face|king) · P(king) / P(Face)

As every king is a face card, P(Face|king) = 1, and since there are 3 types of face cards (jack, queen and king), P(Face) = 12/52 = 3/13. Using Bayes' theorem we can find the probability of king given that the card is a face card: P(king|Face) = (1 × 1/13) / (3/13) = 1/3.

To understand this in other words: once we know the card is a face card, there are 3 types of face cards in a standard deck, so the probability that it is a king is 1/3.
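
As a quick sanity check, here is a minimal Python snippet that verifies the card example with exact fractions:

```python
from fractions import Fraction

# Prior, likelihood, and evidence from the card example
p_king = Fraction(4, 52)            # P(king) = 1/13
p_face_given_king = Fraction(1)     # every king is a face card
p_face = Fraction(12, 52)           # P(Face) = 3/13

# Bayes' theorem: P(king|Face) = P(Face|king) * P(king) / P(Face)
print(p_face_given_king * p_king / p_face)  # 1/3
```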

Proof of Bayes' Theorem: Conditional probabilities are defined by the following relations,

P(A|B) = P(A∩B) / P(B) and P(B|A) = P(A∩B) / P(A)

If we equate P(A∩B) from both these equations, we get,

P(A|B) = P(B|A) · P(A) / P(B)

To get the expression in our definition we simply replace A with H and B with E. Essentially, we have related the prior probability P(H) to P(H|E), the probability of H after getting the evidence E; using this theorem we are checking how probable our hypothesis is given the observed evidence.

Mathematical Working of Naive Bayes

Now, let's try to implement the formulas in a real-life scenario.

Fig. Example of Naive Bayes

Consider the case shown in the figure above. On the left side of the image we have a dataset with the attributes Outlook, Humidity and Wind, a few possible values for each, and whether we should play or not. On the right side of the figure we have the frequency tables for each attribute of the dataset. From each frequency table we then generate a likelihood table, as in the sketch below.
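
To make this concrete, here is a minimal Python sketch that builds such a frequency table for Outlook; the rows are hypothetical stand-ins chosen to match the probabilities quoted below, not the original dataset:

```python
from collections import Counter

# Hypothetical rows standing in for the dataset in the figure: (Outlook, Play)
data = [
    ("Sunny", "No"), ("Sunny", "No"), ("Overcast", "Yes"), ("Rainy", "Yes"),
    ("Rainy", "Yes"), ("Rainy", "No"), ("Overcast", "Yes"), ("Sunny", "Yes"),
    ("Sunny", "Yes"), ("Rainy", "Yes"), ("Sunny", "Yes"), ("Overcast", "Yes"),
    ("Overcast", "Yes"), ("Rainy", "No"),
]

# Frequency table: counts of each (Outlook, Play) pair and of each Play label
pair_counts = Counter(data)
label_counts = Counter(play for _, play in data)

# Likelihoods P(outlook | play) read off the frequency table
for (outlook, play), n in sorted(pair_counts.items()):
    print(f"P({outlook}|{play}) = {n}/{label_counts[play]}"
          f" = {n / label_counts[play]:.2f}")
```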

Fig. Likelihood Table of Outlook

In the figure above we have the likelihood table for Outlook. For each Outlook value, the table contains its overall probability together with the conditional probabilities for Yes and No. For example, consider Sunny and check Yes or No. Using Bayes' theorem we can get the likelihood of Yes and No given Sunny as,

P(c|x) = P(Yes|Sunny) = P(Sunny|Yes) × P(Yes) / P(Sunny) = (0.3 × 0.71)/0.36 ≈ 0.591

P(c|x) = P(No|Sunny) = P(Sunny|No) × P(No) / P(Sunny) = (0.5 × 0.29)/0.36 ≈ 0.40
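
As a quick check of this arithmetic, using the same hypothetical counts as in the sketch above:

```python
# Counts assumed above: 10 Yes and 4 No out of 14 days, 5 of them Sunny,
# with 3 Sunny days among the Yes rows and 2 among the No rows
p_yes, p_no = 10 / 14, 4 / 14        # priors: ~0.71 and ~0.29
p_sunny = 5 / 14                     # evidence: ~0.36
p_sunny_given_yes = 3 / 10           # likelihood: 0.3 ...
p_sunny_given_no = 2 / 4             # ... and 0.5

print(p_sunny_given_yes * p_yes / p_sunny)   # 0.6
print(p_sunny_given_no * p_no / p_sunny)     # 0.4
```

With exact fractions the two posteriors are 0.6 and 0.4 and sum to 1, as they must when Sunny is the only evidence; the 0.591 above comes from rounding the inputs before dividing.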

Similarly, we can create the likelihood tables for Humidity and Wind.

Fig. Likelihood Table of Humidity and Wind
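
With all three likelihood tables in hand, Naive Bayes classifies a new day by multiplying the per-feature likelihoods with the class prior; thanks to the independence assumption, nothing more is needed. A minimal sketch follows, where the Humidity and Wind values are hypothetical placeholders rather than numbers read from the figure:

```python
# Hypothetical likelihood tables P(value|class), standing in for the figures
likelihoods = {
    "Yes": {"Sunny": 0.30, "High": 0.33, "Weak": 0.67},
    "No":  {"Sunny": 0.50, "High": 0.80, "Weak": 0.40},
}
priors = {"Yes": 0.71, "No": 0.29}

def classify(features):
    # Naive assumption: P(x1, x2, x3 | c) = P(x1|c) * P(x2|c) * P(x3|c)
    scores = {}
    for c in priors:
        score = priors[c]
        for f in features:
            score *= likelihoods[c][f]
        scores[c] = score
    # The evidence P(x) cancels when comparing classes, so the largest score wins
    return max(scores, key=scores.get), scores

print(classify(["Sunny", "High", "Weak"]))
```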

Industrial use:

1. Classification and categorization of news into sections such as National, International, Sports, Media, Travel & Lifestyle, Stock Market, etc.

2. Spam detection of emails

3. Object and Face Recognition

4. Weather Prediction

Types of Naive Bayes

1. Gaussian: Assumes the features follow a normal (Gaussian) distribution.

2. Multinomial: Used for discrete counts, such as word counts in text classification.

3. Bernoulli: Feature vectors are binary (each feature is present or absent); all three variants are sketched below.
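
All three variants are available in scikit-learn; here is a minimal sketch of how each would be used, on small made-up datasets:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

y = np.array([0, 0, 1, 1])

# Gaussian: continuous features, assumed normally distributed per class
X_cont = np.array([[1.0, 2.1], [0.9, 1.9], [3.2, 4.0], [3.0, 4.2]])
print(GaussianNB().fit(X_cont, y).predict([[3.1, 4.1]]))

# Multinomial: non-negative discrete counts (e.g. word counts)
X_counts = np.array([[2, 0, 1], [3, 0, 0], [0, 4, 1], [0, 3, 2]])
print(MultinomialNB().fit(X_counts, y).predict([[0, 2, 1]]))

# Bernoulli: binary feature vectors (presence/absence)
X_bin = np.array([[1, 0, 1], [1, 0, 0], [0, 1, 1], [0, 1, 0]])
print(BernoulliNB().fit(X_bin, y).predict([[0, 1, 1]]))
```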

Steps Involved in Naive Bayes

Step-1 : Handling the Data

Step-2 : Summarizing Data

Step-3 : Making a Prediction

Step-4 : Making all the predictions

Step-5 : Evaluate Accuracy

Step-6 : Tying it all Together

Basically, I have divided the entire code into these 6 parts, and below you can find the code, which is self-explanatory.

The code:

This code window shows the Python code and the accuracy of the generated model.
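
A minimal from-scratch sketch along the lines of these 6 steps is given below, assuming a Gaussian model and a small made-up dataset (an illustrative reconstruction, not the original code):

```python
import math
import random

random.seed(0)  # reproducible train/test split

def mean(xs):
    return sum(xs) / len(xs)

def stdev(xs):
    # Population stdev; tiny floor avoids division by zero for degenerate columns
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs)) or 1e-6

# Step-1: Handling the Data - shuffle and split into train/test sets
def split_dataset(rows, ratio=0.67):
    rows = rows[:]
    random.shuffle(rows)
    cut = int(len(rows) * ratio)
    return rows[:cut], rows[cut:]

# Step-2: Summarizing Data - per class, (mean, stdev) of each feature
def summarize_by_class(rows):
    grouped = {}
    for *features, label in rows:
        grouped.setdefault(label, []).append(features)
    return {label: [(mean(col), stdev(col)) for col in zip(*feats)]
            for label, feats in grouped.items()}

# Gaussian probability density used by the Gaussian variant
def gaussian(x, m, s):
    return math.exp(-((x - m) ** 2) / (2 * s ** 2)) / (math.sqrt(2 * math.pi) * s)

# Step-3: Making a Prediction - pick the class with the largest product
def predict(summaries, features):
    best_label, best_score = None, -1.0
    for label, stats in summaries.items():
        score = 1.0
        for x, (m, s) in zip(features, stats):
            score *= gaussian(x, m, s)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Step-4: Making all the predictions
def get_predictions(summaries, test_rows):
    return [predict(summaries, row[:-1]) for row in test_rows]

# Step-5: Evaluate Accuracy
def accuracy(test_rows, predictions):
    hits = sum(row[-1] == pred for row, pred in zip(test_rows, predictions))
    return 100.0 * hits / len(test_rows)

# Step-6: Tying it all Together (toy made-up rows: [feature1, feature2, label])
dataset = [[1.0, 2.0, 0], [1.2, 1.8, 0], [0.8, 2.2, 0], [1.1, 2.1, 0],
           [3.0, 4.0, 1], [3.2, 3.8, 1], [2.8, 4.2, 1], [3.1, 4.1, 1]]
train, test = split_dataset(dataset)
summaries = summarize_by_class(train)
preds = get_predictions(summaries, test)
print(f"Accuracy: {accuracy(test, preds):.1f}%")
```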

Alternatively, one can use scikit-learn: instead of this long code, an equally effective and much shorter version simply imports GaussianNB from sklearn.naive_bayes and uses the model to fit the data.
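
For instance, a minimal sketch using scikit-learn's built-in Iris dataset as stand-in data:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load a standard dataset and split it, mirroring Steps 1-2 above
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33,
                                                    random_state=0)

# Fit the Gaussian Naive Bayes model and predict (Steps 3-4)
model = GaussianNB()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

# Evaluate accuracy (Step 5)
print("Accuracy:", accuracy_score(y_test, predictions))
```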


References:

  1. "Naive Bayes Classifier in Python | Naive Bayes Algorithm | Machine Learning Algorithm | Edureka" by Edureka
  2. "Naive Bayes Classifier" at towardsscience.