Naive Bayes Classification | Machine Learning


In this tutorial, we are going to learn the intuition behind the Naive Bayes classification algorithm and implement it in Python.

Naive Bayes Intuition: Naive Bayes is a classification technique based on Bayes' Theorem. In simple terms, it is a probabilistic classifier that assumes the presence of a particular feature in a class is unrelated to the presence of any other feature (this independence assumption is what makes it "naive"). It calculates the posterior probability of each class using Bayes' Theorem and then predicts the class with the maximum posterior probability.


To make the idea clear, let's have a look at Bayes' Theorem.


             P(H | D) = P(D | H) × P(H) / P(D)


Here,
P(H | D) = The conditional probability of event H occurring given that event D has occurred. This is known as the posterior probability.

P(H) = The probability of event H occurring on its own. This is known as the prior probability.

P(D) = The probability of event D occurring on its own. This is known as the marginal likelihood (or evidence).

P(D | H) = The probability of event D occurring given that event H has occurred. This is known as the likelihood.
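
To see the theorem in action, here is a minimal numeric sketch in Python; the probability values are hypothetical, chosen only to show the mechanics:

# Bayes' Theorem with hypothetical probabilities
p_h = 0.3          # P(H): prior probability of H
p_d_given_h = 0.6  # P(D | H): likelihood
p_d = 0.4          # P(D): marginal probability of D

p_h_given_d = p_d_given_h * p_h / p_d  # P(H | D): posterior
print(p_h_given_d)  # 0.45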


Now let's get to work. We will apply Bayes' Theorem to build our Naive Bayes classifier. Let's take an example.


                                [Figure: scatter plot of Salary vs. Age with 30 points; red points are people who walk, green points are people who drive, and the grey point is the new observation to classify]

Here we have 30 data points (excluding the grey one) representing people with two features, Salary and Age, and their choice of whether they walk or drive based on these two features. Now, for a new data point (the grey one), we need to classify which class it belongs to. That means we should find out whether this new person walks or drives. For this, we will apply the Naive Bayes technique to make the decision.


First of all, we will apply Bayes' Theorem to calculate the posterior probability of walking for this new data point given the features X, i.e., how likely it is that the person walks:

                                           P(Walks | X) = P(X | Walks) × P(Walks) / P(X)


In the same way, we will calculate the probability of driving:

                                            P(Drives | X) = P(X | Drives) × P(Drives) / P(X)

After calculating both probabilities, the algorithm compares them and takes the class with the higher value:

                                            P(Walks | X) vs. P(Drives | X)

Step 1: First, we calculate the prior probability, marginal likelihood, and likelihood, and from them the posterior probability that the person walks.

The prior probability, P(Walks), is simply the proportion of people who walk among all the people:

P(Walks) = Number of Walkers / Total Observations

For the marginal likelihood, P(X), we draw a circle around the new data point and count all the observations that fall inside it (both red and green). The radius of the circle is up to you; different radii will give somewhat different estimates:

P(X) = Number of Similar Observations / Total Observations

The likelihood, P(X | Walks), is the probability that a person who walks looks similar to the new data point. So, here we are concerned only with the red points inside the circle:

P(X | Walks) = Number of Similar Observations among Walkers / Total Walkers



After calculating all of these, we can put them into Bayes' Theorem:

                             P(Walks | X) = P(X | Walks) × P(Walks) / P(X)

Step 2: Now, we will do similar calculations for P(Drives | X):


P(Drives) = Number of Drivers / Total Observations

P(X) = Number of Similar Observations / Total Observations

P(X | Drives) = Number of Similar Observations among Drivers / Total Drivers

Putting all these together, we get:

                       P(Drives | X) = P(X | Drives) × P(Drives) / P(X)


Step 3: Finally, we compare both probabilities and take the class with the higher value as the output:

                           P(Walks | X) > P(Drives | X)

Here we can see that the probability of the person walking is greater than the probability of the person driving, so we say that our new point falls into the category of people who walk.
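
The three steps above translate directly into code. Here is a minimal sketch in Python; the counts are hypothetical values read off a scatter plot like the one above, and they depend on the radius you choose:

# Hypothetical counts read off the scatter plot (assumed values)
total = 30           # all observations
walkers = 10         # red points (people who walk)
drivers = 20         # green points (people who drive)
similar = 4          # points inside the circle around the new point
similar_walkers = 3  # red points inside the circle
similar_drivers = 1  # green points inside the circle

# Step 1: posterior probability of walking
p_walks = walkers / total                    # prior
p_x = similar / total                        # marginal likelihood
p_x_given_walks = similar_walkers / walkers  # likelihood
p_walks_given_x = p_x_given_walks * p_walks / p_x  # 0.75

# Step 2: posterior probability of driving
p_drives = drivers / total
p_x_given_drives = similar_drivers / drivers
p_drives_given_x = p_x_given_drives * p_drives / p_x  # 0.25

# Step 3: pick the class with the higher posterior
print('Walks' if p_walks_given_x > p_drives_given_x else 'Drives')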

Naive Bayes in Python: Now, we will implement the algorithm in Python. For this task, we will use the Social_Network_Ads.csv dataset. Let's have a glimpse of that dataset:

                                                    [Table: first rows of Social_Network_Ads.csv, with columns User ID, Gender, Age, EstimatedSalary, and Purchased]

You can download the whole dataset from here.

First of all, we will import all the essential libraries.

# Importing essential libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

Now, we will import the dataset into our program.

# Importing the dataset
dataset = pd.read_csv('Social_Network_Ads.csv')
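
To take that glimpse yourself, print the first few rows of the DataFrame:

# A quick look at the first five rows
print(dataset.head())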

From the dataset, we take the Age and EstimatedSalary columns as the feature matrix, since they are the independent features, and the Purchased column as the dependent vector.

# Making the Feature matrix and dependent vector
X = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, 4].values

Now, we split our dataset into training and test sets.

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0) 
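
Assuming the standard 400-row version of Social_Network_Ads.csv, this 75/25 split leaves 300 observations for training and 100 for testing:

# Sanity check of the split (assumes the 400-row version of the dataset)
print(X_train.shape, X_test.shape)  # expected: (300, 2) (100, 2)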

We scale the features so that Age and EstimatedSalary are on a comparable scale. Note that the scaler is fitted on the training set only and then applied to the test set, so no information from the test set leaks into training.

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
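
As a quick check, each column of the scaled training set should now have roughly zero mean and unit standard deviation:

# Verify the scaling: means ~0 and standard deviations ~1
print(X_train.mean(axis=0), X_train.std(axis=0))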

Now, we will fit the Naive Bayes classifier to the training set. Since both features are continuous, we use the Gaussian variant, which models each feature with a per-class normal distribution.

# Fitting Naive Bayes to the Training set
from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()
classifier.fit(X_train, y_train)
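
Just like the manual example earlier, the fitted classifier compares posterior probabilities internally. If you want to inspect them, predict_proba returns P(class | x) for each class:

# Posterior probabilities P(y = 0 | x) and P(y = 1 | x) for the first five test points
print(classifier.predict_proba(X_test[:5]))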

It's time to see how our model predicts the test set results.

# Predicting the Test set results
y_pred = classifier.predict(X_test)

We will evaluate the performance of our model using a confusion matrix.

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)


[Output: the confusion matrix of the test set predictions]
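
From the confusion matrix, the accuracy is the fraction of predictions on the diagonal. scikit-learn can also compute it directly with accuracy_score:

# Accuracy: correct predictions over all predictions
from sklearn.metrics import accuracy_score
print(accuracy_score(y_test, y_pred))  # equals (cm[0, 0] + cm[1, 1]) / cm.sum()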

Now, we will visualize our model with the training set results.

# Visualising the Training set results
from matplotlib.colors import ListedColormap
X_set, y_set = X_train, y_train
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green'))(i), label = j)
plt.title('Naive Bayes (Training set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()

                         [Figure: Naive Bayes decision regions and training set points]

We will now see how it performs on our test set. Let's visualize this.

# Visualising the Test set results
from matplotlib.colors import ListedColormap
X_set, y_set = X_test, y_test
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green'))(i), label = j)
plt.title('Naive Bayes (Test set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()

                            [Figure: Naive Bayes decision regions and test set points]