### Kernel PCA | Machine Learning

Kernel Principal Component Analysis (Kernel PCA): Principal component analysis (PCA) is a popular tool for dimensionality reduction and feature extraction, but it only builds linear combinations of the original features, so it works best when the structure in the data is linear (for example, when the classes are linearly separable). When the dataset is not linearly separable, we can apply the Kernel PCA algorithm instead. It is similar to PCA, except that it uses the kernel trick to implicitly map the data into a higher-dimensional feature space first and then extracts the principal components there, in the same way as PCA.
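To make the difference concrete, here is a minimal sketch on synthetic data (two concentric circles from scikit-learn's `make_circles`, not the dataset used in this tutorial). The helper name `accuracy_after` and the value `gamma=10` are illustrative assumptions. It compares a linear classifier trained on plain PCA features with one trained on Kernel PCA features.

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic, non-linearly separable data: one circle inside another.
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)


def accuracy_after(reducer):
    """Fit the reducer on the training set, then score a linear classifier."""
    Z_train = reducer.fit_transform(X_train)
    Z_test = reducer.transform(X_test)
    clf = LogisticRegression(random_state=0).fit(Z_train, y_train)
    return clf.score(Z_test, y_test)


acc_pca = accuracy_after(PCA(n_components=2))
# gamma=10 is an illustrative choice for this toy data, not a general default.
acc_kpca = accuracy_after(KernelPCA(n_components=2, kernel='rbf', gamma=10))
print(f"PCA: {acc_pca:.2f}, Kernel PCA: {acc_kpca:.2f}")
```

Plain PCA only rotates the two circles, so a linear classifier on top of it should do little better than guessing, while the RBF Kernel PCA features should let the same classifier separate the two rings.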

Kernel PCA in Python: In this tutorial, we are going to implement Kernel PCA together with a Logistic Regression classifier on a nonlinear dataset. For this task, we will use the "Social_Network_Ads.csv" dataset. In this dataset, the features have a non-linear relationship with the dependent variable, so we apply Kernel PCA to extract new features before fitting the linear classifier.

First of all, let's import the essential libraries:

```python
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
```

Importing the dataset

```python
dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, [2, 3]].values  # features: the columns at positions 2 and 3
y = dataset.iloc[:, 4].values       # target: the column at position 4
```
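If the positional indexing is unclear, the following sketch shows what `iloc[:, [2, 3]]` and `iloc[:, 4]` select. The rows are made up, and the column layout (`User ID`, `Gender`, `Age`, `EstimatedSalary`, `Purchased`) is an assumption about the file, so check it against your own copy with `dataset.head()`.

```python
import io

import pandas as pd

# Made-up rows in an ASSUMED column layout; the real file is not reproduced here.
csv_text = """User ID,Gender,Age,EstimatedSalary,Purchased
1,Male,25,30000,0
2,Female,40,90000,1
3,Female,33,52000,0
4,Male,51,120000,1
"""
dataset = pd.read_csv(io.StringIO(csv_text))
print(dataset.head())  # quick look at the first rows

X = dataset.iloc[:, [2, 3]].values  # columns at positions 2 and 3 (two features)
y = dataset.iloc[:, 4].values       # column at position 4 (the label)
print(X.shape, y.shape)
```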

Splitting the dataset into the Training set and Test set

```python
from sklearn.model_selection import train_test_split

# Hold out 25% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
```
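As a quick sanity check of what `test_size` and `random_state` do, here is a small sketch on placeholder arrays. The 400-row size is chosen only for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 400 samples with 2 features (sizes chosen for illustration).
X = np.arange(800).reshape(400, 2)
y = np.arange(400) % 2

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
print(X_train.shape, X_test.shape)  # 75% of the rows for training, 25% for testing

# Calling it again with the same random_state reproduces the same split.
X_train2, X_test2, _, _ = train_test_split(X, y, test_size=0.25, random_state=0)
same_split = np.array_equal(X_test, X_test2)
print(same_split)
```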

Feature Scaling

```python
from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
X_train = sc.fit_transform(X_train)  # learn mean and std on the training set
X_test = sc.transform(X_test)        # apply the same scaling to the test set
```
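The scaler is fitted on the training set only and then reused on the test set. A minimal sketch with synthetic numbers (the value ranges are assumptions, chosen to mimic two features on very different scales) illustrates the effect:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Two synthetic features on very different scales (illustrative ranges only).
X_train = np.column_stack([rng.uniform(18, 60, 300),
                           rng.uniform(15_000, 150_000, 300)])
X_test = np.column_stack([rng.uniform(18, 60, 100),
                          rng.uniform(15_000, 150_000, 100)])

sc = StandardScaler()
X_train_scaled = sc.fit_transform(X_train)  # mean and std come from the training set
X_test_scaled = sc.transform(X_test)        # the test set reuses those statistics

print(X_train_scaled.mean(axis=0), X_train_scaled.std(axis=0))
print(X_test_scaled.mean(axis=0))  # close to zero, but generally not exactly zero
```

Reusing the training statistics keeps information about the test set out of the preprocessing step, and scaling matters here because the RBF kernel depends on distances between samples.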

Applying Kernel PCA

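```python
from sklearn.decomposition import KernelPCA

kpca = KernelPCA(n_components=2, kernel='rbf')
X_train = kpca.fit_transform(X_train)  # fit the kernel components on the training set
X_test = kpca.transform(X_test)        # project the test set onto the same components
```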

Note: Here, the `n_components` parameter defines the number of principal components (the new extracted features) that we keep for our model (here, two), and we choose the RBF (Radial Basis Function) kernel as our kernel function.
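To show what happens behind `KernelPCA`, the sketch below reproduces the RBF case by hand with NumPy: build the kernel matrix, centre it, and take the leading eigenvectors. The data are random and `gamma = 0.5` is an illustrative value; this is a simplified version for intuition, not scikit-learn's actual implementation.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))  # synthetic data, for illustration only
gamma = 0.5                    # illustrative RBF width

# 1. RBF kernel matrix: K[i, j] = exp(-gamma * ||x_i - x_j||^2)
K = rbf_kernel(X, gamma=gamma)

# 2. Centre the kernel matrix (equivalent to centring the data in feature space)
n = K.shape[0]
one_n = np.full((n, n), 1.0 / n)
K_centered = K - one_n @ K - K @ one_n + one_n @ K @ one_n

# 3. Eigen-decomposition; keep the two largest eigenvalues and their eigenvectors
eigvals, eigvecs = np.linalg.eigh(K_centered)
order = np.argsort(eigvals)[::-1][:2]
Z_manual = eigvecs[:, order] * np.sqrt(eigvals[order])

# Compare with scikit-learn; component signs are arbitrary, so compare magnitudes
Z_sklearn = KernelPCA(n_components=2, kernel='rbf', gamma=gamma).fit_transform(X)
matches = np.allclose(np.abs(Z_manual), np.abs(Z_sklearn), atol=1e-6)
print(matches)
```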

Fitting Logistic Regression to the Training set

```python
from sklearn.linear_model import LogisticRegression

# Linear classifier trained on the two kernel principal components
classifier = LogisticRegression(random_state=0)
classifier.fit(X_train, y_train)
```

Predicting the Test set results

```python
y_pred = classifier.predict(X_test)
```
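The scaling, Kernel PCA, and classification steps can also be chained into a single scikit-learn `Pipeline`, which keeps the fit/transform bookkeeping in one object. This is an alternative sketch, not the approach used in this tutorial, and it runs on synthetic `make_moons` data as a stand-in for the real `X` and `y`.

```python
from sklearn.datasets import make_moons
from sklearn.decomposition import KernelPCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; swap in the real X and y from the tutorial.
X, y = make_moons(n_samples=400, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = make_pipeline(
    StandardScaler(),
    KernelPCA(n_components=2, kernel='rbf'),
    LogisticRegression(random_state=0),
)
model.fit(X_train, y_train)     # every step is fitted on the training set only
y_pred = model.predict(X_test)  # test data passes through the already-fitted steps
test_accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {test_accuracy:.2f}")
```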

Making the Confusion Matrix

```python
from sklearn.metrics import confusion_matrix

# Rows are the true classes, columns are the predicted classes
cm = confusion_matrix(y_test, y_pred)
print(cm)
```

From the confusion matrix, we can see that the model has an accuracy of about 80% (the correctly classified samples on the diagonal divided by the total number of test samples).
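As a small sketch of how accuracy is read off a confusion matrix, the example below uses made-up labels (they are not this tutorial's results; the names `y_true` and `y_hat` are illustrative) and checks the hand computation against `accuracy_score`.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up labels for illustration; these are NOT the tutorial's results.
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
y_hat = np.array([0, 0, 0, 0, 0, 1, 1, 1, 0, 0])

cm = confusion_matrix(y_true, y_hat)  # rows: true class, columns: predicted class
accuracy = np.trace(cm) / cm.sum()    # correct predictions / all predictions
print(cm)
print(accuracy, accuracy_score(y_true, y_hat))
```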

Now, let's visualize both the training and test set results.

Visualising the Training set results

```python
from matplotlib.colors import ListedColormap

X_set, y_set = X_train, y_train
# Grid covering the two kernel principal components, padded by 1 on each side
X1, X2 = np.meshgrid(
    np.arange(start=X_set[:, 0].min() - 1, stop=X_set[:, 0].max() + 1, step=0.01),
    np.arange(start=X_set[:, 1].min() - 1, stop=X_set[:, 1].max() + 1, step=0.01))
# Colour each grid point by the class the classifier predicts there
plt.contourf(X1, X2,
             classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha=0.75, cmap=ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                color=ListedColormap(('red', 'green'))(i), label=j)
plt.title('Logistic Regression (Training set)')
# After Kernel PCA the axes are principal components, not Age / Estimated Salary
plt.xlabel('PC1')
plt.ylabel('PC2')
plt.legend()
plt.show()
```

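The resulting plot shows the training points on top of the classifier's decision regions in the space of the two kernel principal components.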

Visualising the Test set results

```python
from matplotlib.colors import ListedColormap

X_set, y_set = X_test, y_test
# Grid covering the two kernel principal components, padded by 1 on each side
X1, X2 = np.meshgrid(
    np.arange(start=X_set[:, 0].min() - 1, stop=X_set[:, 0].max() + 1, step=0.01),
    np.arange(start=X_set[:, 1].min() - 1, stop=X_set[:, 1].max() + 1, step=0.01))
# Colour each grid point by the class the classifier predicts there
plt.contourf(X1, X2,
             classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha=0.75, cmap=ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                color=ListedColormap(('red', 'green'))(i), label=j)
plt.title('Logistic Regression (Test set)')
# After Kernel PCA the axes are principal components, not Age / Estimated Salary
plt.xlabel('PC1')
plt.ylabel('PC2')
plt.legend()
plt.show()
```

The resulting plot shows the test points on top of the same decision regions, again in the space of the two kernel principal components.
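Since the training and test plots differ only in the data and the title, the plotting code can be wrapped in one function. The helper below is a hypothetical sketch (the name `plot_decision_regions`, its parameters, and the `PC1`/`PC2` axis labels are assumptions, not part of the original code). It expects a fitted classifier and two-dimensional, already transformed features.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap


def plot_decision_regions(classifier, X_set, y_set, title, step=0.01):
    """Hypothetical helper: draw a fitted 2-D classifier's regions and the points."""
    cmap = ListedColormap(('red', 'green'))
    # Grid over both features, padded by 1 on each side
    X1, X2 = np.meshgrid(
        np.arange(X_set[:, 0].min() - 1, X_set[:, 0].max() + 1, step),
        np.arange(X_set[:, 1].min() - 1, X_set[:, 1].max() + 1, step))
    grid = np.column_stack([X1.ravel(), X2.ravel()])
    fig, ax = plt.subplots()
    # Background: predicted class at every grid point
    ax.contourf(X1, X2, classifier.predict(grid).reshape(X1.shape),
                alpha=0.75, cmap=cmap)
    # Foreground: the actual samples, coloured by their true class
    for i, label in enumerate(np.unique(y_set)):
        ax.scatter(X_set[y_set == label, 0], X_set[y_set == label, 1],
                   color=cmap(i), label=label)
    ax.set_xlim(X1.min(), X1.max())
    ax.set_ylim(X2.min(), X2.max())
    ax.set_title(title)
    ax.set_xlabel('PC1')  # axes are kernel principal components
    ax.set_ylabel('PC2')
    ax.legend()
    return ax
```

With the variables from this tutorial, it could be called as `plot_decision_regions(classifier, X_train, y_train, 'Logistic Regression (Training set)')` and again with `X_test, y_test`, followed by `plt.show()`.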