XGBoost | Machine Learning


XGBoost in Python Step 1: First, install XGBoost (for example, with pip install xgboost). We will use it on a classification problem: sorting a bank's customers into two classes, those who will leave the bank and those who will stay. We import the libraries and load the dataset from the Churn_Modelling.csv file, then preprocess the data by encoding its categorical columns. XGBoost is a gradient boosting model built on decision trees, so feature scaling is not required. XGBoost is popular for qualities such as high performance and fast execution speed. Finally, we split the dataset into a training set and a test set.


import numpy as np
import matplotlib.pyplot as plt
import pandas as pd


# Importing the dataset

dataset = pd.read_csv('Churn_Modelling.csv')
X = dataset.iloc[:, 3:13].values
y = dataset.iloc[:, 13].values


# Encoding categorical data

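from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
# Label-encode the categorical column at index 2
labelencoder_X_2 = LabelEncoder()
X[:, 2] = labelencoder_X_2.fit_transform(X[:, 2])
# One-hot encode the categorical column at index 1.
# OneHotEncoder no longer accepts categorical_features, so ColumnTransformer selects the column.
# drop='first' removes one dummy column, which is what X = X[:, 1:] did before.
ct = ColumnTransformer([('onehot', OneHotEncoder(drop='first'), [1])], remainder='passthrough', sparse_threshold=0)
X = np.array(ct.fit_transform(X), dtype=float)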
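
To make the encoding step more concrete, here is a minimal pure-Python sketch of one-hot encoding with the first dummy column dropped. The category labels "A", "B", and "C" and the helper one_hot_drop_first are hypothetical and used for illustration only; scikit-learn's own encoder is what the tutorial code actually calls.

```python
# Hypothetical category labels, for illustration only.
rows = ["A", "C", "B", "A"]


def one_hot_drop_first(values):
    """One-hot encode labels, dropping the first category in sorted order."""
    categories = sorted(set(values))   # ['A', 'B', 'C']
    kept = categories[1:]              # drop 'A' so the columns are not redundant
    return [[1.0 if v == c else 0.0 for c in kept] for v in values]


encoded = one_hot_drop_first(rows)
for label, vector in zip(rows, encoded):
    print(label, vector)
```

The dropped category is represented by a row of zeros, so no information is lost while one redundant column is avoided.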


# Splitting the dataset into the Training set and Test set

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)


XGBoost in Python Step 2: Next, we fit XGBoost to the training set. We import the XGBClassifier class from the xgboost library, create a classifier object, and call its fit method on the training data. We then predict the test set results, build the confusion matrix, and apply 10-fold cross-validation. The confusion matrix shows 1521 + 208 = 1729 correct predictions and 197 + 74 = 271 incorrect predictions, which is an accuracy of about 86% on the test set. The mean of the cross-validation accuracies is also about 86%.

from xgboost import XGBClassifier
classifier = XGBClassifier()
classifier.fit(X_train, y_train)


# Predicting the Test set results

y_pred = classifier.predict(X_test)


# Making the Confusion Matrix

from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
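
As a quick check of the accuracy figure, the sketch below recomputes it from the four counts reported in the text. The placement of the two off-diagonal counts (74 and 197) inside the matrix is an assumption; it does not change the accuracy.

```python
# Counts reported in the text. Which off-diagonal cell holds 74 and which
# holds 197 is assumed here; accuracy is the same either way.
cm = [[1521, 74],
      [197, 208]]

correct = cm[0][0] + cm[1][1]            # diagonal = correct predictions
total = sum(sum(row) for row in cm)      # all test samples
accuracy = correct / total

print(f"{correct} / {total} = {accuracy:.4f}")
```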


# Applying k-Fold Cross Validation

from sklearn.model_selection import cross_val_score
accuracies = cross_val_score(estimator = classifier, X = X_train, y = y_train, cv = 10)
# Print the results so they are visible when the code runs as a script
print(accuracies.mean())
print(accuracies.std())
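
To show what cv = 10 means, here is a simplified pure-Python sketch of how a training set can be cut into 10 folds. The helper k_fold_indices is hypothetical, and the 8000 training rows are inferred from the 2000 test predictions and the 0.2 test size. The folds here are contiguous and unshuffled; for a classifier, scikit-learn's cross_val_score uses stratified folds instead, so this illustrates the idea rather than the library's exact behavior.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous, near-equal folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        # Everything outside the validation fold is used for training
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


# 8000 training rows is inferred from the tutorial's 80/20 split.
folds = list(k_fold_indices(n_samples=8000, k=10))
for i, (train_idx, val_idx) in enumerate(folds):
    print(f"fold {i}: train={len(train_idx)} validate={len(val_idx)}")
```

Each row is used for validation exactly once, and the model is trained 10 times on the remaining rows; the mean and standard deviation of the 10 accuracies summarize the results.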