# What is Bayesian Information Criterion

##### Understanding Bayesian Information Criterion

The Bayesian Information Criterion (BIC) is a widely used metric in statistics and machine learning. It is used to compare several candidate statistical models fitted to the same data and to select the most appropriate one. BIC scores each model by balancing how well it fits the data against how many parameters it uses. It can be derived as a large-sample approximation to -2 times the log of the model's marginal likelihood, so it does not measure the information lost when one model approximates another.

The purpose of BIC is to identify the model that explains the data well while remaining as simple as possible. Simplicity matters because a model with many parameters can end up fitting noise in the data rather than the underlying pattern. Simplicity here does not refer to the conceptual complexity of the underlying model but to the number of parameters it estimates. A model with fewer parameters is considered simpler than one with more parameters.

##### BIC = -2log(L) + p*log(n)

where L represents the maximized likelihood of the data given the model, p represents the number of estimated parameters in the model, and n is the number of data points used to fit the model. Here log denotes the natural logarithm. The formula consists of two terms: the first term measures how well the model fits the data (it gets smaller as the fit improves), while the second term penalizes model complexity (it grows with the number of parameters and with the sample size).
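The formula translates directly into code. The following is a minimal sketch, not a reference implementation; the function name `bic` and the input values are illustrative assumptions, and the log-likelihood is assumed to have been computed already by whatever fitting procedure was used.

```python
import math


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Return BIC = -2*log(L) + p*log(n), using the natural logarithm."""
    if n_obs <= 0:
        raise ValueError("n_obs must be positive")
    # Fit term: smaller when the model fits the data better.
    fit_term = -2.0 * log_likelihood
    # Penalty term: grows with the number of parameters and the sample size.
    penalty_term = n_params * math.log(n_obs)
    return fit_term + penalty_term


# Hypothetical inputs: log-likelihood -120, 3 parameters, 50 observations.
example_bic = bic(log_likelihood=-120.0, n_params=3, n_obs=50)
print(round(example_bic, 2))
```

Adding a parameter without changing the log-likelihood raises the result by log(n), which is the penalty at work.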

##### BIC Calculation Example

Let's take an example to explain the BIC formula.

Suppose we have a dataset of 100 data points, and we would like to fit a linear regression model to this data. The linear regression model has two parameters, which represent the slope and the intercept of the line. If we fit the model to the data and calculate the log-likelihood of the data given the model, we obtain a value of -50. The BIC value for this model is, therefore, -2 x (-50) + 2 x log(100) = 100 + 9.21 = 109.21.

If we have another model, say a quadratic regression model with three parameters, we can calculate its BIC value using the same formula. Let's suppose its log-likelihood is -49, a slightly better fit than the linear model. Its BIC value is then -2 x (-49) + 3 x log(100) = 98 + 13.82 = 111.82. In this case, the linear regression model would be preferred because it has a lower BIC value, even though the quadratic regression model provides a better fit to the data: the improvement in fit is too small to justify the extra parameter.
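To show where the log-likelihood values in such a comparison can come from, here is a hedged sketch that fits a linear and a quadratic model to synthetic data and computes a BIC for each. Everything in it is an assumption made for illustration: the data are generated from a straight line with a fixed random seed, the errors are assumed to be Gaussian, and the helper names (`solve_linear_system`, `fit_polynomial`, `gaussian_log_likelihood`) are hypothetical.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (c0, c1, ...)."""
    k = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    return solve_linear_system(a, b)


def gaussian_log_likelihood(xs, ys, coeffs):
    """Maximized log-likelihood assuming Gaussian errors with MLE variance."""
    n = len(xs)
    rss = sum(
        (y - sum(c * x ** i for i, c in enumerate(coeffs))) ** 2
        for x, y in zip(xs, ys)
    )
    sigma2 = rss / n  # maximum-likelihood estimate of the error variance
    return -0.5 * n * (math.log(2.0 * math.pi * sigma2) + 1.0)


# Synthetic data (assumption): a straight line plus Gaussian noise.
rng = random.Random(0)
xs = [i / 10 for i in range(100)]
ys = [1.0 + 2.0 * x + rng.gauss(0.0, 1.0) for x in xs]
n = len(xs)

results = {}
for name, degree in [("linear", 1), ("quadratic", 2)]:
    coeffs = fit_polynomial(xs, ys, degree)
    ll = gaussian_log_likelihood(xs, ys, coeffs)
    # Parameters are counted as in the text: polynomial coefficients only.
    n_params = degree + 1
    results[name] = (ll, -2.0 * ll + n_params * math.log(n))
    print(f"{name}: log-likelihood={ll:.2f}, BIC={results[name][1]:.2f}")
```

Because the quadratic model contains the linear one as a special case, its maximized log-likelihood can never be lower; whether its BIC is lower depends on whether the gain in fit outweighs the extra log(n) penalty. Some treatments also count the error variance as a parameter, which adds the same amount to every model here and so does not change the ranking.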

##### Comparing Models using BIC

BIC is commonly used to compare several candidate models fitted to the same data. The model with the lowest BIC is considered the preferred model. When comparing models, the size of the difference in BIC between two models indicates how strong the evidence is for the model with the lower value. A commonly cited rule of thumb treats a difference below 2 as weak evidence, a difference of 2 to 6 as positive evidence, 6 to 10 as strong evidence, and more than 10 as very strong evidence in favor of the model with the lower BIC.
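A comparison across several candidates can be sketched as follows. The model names and BIC values are hypothetical, as is the helper `compare_by_bic`. The last column uses the large-sample approximation that exp(difference / 2) roughly equals the Bayes factor in favor of the best model, and should be read as a rough guide rather than an exact quantity.

```python
import math

# Hypothetical BIC values for three candidate models fitted to the same data.
bic_values = {"linear": 310.4, "quadratic": 313.1, "cubic": 321.9}


def compare_by_bic(bic_by_model):
    """Return (name, bic, delta, approx_bayes_factor) rows, lowest BIC first."""
    best = min(bic_by_model.values())
    rows = []
    for name, value in sorted(bic_by_model.items(), key=lambda kv: kv[1]):
        delta = value - best  # difference from the preferred model
        # Approximate Bayes factor in favor of the best model over this one.
        rows.append((name, value, delta, math.exp(delta / 2.0)))
    return rows


comparison = compare_by_bic(bic_values)
for name, value, delta, bayes_factor in comparison:
    print(f"{name:<10} BIC={value:7.1f}  delta={delta:5.1f}  approx BF={bayes_factor:8.1f}")
```

The first row is always the preferred model, with a difference of 0. Differences are only meaningful when every model was fitted to the same dataset.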