Logistic regression is one of the most popular algorithms for classification problems. Despite its name, it is a classification algorithm rather than a regression algorithm; it is called regression because the underlying technique is similar to linear regression. The term “logistic” comes from the statistical model used (the logit model).

As seen in earlier releases, classification algorithms assign the rows of a dataset to classes; logistic regression, in its basic form, performs binary classification.

Logistic regression is an extension of the linear regression model. Linear regression works well for regression but not for classification, because a linear model does not output probabilities; it simply treats the two classes as the numbers 0 and 1 (Class A or Class B). It places each row of the dataset as a point in a plane and then finds the line that minimizes the distances between the points and the line. As a result, the linear model forces a linear relationship between the independent and dependent variables that the class labels do not actually follow.

The line L1 is the best-fit line when only the red points are considered. It classifies points to the right of the line as 1 and points to the left as 0, which gives a decent classification of the data even though many points are still wrongly classified.

Now, if there is one extreme point (also called an outlier), the best-fit line shifts to L2, which classifies more points incorrectly, to the extent that all the points that should be classified as 1 are classified as 0. This drastic difference is caused by a single outlier.
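The effect of a single outlier can be reproduced numerically. The sketch below uses hypothetical toy data (not the points from the figure) and two illustrative helpers, `fit_line` and `decision_boundary`, which are assumptions made for this example. It fits an ordinary least-squares line to 0/1 labels and reports the x value where the line crosses 0.5, first without and then with one extreme point.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def decision_boundary(slope, intercept, threshold=0.5):
    """x value at which the fitted line crosses the threshold."""
    return (threshold - intercept) / slope


# Hypothetical toy data: class 0 for small x, class 1 for large x.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

boundary_clean = decision_boundary(*fit_line(xs, ys))

# Add a single extreme point that clearly belongs to class 1.
boundary_outlier = decision_boundary(*fit_line(xs + [50], ys + [1]))

print(f"boundary without outlier: {boundary_clean:.2f}")
print(f"boundary with outlier:    {boundary_outlier:.2f}")
```

In this toy data the boundary sits midway between the classes at first, then moves to the right of x = 5 once the outlier is added, so a point that belongs to class 1 ends up classified as 0.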

To handle such cases, the linear regression algorithm was modified, creating the famous logistic regression.

Instead of fitting a straight line in the plane, the logistic regression model uses the logistic function to squeeze the output of a linear equation between 0 and 1. Looking at the above diagram, it is evident that the S-curve created by logistic regression follows the data points closely.

Logistic function:

$$Y = \frac{1}{1 + e^{-X}}$$

When drawn on a 2-D plane, it looks like this:

As X (the independent variable) goes to infinity, Y (the dependent variable) approaches 1, and as X goes to negative infinity, Y approaches 0.

This logistic function is also called a sigmoid function. It takes any real number and maps it to a value between 0 and 1 that can be read as a probability, which makes it well suited to binary classification.

So far, there was only one independent variable (X). What happens when there is more than one independent variable?
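The limiting behaviour is easy to check numerically. Here is a minimal sketch of the sigmoid in its standard form, 1 / (1 + e^(-x)); the function name `sigmoid` and the two-branch layout (used only to avoid overflow for large negative inputs) are choices made for this example.

```python
import math


def sigmoid(x):
    """Logistic function 1 / (1 + e^(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    exp_x = math.exp(x)  # safe: x is negative, so exp(x) is below 1
    return exp_x / (1.0 + exp_x)


for x in (-10, -1, 0, 1, 10):
    print(f"sigmoid({x:>3}) = {sigmoid(x):.5f}")
```

Large negative inputs give values close to 0, large positive inputs give values close to 1, and an input of 0 gives exactly 0.5.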

Then the linear equation passed to the logistic function becomes

$$\beta_{0} + \beta_{1} x_{1} + \beta_{2} x_{2} + \dots + \beta_{p} x_{p}$$

Here $x_{1}, x_{2}, \dots, x_{p}$ are the independent variables. The β values are estimated using maximum likelihood estimation. This method iteratively tries values of β and evaluates a likelihood function, which measures how well the resulting log odds fit the data. The coefficients that maximize this function are the optimal ones; they are then used in the sigmoid function to find the probability.
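The function being maximized can be written out explicitly. As a sketch in standard notation (the symbols $N$, $y_i$, $\pi_i$ and $x_{ij}$ are introduced here for illustration and do not appear in the original text), for $N$ training rows with labels $y_i \in \{0, 1\}$ and predicted class-1 probabilities $\pi_i$, the log-likelihood is:

$$\ell(\beta) = \sum_{i=1}^{N} \left[ y_i \log \pi_i + (1 - y_i) \log (1 - \pi_i) \right], \qquad \pi_i = \frac{1}{1 + e^{-(\beta_{0} + \beta_{1} x_{i1} + \dots + \beta_{p} x_{ip})}}$$

Each row contributes $\log \pi_i$ when its label is 1 and $\log(1 - \pi_i)$ when its label is 0, so the sum is largest when the model assigns high probability to the class each row actually belongs to.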

The blue line here is **y=0.5**, above which all points are classified as class 1 and below which they are classified as class 0.

The probabilities found by the aforementioned formula are used here.

**p≥0.5,class=1**

**p<0.5,class=0**

So, the good thing about the logistic regression algorithm is that it not only classifies but also provides the probabilities. Knowing that one instance belongs to a class with a probability above 90% while another does so with only 51% is a big advantage.
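Putting the pieces together, a prediction is the sigmoid of the linear combination followed by the 0.5 threshold. The sketch below is illustrative only: the coefficients are hypothetical values rather than estimates from any dataset, and `class_one_probability` and `predict_class` are helper names invented for this example.

```python
import math


def class_one_probability(features, coefficients, intercept):
    """Probability of class 1: sigmoid of the linear combination."""
    z = intercept + sum(b * x for b, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-z))


def predict_class(probability, threshold=0.5):
    """Class 1 when the probability reaches the threshold, else class 0."""
    return 1 if probability >= threshold else 0


# Hypothetical coefficients, not estimated from any real data.
intercept = -1.0
coefficients = [0.8, -0.5]

for features in ([3.0, 1.0], [1.0, 1.0], [0.5, 2.0]):
    p = class_one_probability(features, coefficients, intercept)
    print(features, f"p = {p:.3f}", "class =", predict_class(p))
```

Keeping the probability alongside the class label makes it possible to tell a confident prediction from a borderline one.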

**The cost function of Logistic regression:**

The cost function measures how well the machine learning model works. It calculates the difference between the actual and the predicted values, quantifying how wrong the algorithm's predictions were. The best model is found by minimizing the value of the cost function.

In logistic regression, the Log loss function is used.

**Log Loss function:**

Mathematically, the log loss is the negative average of the log of the corrected predicted probabilities over all instances.

By default, logistic regression gives probabilities with respect to the hypothesis.

For example, the hypothesis is “Probability that a person sleeps more than 10 hours a day.”

Here 1 represents a person sleeping more than 10 hours a day, and 0 represents a person sleeping 10 hours or less.

| ID | Class | Probability |
| --- | --- | --- |
| 1 | 1 | 0.93 |
| 2 | 1 | 0.76 |
| 3 | 0 | 0.2 |
| 4 | 0 | 0.4 |
| 5 | 1 | 0.78 |

Probability refers to the probability of the class being 1, i.e., the probability the person sleeps more than 10 hours a day.

For IDs 3 and 4, the probabilities are 0.2 and 0.4, respectively. These need to be changed so that they refer to the probability that each instance belongs to its own class. This is where corrected probabilities are used: for an instance of class 0, corrected probability = (1 - probability).

| ID | Class | Probability | Corrected Probability |
| --- | --- | --- | --- |
| 1 | 1 | 0.93 | 0.93 |
| 2 | 1 | 0.76 | 0.76 |
| 3 | 0 | 0.2 | 0.8 |
| 4 | 0 | 0.4 | 0.6 |
| 5 | 1 | 0.78 | 0.78 |

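Now it’s time to find the log of the corrected probabilities:

| ID | Class | Probability | Corrected Probability | Log (base 10) |
| --- | --- | --- | --- | --- |
| 1 | 1 | 0.93 | 0.93 | -0.0315 |
| 2 | 1 | 0.76 | 0.76 | -0.1192 |
| 3 | 0 | 0.2 | 0.8 | -0.0969 |
| 4 | 0 | 0.4 | 0.6 | -0.2218 |
| 5 | 1 | 0.78 | 0.78 | -0.1079 |

Since the log of a number less than 1 is negative, the negative of the average is taken so that the loss comes out positive.

Thus the final formula becomes:

$$\text{Log loss} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right]$$

where $N$ is the number of instances, $y_i$ is the actual class of instance $i$, and $p_i$ is its predicted probability of class 1.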

To summarise, the steps for the log loss function are:

- Find corrected probability
- Take the log of corrected probabilities
- Take the negative average of the values
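The three steps can be sketched in a few lines, using the five rows from the sleep example. The function name `log_loss` and its `log` parameter are choices made for this illustration. The log values in the table are base-10, whereas the natural log is the more common convention for log loss, so the sketch computes both.

```python
import math


def log_loss(labels, probabilities, log=math.log):
    """Negative average of the log of the corrected probabilities."""
    # Step 1: corrected probability (probability of the actual class).
    corrected = [p if y == 1 else 1.0 - p for y, p in zip(labels, probabilities)]
    # Steps 2 and 3: take the logs, then the negative average.
    return -sum(log(c) for c in corrected) / len(corrected)


labels = [1, 1, 0, 0, 1]
probabilities = [0.93, 0.76, 0.2, 0.4, 0.78]

base10_loss = log_loss(labels, probabilities, log=math.log10)
natural_loss = log_loss(labels, probabilities)

print(f"log loss (base 10):     {base10_loss:.4f}")
print(f"log loss (natural log): {natural_loss:.4f}")
```

With base-10 logs, as in the table, the loss for these five rows is about 0.115; with natural logs it is about 0.266. The two differ only by a constant factor, so minimizing one minimizes the other.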

**The following code is for Logistic regression using Sklearn Library.**

**Loan Status Prediction:**

As usual, let’s start with importing the libraries and then reading the dataset:

    import numpy as np
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score

    loan_dataset = pd.read_csv('../input/loan-predication/train_u6lujuX_CVtuZ9i (1).csv')

This dataset requires some preprocessing because it contains words instead of numbers and some null values.

    # dropping the missing values
    loan_dataset = loan_dataset.dropna()
    # numbering the labels
    loan_dataset.replace({'Loan_Status': {'N': 0, 'Y': 1}}, inplace=True)
    # replacing the value of 3+ to 4
    loan_dataset = loan_dataset.replace(to_replace='3+', value=4)
    # convert categorical columns to numerical values
    loan_dataset.replace({'Married': {'No': 0, 'Yes': 1},
                          'Gender': {'Male': 1, 'Female': 0},
                          'Self_Employed': {'No': 0, 'Yes': 1},
                          'Property_Area': {'Rural': 0, 'Semiurban': 1, 'Urban': 2},
                          'Education': {'Graduate': 1, 'Not Graduate': 0}},
                         inplace=True)

In the first line, rows with NULL values are dropped. Then the N and Y values (representing No and Yes) in the Loan_Status column are replaced by 0 and 1, respectively; the other categorical values are given numbers in the same way.

    # separating the data and label
    X = loan_dataset.drop(columns=['Loan_ID', 'Loan_Status'])
    Y = loan_dataset['Loan_Status']
    X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.25, random_state=2)

The code shows that the dataset is split into training and testing datasets.

The **test_size=0.25** argument means that 75% of the dataset is used for training and 25% for testing.
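To make the 75/25 split concrete, here is a rough standard-library sketch of the idea: shuffle the rows, then hold out a fraction for testing. It is not Sklearn's implementation; `split_rows` is a hypothetical helper, and its `seed` will not reproduce the shuffle that `random_state=2` produces in `train_test_split`.

```python
import math
import random


def split_rows(rows, test_size=0.25, seed=2):
    """Shuffle a copy of the rows, then hold out a test_size fraction."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = math.ceil(len(shuffled) * test_size)  # test count, rounded up
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical stand-in for the dataset: 100 row indices.
train_rows, test_rows = split_rows(range(100))
print(len(train_rows), "training rows,", len(test_rows), "test rows")
```

Fixing the seed makes the split repeatable, which is the same reason a fixed `random_state` is passed in the code above.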

    model = LogisticRegression()
    model.fit(X_train, Y_train)

Sklearn makes it very easy to train the model: once the model object is created, a single call to fit() is all that is required.

Evaluating the model:

    X_train_prediction = model.predict(X_train)
    training_data_accuracy = accuracy_score(Y_train, X_train_prediction)
    print('Accuracy on training data:', training_data_accuracy)

This shows that the accuracy on the training data is 82%.

    X_test_prediction = model.predict(X_test)
    test_data_accuracy = accuracy_score(Y_test, X_test_prediction)
    print('Accuracy on test data:', test_data_accuracy)

This shows that the accuracy on the test data is 78.3%.

    input_data = (1, 1, 0, 1, 0, 3033, 1459.0, 94.0, 360.0, 1.0, 2)
    # changing the input_data to a numpy array
    input_data_as_numpy_array = np.asarray(input_data)
    # reshape the np array as we are predicting for one instance
    input_data_reshaped = input_data_as_numpy_array.reshape(1, -1)
    prediction = model.predict(input_data_reshaped)

Here, when a sample input is given to the trained model, it outputs whether the loan is approved (1) or not (0).

Implemented without Sklearn, logistic regression behaves like a very small neural network: a forward pass computes the predictions and the loss, and the gradient of the loss is propagated backward to adjust the weights and bias until an optimized result is found.
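As a rough illustration of that idea, the sketch below trains logistic regression with plain batch gradient descent on the log loss, using only the standard library. Everything in it is an assumption made for this example: the `train` helper, the learning rate, the number of epochs, and the toy data, whose single feature is centered around zero so that gradient descent converges quickly.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(rows, labels, learning_rate=0.1, epochs=2000):
    """Batch gradient descent on the log loss; returns (weights, bias)."""
    n = len(rows)
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        # Forward pass: predicted probability for every row.
        predictions = [
            sigmoid(bias + sum(w * x for w, x in zip(weights, row)))
            for row in rows
        ]
        # Backward pass: gradient of the average log loss.
        errors = [p - y for p, y in zip(predictions, labels)]
        grad_w = [
            sum(e * row[j] for e, row in zip(errors, rows)) / n
            for j in range(n_features)
        ]
        grad_b = sum(errors) / n
        # Update step: move against the gradient.
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b
    return weights, bias


# Hypothetical one-feature toy data, centered around zero.
rows = [[-3.5], [-2.5], [-1.5], [-0.5], [0.5], [1.5], [2.5], [3.5]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

weights, bias = train(rows, labels)
predicted = [
    1 if sigmoid(bias + weights[0] * row[0]) >= 0.5 else 0 for row in rows
]
print("weights:", weights, "bias:", bias)
print("predicted:", predicted)
```

For real datasets, a library implementation such as Sklearn's `LogisticRegression` is the more practical choice; the loop above is only meant to show what the forward pass, the gradient, and the update step look like.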