Logistic regression is one of the most popular algorithms for classification problems. It is called “regression” even though it is not a regression algorithm, because the underlying technique is similar to that of linear regression. The term “logistic” comes from the statistical model it uses, the logit model.
As seen in earlier articles, classification algorithms assign the rows of a dataset to various classes; logistic regression is most commonly used for binary classification.
Logistic regression is an extension of the linear regression model. Although linear regression works well for regression tasks, it is poorly suited to classification, because the linear model does not output probabilities; the labels are simply 0 or 1 (Class A or Class B). Linear regression fits the dataset in a plane, treating each row as a point, and finds the line that minimizes the distances between the points and that line. Applied to 0/1 labels, the linear model forces an unnatural structure between the independent and dependent variables.
The line L1 is the best-fit line when only the red points are considered; it classifies points to its right as 1 and to its left as 0, which gives a decent indication of the classes even though many points are wrongly classified.
Now, if there is one extreme point (also called an outlier), the best-fit line shifts to L2, which misclassifies far more points, to the extent that all the points that should be classified as 1 become 0. This drastic change comes from a single outlier.
Seeing such behaviour, modifications were made to the linear regression algorithm, creating the famous logistic regression.
Instead of fitting a straight line in the plane, the logistic regression model uses the logistic function to squash the output of a linear equation between 0 and 1. Looking at the diagram above, it is evident that the S-curve created by logistic regression fits the data points closely.
When drawn on a 2-D plane, it looks like this:
As X (the independent variable) goes to infinity, Y (the dependent variable) goes to 1, and as X goes to negative infinity, Y goes to 0.
This logistic function is also called the sigmoid function. It takes any real number and converts it into a probability between 0 and 1, which makes it a great fit for binary classification.
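The squashing behaviour of the sigmoid can be sketched in a few lines of NumPy:

```python
import numpy as np

def sigmoid(z):
    # maps any real number into the open interval (0, 1)
    return 1 / (1 + np.exp(-z))

print(sigmoid(0))    # exactly 0.5: the midpoint of the S-curve
print(sigmoid(10))   # close to 1 for large positive inputs
print(sigmoid(-10))  # close to 0 for large negative inputs
```

No matter how extreme the input, the output never leaves (0, 1), which is why a single outlier cannot drag the curve the way it drags a best-fit line.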
Until now, there was only one independent variable (X). What happens when there is more than one independent variable?
Then the linear equation switches into:

log(p / (1 - p)) = β0 + β1x1 + β2x2 + … + βpxp
Here x1, x2, …, xp are the independent variables. The β values are estimated using maximum likelihood estimation. This method adjusts the β values over multiple iterations to find the best fit of the log odds, producing a likelihood function. When this function is maximized, the optimal values for the coefficients are found; these are then plugged into the sigmoid function to find the probability.
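Once the coefficients are estimated, making a prediction is just the linear combination followed by the sigmoid. A minimal sketch, using made-up β values and one observation with three features (the numbers are purely illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# hypothetical fitted coefficients: intercept beta0 and weights beta1..beta3
beta0 = -1.5
betas = np.array([0.8, 2.0, -0.4])

# one observation with p = 3 independent variables x1, x2, x3
x = np.array([1.0, 0.5, 2.0])

# linear combination gives the log odds; the sigmoid turns it into a probability
log_odds = beta0 + np.dot(betas, x)
probability = sigmoid(log_odds)
print(probability)  # about 0.378 for these values
```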
The blue line here is y = 0.5; points above it are classified as class 1, and points below it as class 0.
The probabilities found by the aforementioned formula are used here.
So the good thing about the logistic regression algorithm is that it not only classifies but also provides probabilities. Knowing that a case has a 90+% probability of belonging to a class, as opposed to 51%, is a big advantage.
The cost function of Logistic regression:
The cost function helps us understand how well a machine learning model works. It calculates the difference between the actual and the predicted values, measuring how wrong the algorithm's predictions were. Minimizing the value of the cost function yields the most optimized result.
In logistic regression, the Log loss function is used.
Log Loss function:
Mathematically, the log loss is the negative average of the logs of the corrected predicted probabilities for each instance.
By default, logistic regression gives probabilities with respect to the hypothesis.
For example, the hypothesis is “Probability that a person sleeps more than 10 hours a day.”
Here 1 represents a person sleeping more than 10 hours a day, and 0 is less than 10 hours.
Probability refers to the probability of the class being 1, i.e., the probability the person sleeps more than 10 hours a day.
In the case of IDs 3 and 4, the predicted probabilities are 0.2 and 0.4, respectively. These need to be changed to refer to the probability that each instance belongs to its actual class. Corrected probabilities are used for this: where the actual class is 0, corrected probability = (1 - predicted probability).
Now it's time to take the log of the corrected probabilities:
Since the log of a number less than 1 is negative, the negative average is taken so that the loss comes out positive.
Thus the final formula becomes:

Log Loss = -(1/N) * Σ [y_i * log(p_i) + (1 - y_i) * log(1 - p_i)]

where N is the number of instances, y_i is the actual label, and p_i is the predicted probability of class 1.
To summarise, the steps for the log loss function are:
- Find corrected probability
- Take the log of corrected probabilities
- Take the negative average of those values
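The three steps above can be sketched directly in NumPy. The probabilities below are hypothetical, chosen so that IDs 3 and 4 (class 0, predicted 0.2 and 0.4) match the example discussed earlier:

```python
import numpy as np

# hypothetical predicted probabilities of class 1, and the actual labels
pred_prob = np.array([0.94, 0.9, 0.2, 0.4, 0.6])
actual    = np.array([1,    1,   0,   0,   1])

# step 1: corrected probability — the probability of the actual class
corrected = np.where(actual == 1, pred_prob, 1 - pred_prob)

# step 2: log of the corrected probabilities (all negative, since each is < 1)
log_corrected = np.log(corrected)

# step 3: the negative average of those logs is the log loss
log_loss = -np.mean(log_corrected)
print(log_loss)  # about 0.282 for these values
```

A perfect model (corrected probabilities all 1) would give a log loss of 0; the worse the probabilities, the larger the loss grows.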
The following code implements logistic regression using the Sklearn library.
Loan Status Prediction:
As usual, let’s start with importing the libraries and then reading the dataset:
```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
```
```python
loan_dataset = pd.read_csv('../input/loan-predication/train_u6lujuX_CVtuZ9i (1).csv')
```
This dataset requires some preprocessing because it contains words instead of numbers and some null values.
```python
# dropping the missing values
loan_dataset = loan_dataset.dropna()
# replacing the value of 3+ to 4
loan_dataset = loan_dataset.replace(to_replace='3+', value=4)
# convert categorical columns to numerical values
# (Loan_Status: N/Y -> 0/1; the other categorical columns are encoded similarly)
loan_dataset = loan_dataset.replace({'Loan_Status': {'N': 0, 'Y': 1}})
```
In the first line, rows with NULL values were dropped; then the Y and N (representing Yes and No) in the dataset were replaced by 1 and 0, and the other categorical values were similarly given numbers.
```python
# separating the data and label
X = loan_dataset.drop(columns=['Loan_ID', 'Loan_Status'], axis=1)
Y = loan_dataset['Loan_Status']

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.25, random_state=2)
```
The code shows that the dataset is split into training and testing datasets.
The test_size=0.25 means that 75% of the dataset will be used for training and 25% for testing.
```python
model = LogisticRegression()
model.fit(X_train, Y_train)
```
Sklearn makes it very easy to train the model: a single call to fit is all that is required.
Evaluating the model:
```python
X_train_prediction = model.predict(X_train)
training_data_accuracy = accuracy_score(Y_train, X_train_prediction)
```
This shows that the accuracy on the training data is 82%.
```python
X_test_prediction = model.predict(X_test)
test_data_accuracy = accuracy_score(Y_test, X_test_prediction)
```
This shows that the accuracy on the test data is 78.3%.
```python
# input_data holds the feature values for a single loan applicant (not shown here)
# changing the input_data to a numpy array
input_data_as_numpy_array = np.asarray(input_data)
# reshape the np array as we are predicting for one instance
input_data_reshaped = input_data_as_numpy_array.reshape(1, -1)
prediction = model.predict(input_data_reshaped)
```
Here, when new input data is given to the trained model, it outputs whether the loan is approved.
Logistic regression code written without Sklearn behaves like a small neural network: it requires forward propagation to compute predictions and backward propagation of the loss to update the weights and bias until an optimized result is found.
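That loop can be sketched from scratch in NumPy. This is a minimal illustration on synthetic, linearly separable data (the learning rate, iteration count, and data are all illustrative choices), not a production implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# synthetic binary data: the class depends on the sum of the two features
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# weights and bias start at zero; gradient descent minimizes the log loss
w = np.zeros(2)
b = 0.0
lr = 0.1
for _ in range(1000):
    # forward propagation: predicted probabilities
    p = sigmoid(X @ w + b)
    # backward propagation: gradient of the log loss w.r.t. w and b
    grad_w = X.T @ (p - y) / len(y)
    grad_b = np.mean(p - y)
    w -= lr * grad_w
    b -= lr * grad_b

accuracy = np.mean((sigmoid(X @ w + b) >= 0.5) == (y == 1))
print(accuracy)
```

Because the data is linearly separable, the training accuracy should end up close to 1.0; on real data the same loop converges to whatever boundary minimizes the log loss.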
Logistic Regression is a widely used machine learning algorithm for binary classification problems. It is particularly effective when the outcome variable is categorical and has two classes. In this article, we will explore various aspects of Logistic Regression, including the sigmoid function, cost function, gradient descent, training and optimization, evaluation metrics, regularization techniques, handling imbalanced data, and extending it to multiclass classification scenarios.
I. Sigmoid Function: Activation for Binary Classification
The sigmoid function, also known as the logistic function, is a key component of Logistic Regression. It maps any real-valued number to a value between 0 and 1, making it suitable for binary classification. The sigmoid function is defined as:
sigmoid(z) = 1 / (1 + e^(-z))
Here, ‘z’ represents the linear combination of input features and their respective weights. The sigmoid function transforms the linear output into a probability value, indicating the likelihood of belonging to the positive class.
II. Cost Function and Gradient Descent in Logistic Regression
The cost function in Logistic Regression measures the discrepancy between the predicted probabilities and the actual labels. The most common cost function is the log-loss or binary cross-entropy function. It is defined as:
cost(h(x), y) = -y * log(h(x)) - (1 - y) * log(1 - h(x))
Here, ‘h(x)’ represents the predicted probability, and ‘y’ represents the actual label. The goal is to minimize this cost function.
Gradient descent is an optimization algorithm used to find the optimal set of weights that minimizes the cost function. It iteratively adjusts the weights in the opposite direction of the gradient of the cost function until convergence is reached.
III. Training and Optimization of Logistic Regression Model
To train a Logistic Regression model, we initialize the weights with random values. We then use gradient descent to update the weights iteratively, minimizing the cost function. The learning rate, which determines the step size during each iteration, plays a crucial role in the convergence of the algorithm.
IV. Evaluation Metrics for Logistic Regression
To evaluate the performance of a Logistic Regression model, various metrics can be used. Common evaluation metrics include accuracy, precision, recall, F1-score, and area under the ROC curve (AUC-ROC). These metrics provide insights into the model’s performance in terms of correctly predicting positive and negative classes.
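All of these metrics are available in sklearn.metrics. A small sketch on hypothetical labels and probabilities (the numbers are invented for illustration):

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, roc_auc_score)

# hypothetical true labels, hard predictions, and predicted probabilities
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
y_prob = [0.9, 0.2, 0.8, 0.4, 0.1, 0.6, 0.7, 0.3]

acc = accuracy_score(y_true, y_pred)      # fraction of correct predictions
prec = precision_score(y_true, y_pred)    # of predicted positives, how many were right
rec = recall_score(y_true, y_pred)        # of actual positives, how many were found
f1 = f1_score(y_true, y_pred)             # harmonic mean of precision and recall
auc = roc_auc_score(y_true, y_prob)       # ranking quality of the probabilities
print(acc, prec, rec, f1, auc)
```

Note that AUC-ROC is computed from the probabilities rather than the hard 0/1 predictions, which is why it can look better (or worse) than accuracy.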
V. Regularization Techniques in Logistic Regression
Regularization techniques, such as L1 and L2 regularization, help prevent overfitting in Logistic Regression models. These techniques add a regularization term to the cost function, which penalizes large weights. By controlling the regularization parameter, we can adjust the trade-off between model complexity and fitting the training data.
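In Sklearn, the penalty and the regularization strength are constructor arguments of LogisticRegression: penalty selects L1 or L2, and C is the inverse regularization strength (smaller C means a stronger penalty). A sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=20,
                           n_informative=5, random_state=42)

# L2 (the default) shrinks all weights toward zero but rarely to exactly zero
l2_model = LogisticRegression(penalty='l2', C=0.1).fit(X, y)

# L1 drives some weights to exactly zero, acting as feature selection
# (liblinear is one of the solvers that supports the L1 penalty)
l1_model = LogisticRegression(penalty='l1', C=0.1, solver='liblinear').fit(X, y)

l1_zero_weights = int((l1_model.coef_ == 0).sum())
print(l1_zero_weights)  # several of the 20 coefficients are exactly zero
```

Tuning C (for example with cross-validation) is how the trade-off between model complexity and fit is controlled in practice.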
VI. Handling Imbalanced Data in Logistic Regression
Imbalanced datasets, where the number of samples in one class is significantly higher or lower than the other, can lead to biased models. Various techniques, including oversampling, undersampling, and synthetic minority oversampling technique (SMOTE), can be employed to address class imbalance and improve the performance of Logistic Regression on imbalanced datasets.
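Besides resampling approaches like SMOTE (from the separate imbalanced-learn package), Sklearn's LogisticRegression can reweight the loss directly via class_weight. A sketch on a synthetic 90/10 dataset:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# imbalanced dataset: roughly 90% class 0, 10% class 1
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1],
                           random_state=0)

plain = LogisticRegression(max_iter=1000).fit(X, y)

# class_weight='balanced' weights each class inversely to its frequency,
# so mistakes on the rare class cost more during training
balanced = LogisticRegression(class_weight='balanced',
                              max_iter=1000).fit(X, y)

# the balanced model predicts the minority class more often
print(np.mean(plain.predict(X) == 1), np.mean(balanced.predict(X) == 1))
```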
VII. Multiclass Logistic Regression
While Logistic Regression is primarily used for binary classification, it can be extended to handle multiclass classification problems using one-vs-rest or multinomial logistic regression. These approaches allow the model to handle more than two classes by training multiple binary logistic regression models or a single multiclass logistic regression model.
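In Sklearn this extension is largely automatic: given more than two classes, LogisticRegression fits a multinomial (softmax) model by default in recent versions. A sketch on the three-class iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# iris has 3 classes; LogisticRegression handles them without extra setup
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

# predict_proba returns one probability per class for each sample,
# and the probabilities in each row sum to 1
probs = model.predict_proba(X[:5])
print(probs.shape)        # (5, 3)
print(probs.sum(axis=1))  # each row sums to ~1.0
```

The one-vs-rest alternative instead trains one binary classifier per class and picks the class whose classifier is most confident.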
In conclusion, Logistic Regression is a powerful and interpretable algorithm for binary classification. Understanding the sigmoid function, cost function, gradient descent, training, evaluation metrics, regularization techniques, handling imbalanced data, and multiclass extensions provides a comprehensive understanding of this versatile algorithm. By mastering these concepts, you can effectively apply Logistic Regression to a wide range of real-world problems.