In machine learning, classification is the task of predicting a category (label) for a data point from its input variables. A dataset whose labels are already given (the training dataset) is used to train the model so that it can assign labels to data that are not yet labeled.
Under classification, there are two types of classifiers:
- Binary Classification
- Multi-Class Classification
Here, let’s discuss multi-class classification in detail.
A multi-class classification problem is one where the dataset is to be classified into more than two classes. For example, in the healthcare industry it can be used to decide which disease a person has, depending on the symptoms shown.
On e-commerce websites like Amazon, products are classified under various categories (Electronics, Furniture, etc.). At that scale, such classification is beyond the scope of human labor and is instead done using ML models.
Another use is in image classification; the most famous example for beginners is the recognition of a digit from the handwritten image dataset (MNIST Dataset). Here the images are classified into a number from 0 to 9.
The most used Multiclass classification algorithms are:
- Naive Bayes
- Random Forest
- Neural networks
Imbalanced Dataset: A common problem
In an ideal condition, all the classes/categories are equally represented in the dataset. In real-world scenarios, however, this is rarely the case, often due to the unavailability of data, and it causes the dataset to be biased towards one or more classes.
Let’s take the case of detecting a patient’s disease from symptoms. Dataset taken from a hospital might contain information on common diseases however there might be a lack of representation of the rarer diseases.
Such a case leads to the training of a biased model that misclassifies the minority classes in the dataset even though it has high accuracy on the majority classes.
This problem can be solved by:
- Collecting more data from various sources.
- Using resampling techniques
- Undersampling: Removing samples from the majority class
- Oversampling: Adding more examples for the minority class
- Creating synthetic data: Certain deep learning techniques (like GANs) can be used to generate artificial samples to reduce the imbalance.
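The two resampling techniques above can be sketched with scikit-learn's `resample` utility. The tiny two-class dataset here is made up purely for illustration; the same idea extends to any imbalanced multi-class dataset.

```python
import numpy as np
from sklearn.utils import resample

# Hypothetical imbalanced dataset: 100 majority samples, 10 minority samples
rng = np.random.default_rng(0)
X_majority = rng.normal(0, 1, size=(100, 2))
X_minority = rng.normal(3, 1, size=(10, 2))

# Oversampling: draw minority samples WITH replacement until counts match
X_minority_up = resample(X_minority, replace=True,
                         n_samples=len(X_majority), random_state=0)

# Undersampling: draw a subset of the majority class WITHOUT replacement
X_majority_down = resample(X_majority, replace=False,
                           n_samples=len(X_minority), random_state=0)

print(len(X_minority_up), len(X_majority_down))  # 100 10
```

Note that oversampling duplicates existing minority points, while undersampling throws away majority data; which trade-off is acceptable depends on how much data you have.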
Working on multi-class classifiers:
As seen in the previous release, many ML models work solely on binary classification, so the multiclass problem is broken into several binary classifications using one-vs-one or one-vs-rest strategies.
One Vs. Rest:
In this technique, one class is selected (the target class) and classified against all the remaining classes combined, i.e., target class vs. rest.
For example: In the case of product classification for a jewelry store’s e-commerce website where classes for products are Diamond, Gold, and Silver.
- Classifier 1: Diamond vs. [Gold, Silver]
- Classifier 2: Gold vs. [Diamond, Silver]
- Classifier 3: Silver vs. [Diamond, Gold]
The binary classifier that predicts its target class with the highest confidence determines the final output.
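A minimal sketch of one-vs-rest using Sklearn's `OneVsRestClassifier`: a synthetic 3-class dataset stands in for the Diamond/Gold/Silver example, and one binary logistic-regression model is trained per class.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Synthetic stand-in for the 3-class jewelry example (Diamond, Gold, Silver)
X, y = make_classification(n_samples=300, n_features=4, n_informative=3,
                           n_redundant=0, n_classes=3, random_state=0)

# One binary classifier is fitted per class: class k vs. the rest
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

print(len(ovr.estimators_))  # 3 binary classifiers, one per class
```

At prediction time, `ovr.predict` runs all three binary classifiers and returns the class whose classifier is most confident, exactly as described above.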
One Vs. One:
In this technique, every pair of classes is classified against each other, i.e., one class vs. every other class individually.
This will lead to the creation of (N * (N-1))/2 classifiers where N is the total number of classes.
Taking the same example as above classifiers created are:
- Classifier 1: Diamond vs. Gold
- Classifier 2: Gold vs. Silver
- Classifier 3: Diamond vs. Silver
After all classifiers predict their respective classes, the majority vote is taken as the final output.
This strategy works particularly well with SVMs.
Don’t worry: splitting the problem into these classifiers and combining their outputs can easily be done with Sklearn’s inbuilt functions.
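One-vs-one with an SVM looks almost identical in Sklearn; this sketch reuses a synthetic 3-class dataset, so `OneVsOneClassifier` fits (3 × 2) / 2 = 3 pairwise classifiers.

```python
from sklearn.datasets import make_classification
from sklearn.multiclass import OneVsOneClassifier
from sklearn.svm import SVC

# Synthetic 3-class dataset standing in for Diamond, Gold, and Silver
X, y = make_classification(n_samples=300, n_features=4, n_informative=3,
                           n_redundant=0, n_classes=3, random_state=0)

# N * (N - 1) / 2 pairwise SVM classifiers are trained for N = 3 classes
ovo = OneVsOneClassifier(SVC(kernel="linear")).fit(X, y)

print(len(ovo.estimators_))  # 3 pairwise classifiers
```

Each pairwise classifier sees only the samples of its two classes, and `ovo.predict` returns the class winning the majority vote.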
This also explains why these multi-class strategies consume more time and memory during training.
All this splitting seems like a tedious task, which is why the existence of natively multiclass algorithms is convenient.
Algorithms such as K-Nearest Neighbours, Naive Bayes, and Decision Trees are designed in a way that they can handle multiclass problems without requiring any extra strategies.
Naive Bayes applies Bayes’ theorem and assumes all features are independent of each other given the class. It works well with large datasets with categorical features.
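A minimal Naive Bayes sketch for categorical features, using Sklearn's `CategoricalNB`. The toy "symptom" data here is invented for illustration: each column is a categorical feature encoded as an integer, and the three labels play the role of three diseases.

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB

# Hypothetical symptom data: two categorical features encoded as integers
X = np.array([[0, 1], [1, 1], [2, 0], [0, 0], [2, 1], [1, 0]])
y = np.array([0, 0, 1, 2, 1, 2])  # three disease classes

clf = CategoricalNB().fit(X, y)

# Predicts by combining per-feature likelihoods under the independence assumption
print(clf.predict([[2, 0]]))
```

Because the model only stores per-class, per-feature category counts, training scales well to large categorical datasets.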
Decision trees take the entire dataset in the root node and progressively split it into a tree-like structure until it reaches the leaf nodes, which provide the final output. They work with both categorical and continuous features and have a relatively low training time.
(Figure: example of a decision tree classifier)
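In code, a decision tree handles a 3-class problem directly, with no one-vs-rest or one-vs-one wrapper. This sketch uses the Iris dataset bundled with scikit-learn (three flower classes, continuous features).

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Iris: 3 classes, 4 continuous features, shipped with scikit-learn
X, y = load_iris(return_X_y=True)

# max_depth limits how far the root node is recursively split
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

print(tree.score(X, y))  # training accuracy
```

The `max_depth` cap is the usual guard against the tree memorizing the training set.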
K-nearest neighbours is a classification technique that requires no explicit training phase: it finds the ‘k’ nearest data points to the given unknown data point, and the class most common among those k points is taken as the output.
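A short KNN sketch on the same Iris dataset; note that `fit` merely stores the training points, and all the work happens at prediction time when the k nearest neighbours are looked up.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# "fit" just stores the data; no model parameters are learned
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)

# The majority class among the 5 nearest points is returned
print(knn.predict(X[:1]))
```

Because every prediction scans (or indexes into) the stored dataset, KNN trades fast training for slower inference on large datasets.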
A more recent way to tackle multi-class classification is using neural networks, where the algorithm loosely simulates the human brain and has small computational units called nodes.
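As a small taste of this, scikit-learn ships a basic feed-forward network, `MLPClassifier`, and a low-resolution 8×8 digits dataset similar in spirit to MNIST. This is only a sketch; real image work would use a deep learning framework.

```python
from sklearn.datasets import load_digits
from sklearn.neural_network import MLPClassifier

# MNIST-like handwritten digits (8x8 images, classes 0-9)
X, y = load_digits(return_X_y=True)

# One hidden layer of 32 nodes; the output layer has one node per digit class
mlp = MLPClassifier(hidden_layer_sizes=(32,), max_iter=300,
                    random_state=0).fit(X, y)

print(mlp.score(X, y))  # training accuracy
```

The output layer produces one score per class, so the network is natively multiclass as well: no one-vs-rest or one-vs-one splitting is needed.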
These are some of the ways in which multi-class classification is done. More in-depth explanations of each algorithm will come in future releases.