In machine learning, classification is the method of sorting data into categories based on certain input variables. A labelled dataset (the training dataset) is used to train the model so that it can assign labels to data that is not yet labelled.
Under classification, there are two types of classifiers:
- Binary Classification
- Multi-Class Classification
Here, let's discuss multi-class classification in detail.
A multi-class classification problem is one where the dataset is to be classified among more than two classes. For example, in the healthcare industry it can be used to decide which disease a person has based on the symptoms shown.
On e-commerce websites like Amazon, products are classified under various categories (Electronics, Furniture etc.). At this scale, such classification is beyond the scope of human labour and is best done using ML models.
Another use is image classification; the most famous beginner example is recognising a digit from a dataset of handwritten images (the MNIST dataset). Here each image is classified as a number from 0 to 9.
The most commonly used multiclass classification algorithms are:
- Naive Bayes
- Random Forest
- KNN
- Neural networks
Imbalanced Datasets: A Common Problem
Ideally, all the classes/categories would be equally represented in the dataset; in real-world scenarios, however, this is rarely the case. The imbalance is often due to the unavailability of data and causes the dataset to be biased towards one or more of the classes.
Take the case of detecting a patient's disease from symptoms: a dataset collected from a hospital might contain plenty of information on common diseases but lack representation of the rarer ones.
Such a case leads to a biased model that misclassifies the minority classes in the dataset even though it has high accuracy on the majority classes.
This problem can be solved by:
- Collecting more data from various sources.
- Using resampling techniques
- Undersampling: Removing samples from the majority class
- Oversampling: Adding more examples for the minority class (a quick sketch follows this list)
- Creating synthetic data: Certain deep learning techniques (like GANs) can be used to generate artificial samples to reduce the imbalance.
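To make the resampling idea concrete, here is a minimal sketch of random oversampling using scikit-learn's resample utility; the dataset and class counts below are made up purely for illustration.

```python
import numpy as np
from sklearn.utils import resample

# Toy imbalanced dataset: 95 samples of class 0, only 5 of class 1.
rng = np.random.RandomState(42)
X = rng.randn(100, 3)
y = np.array([0] * 95 + [1] * 5)

# Separate the majority and minority classes.
X_maj, y_maj = X[y == 0], y[y == 0]
X_min, y_min = X[y == 1], y[y == 1]

# Oversample the minority class (sampling with replacement)
# until it matches the majority class size.
X_min_up, y_min_up = resample(
    X_min, y_min, replace=True, n_samples=len(y_maj), random_state=42
)

X_balanced = np.vstack([X_maj, X_min_up])
y_balanced = np.concatenate([y_maj, y_min_up])
print(np.bincount(y_balanced))  # [95 95]
```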
How multi-class classifiers work:
Binary Transform
As seen in the previous release, many ML models work solely on binary classification, so the multiclass problem is broken into several binary classification problems using the one-vs-one or one-vs-rest strategies.
One Vs Rest:
In this technique, one class is selected as the target class and classified against all the remaining classes combined, i.e. target class vs the rest.
For example, take product classification for a jewellery store's e-commerce website where the product classes are Diamond, Gold and Silver.
- Classifier 1: Diamond vs [Gold, Silver]
- Classifier 2: Gold vs [Diamond, Silver]
- Classifier 3: Silver vs [Diamond, Gold]
The binary classifier that predicts its target class with the highest confidence is chosen, and its class is given as the final output.
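Here is a minimal sketch of one-vs-rest using Sklearn's OneVsRestClassifier; the data is synthetic, standing in for the three jewellery classes, and LogisticRegression is just one possible choice of base binary classifier.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Synthetic 3-class data (think 0=Diamond, 1=Gold, 2=Silver).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=5,
    n_classes=3, random_state=0
)

# Trains one binary LogisticRegression per class (3 classifiers);
# the class whose classifier is most confident wins.
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000))
ovr.fit(X, y)

print(len(ovr.estimators_))  # 3 binary classifiers
print(ovr.predict(X[:5]))
```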
One Vs One:
In this technique, every pair of classes is classified against each other, i.e. one class vs every other class individually.
This leads to the creation of N * (N - 1) / 2 classifiers, where N is the total number of classes.
Taking the same example as above, the classifiers created are:
- Classifier 1: Diamond vs Gold
- Classifier 2: Gold vs Silver
- Classifier 3: Diamond vs Silver
After all classifiers predict their respective classes, the majority vote is taken as the final output.
This strategy is commonly paired with SVMs (in fact, scikit-learn's SVC uses it internally for multiclass problems).
Don't worry: splitting into these classifiers and combining their outputs can easily be done with the help of Sklearn's inbuilt functions. This also explains why multi-class algorithms are more time- and memory-consuming to train.
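For completeness, here is a minimal sketch of one-vs-one with an SVM, using Sklearn's OneVsOneClassifier on the same kind of synthetic 3-class data.

```python
from sklearn.datasets import make_classification
from sklearn.multiclass import OneVsOneClassifier
from sklearn.svm import SVC

X, y = make_classification(
    n_samples=300, n_features=8, n_informative=5,
    n_classes=3, random_state=0
)

# Trains N * (N - 1) / 2 = 3 pairwise SVMs; the final label is the
# majority vote across the pairwise predictions.
ovo = OneVsOneClassifier(SVC(kernel="rbf"))
ovo.fit(X, y)

print(len(ovo.estimators_))  # 3 pairwise classifiers
print(ovo.predict(X[:5]))
```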
This may seem like a tedious task, but it is simplified by the existence of native multiclass algorithms.
Algorithms such as K-Nearest Neighbours, Naive Bayes and decision trees are designed in a way that they can handle multiclass problems without requiring any extra strategies.
Naive Bayes applies Bayes' theorem with the "naive" assumption that all features are independent of each other given the class. It works really well on large datasets with categorical features.
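As a minimal sketch, here is Naive Bayes used as a native multiclass classifier; the tiny symptom dataset below is invented purely for illustration, echoing the disease example from earlier.

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB

# Features: [fever (0/1), cough (0/1), rash (0/1)];
# labels: three hypothetical diseases, encoded 0, 1, 2.
X = np.array([
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 1],
])
y = np.array([0, 0, 1, 1, 2, 2])

# Fitting estimates per-class feature probabilities; prediction
# applies Bayes' theorem under the independence assumption.
nb = CategoricalNB()
nb.fit(X, y)
print(nb.predict([[1, 1, 1]]))
```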
Decision trees take the entire dataset in the root node and progressively split it into a tree-like structure until reaching the leaf nodes, which provide the final output. They work with both categorical and continuous features and have a relatively low training time.
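A minimal sketch of a decision tree handling a 3-class dataset (the classic Iris dataset) natively, with the learned splits printed as text:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# The tree repeatedly splits on feature thresholds until the leaf
# nodes are (mostly) pure; no binary decomposition is needed.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(iris.data, iris.target)

print(export_text(tree, feature_names=list(iris.feature_names)))
print(tree.predict(iris.data[:3]))
```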
K-nearest neighbours is a classification technique that requires no real training: it simply finds the 'k' nearest data points to a given unknown data point, and the class held by the majority of these k points is taken as the output.
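A minimal sketch of KNN on the same Iris dataset; note that fit() merely stores the training data.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()

knn = KNeighborsClassifier(n_neighbors=5)  # k = 5
knn.fit(iris.data, iris.target)            # just stores the data

# Each prediction finds the 5 nearest training points and returns
# the class that the majority of them belong to.
print(knn.predict(iris.data[:3]))
```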
Another popular way to tackle multi-class classification is neural networks, which loosely simulate the human brain using small computational units called nodes (neurons). For multiclass problems, the output layer typically has one node per class, with a softmax activation turning the outputs into class probabilities.
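A minimal sketch using Sklearn's MLPClassifier, a small neural network that handles multiple classes natively via a softmax output layer:

```python
from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier

iris = load_iris()

# One hidden layer of 16 nodes; the output layer has 3 nodes,
# one per class, with softmax giving class probabilities.
mlp = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
mlp.fit(iris.data, iris.target)

print(mlp.predict(iris.data[:3]))
print(mlp.predict_proba(iris.data[:3]).round(2))
```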
These are some of the most common ways multi-class classification is done. More in-depth explanations of each algorithm will follow in future releases.