What is an activation function?
Activation functions get their name from their role in neural networks: they decide whether a particular neuron should be activated. In this release, the sigmoid function is discussed in detail, as it is used by the logistic regression algorithm for binary classification.
Sigmoid Activation Function
The sigmoid function’s main purpose is to squash its input values into the range between 0 and 1.
The formula for the sigmoid function is:

σ(z) = 1/(1 + e^(-z))

This function looks like an “S” when plotted on a 2-D graph.
Logistic regression is essentially a modification of linear regression: the sigmoid function converts the output of the linear equation into a value usable for classification. The final classification is made based on the value of σ (sigmoid).

y = σ(w·x + b)

Here y is the output of logistic regression.
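To make the decision rule concrete, here is a minimal sketch of classifying with the sigmoid; the weight `w` and bias `b` are made-up values standing in for learned parameters:

```python
import numpy as np

def sigmoid(z):
    # Squash any real number into the open interval (0, 1)
    return 1 / (1 + np.exp(-z))

# Hypothetical learned parameters for a single-feature logistic regression
w, b = 2.0, -1.0
x = np.array([-2.0, 0.0, 0.5, 3.0])

y = sigmoid(w * x + b)           # probabilities, each strictly between 0 and 1
labels = (y >= 0.5).astype(int)  # final classification at the 0.5 threshold
```

The 0.5 threshold corresponds to the linear score w·x + b crossing zero, which is why logistic regression still has a linear decision boundary.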
This is a continuously increasing function that is differentiable everywhere.
This differentiability is useful for neural networks: during backpropagation, gradient descent updates the network’s weights using the derivative of the activation function.
(Some terms might not be clear right now but will be understood after reading the neural networks release)
Unfortunately, the sigmoid is computationally expensive (it requires an exponential) and its output is not zero-centered, so it is generally avoided inside neural networks and used mainly for binary classification.
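The derivative mentioned above has a convenient closed form, σ'(z) = σ(z)·(1 − σ(z)), which is what makes the sigmoid cheap to differentiate during backpropagation. A minimal sketch:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_derivative(z):
    # sigma'(z) = sigma(z) * (1 - sigma(z)), the form used in backpropagation
    s = sigmoid(z)
    return s * (1 - s)

# The derivative peaks at z = 0, where sigmoid(0) = 0.5
print(sigmoid_derivative(0.0))  # 0.25
```

Note how small the derivative gets for large |z|; this is the "vanishing gradient" effect that also motivates avoiding sigmoid in deep hidden layers.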
Python implementation of the sigmoid function:

import numpy as np

y_head = 1 / (1 + np.exp(-z))  # z is the input (any real number)
This activation function is also called a squashing function because it can take very large values and squash them into the range (0, 1).
The sigmoid function is very important in the world of neural networks because if only linear functions were used, the model could only learn linear relationships. Adding a hidden sigmoid layer lets it model non-linear problems.
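The point about linearity can be seen directly: stacking two linear layers collapses into a single linear map, while inserting a sigmoid between them breaks that collapse. A small sketch with made-up weight matrices:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 4))  # hypothetical first-layer weights
W2 = rng.normal(size=(2, 3))  # hypothetical second-layer weights
x = rng.normal(size=4)

# Two stacked linear layers are equivalent to one linear layer (W2 @ W1)
linear_stack = W2 @ (W1 @ x)
collapsed = (W2 @ W1) @ x

# With a sigmoid in between, the mapping is no longer a single matrix multiply
nonlinear = W2 @ sigmoid(W1 @ x)
```

However many linear layers you stack, the result stays linear; the non-linearity is what gives depth its expressive power.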
As mentioned above, sigmoid works only with binary classification problems; a modified version called softmax is used for multi-class classification.
Softmax is a function that combines multiple sigmoid-like terms into one function. Mathematically it looks like:

softmax(x_i) = e^(x_i) / Σ_j e^(x_j)
Applying softmax gives the probability of the datapoint belonging to each class, and these probabilities always sum to one.
Python implementation for the softmax function:

import numpy as np

z = np.exp(x)     # x is the vector of class scores
z_ = z / z.sum()  # normalized probabilities
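A self-contained version of the snippet above; subtracting the maximum score before exponentiating is a common trick that avoids overflow on large inputs and, because softmax is shift-invariant, does not change the result:

```python
import numpy as np

def softmax(x):
    # Shift by the max for numerical stability; the probabilities are unchanged
    z = np.exp(x - np.max(x))
    return z / z.sum()

scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
# probs sums to one (up to floating-point rounding), and the largest
# score gets the largest probability
```

Without the shift, a score like 1000 would overflow np.exp and produce NaNs.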
Tanh is another activation function. It modifies the sigmoid by making it symmetric around the origin.
The formula for Tanh is:
tanh(x) = 2/(1+e^(-2x)) -1
Notice that the graphs of both sigmoid and tanh are “S”-shaped, but tanh is centered around the origin with a range of -1 to 1, unlike sigmoid’s range of 0 to 1.
Python implementation for tanh:

import numpy as np

z = (2 / (1 + np.exp(-2 * x))) - 1
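The formula above is equivalent to the identity tanh(x) = 2·sigmoid(2x) − 1, and to NumPy’s built-in np.tanh; a quick check:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

x = np.linspace(-3, 3, 7)
via_sigmoid = 2 * sigmoid(2 * x) - 1  # the formula from the text

# tanh is zero-centered: tanh(0) == 0, with outputs in (-1, 1)
print(np.tanh(0.0))  # 0.0
```

In practice np.tanh is the idiomatic choice; the sigmoid-based form is mainly useful for seeing the relationship between the two functions.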
Swish is another sigmoid modification developed by researchers at Google while looking for a computationally efficient function.
The formula for swish is:

f(x) = x * sigmoid(x)

f(x) = x / (1 + e^(-x))
This function was created to outperform ReLU (another activation function). Unlike sigmoid, swish does not squash its output into a fixed range: it is unbounded above and dips only slightly below zero for negative inputs.

Python implementation for swish:

import numpy as np

y = x / (1 + np.exp(-x))
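Since swish is positioned as a competitor to ReLU, here is a minimal side-by-side sketch (ReLU is simply max(0, x)):

```python
import numpy as np

def relu(x):
    # ReLU zeroes out all negative inputs
    return np.maximum(0.0, x)

def swish(x):
    # x * sigmoid(x), per the formula above
    return x / (1 + np.exp(-x))

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))
print(swish(x))
```

Unlike ReLU, swish is smooth everywhere and lets small negative values pass through slightly, which is part of why it was proposed as a drop-in replacement.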
In conclusion, the sigmoid function and its variations are mostly used for classification problems.