What is an activation function?

Activation Function

Activation functions get their name from their role in neural networks: they decide whether a particular neuron should activate or not. In this release, we will discuss the sigmoid function in detail, as it is used in the logistic regression algorithm for binary classification.

Sigmoid Activation function

The sigmoid function's main use is to squash its input values into the range between 0 and 1.

The formula for the sigmoid function is:

σ(z) = 1/(1+e^(-z))

This function looks like an "S" when plotted on a 2-D graph.

Logistic regression is fundamentally a modification of linear regression. To adapt the linear equation for classification purposes, the sigmoid function is employed. The final classification is determined based on the value of σ (sigmoid).

y = σ(wx + b)

Here y is the output of logistic regression, x is the input, and w and b are the weight and bias from the linear part.
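
To make this concrete, here is a minimal sketch of that idea in Python; the weight vector w, bias b, and input x are made-up values used only for illustration.

import numpy as np

# Hypothetical weights, bias, and a single input, for illustration only
w = np.array([0.4, -1.2])
b = 0.1
x = np.array([2.0, 0.5])

z = np.dot(w, x) + b          # the linear-regression part
y = 1 / (1 + np.exp(-z))      # sigmoid squashes z into a probability in (0, 1)

predicted_class = 1 if y >= 0.5 else 0
print(y, predicted_class)     # ~0.574 -> class 1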

The sigmoid is a continuously increasing function that is differentiable everywhere.

This differentiability is useful for neural networks: during back propagation, gradient descent updates the network's weights using the derivative of the activation function.

(Some terms might not be clear right now but will be understood after reading the neural networks release)

Unfortunately, the sigmoid is computationally expensive and not zero-centred, so it is usually avoided in the hidden layers of neural networks and is mainly used for binary classification.

Python Implementation of the sigmoid function:

import numpy as np

def sigmoid(z):
    # Squash z into the range (0, 1)
    y_head = 1 / (1 + np.exp(-z))
    return y_head
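
Back propagation (mentioned above) needs the derivative of the activation function, which for the sigmoid has the convenient form σ'(z) = σ(z)(1 - σ(z)). The helper below is not part of the original implementation; it is a small sketch that reuses the sigmoid function defined above.

def sigmoid_derivative(z):
    # sigma'(z) = sigma(z) * (1 - sigma(z))
    s = sigmoid(z)
    return s * (1 - s)

print(sigmoid_derivative(0))   # 0.25, the steepest slope of the sigmoid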

This activation function is also known as a squashing function because it can compress large values into the range (0, 1).
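
For example, feeding large positive and negative values into the implementation above:

print(sigmoid(10), sigmoid(-10))   # ~0.99995 and ~0.0000454: large inputs are pushed towards 1 and 0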

The sigmoid function holds great significance in the realm of neural networks. If you employ only linear functions, the model can only learn linear relationships. By introducing a hidden sigmoid layer, the model can effectively handle non-linear problems.
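
As a rough illustration of this point, here is a minimal sketch of a forward pass with one hidden sigmoid layer; the weights W1, W2 and the input x are random, hypothetical values chosen only for demonstration.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical network: 3 input features, 4 hidden units, 1 output
W1 = rng.normal(size=(4, 3))
b1 = np.zeros(4)
W2 = rng.normal(size=(1, 4))
b2 = np.zeros(1)

x = np.array([0.5, -1.0, 2.0])

hidden = 1 / (1 + np.exp(-(W1 @ x + b1)))       # sigmoid hidden layer introduces non-linearity
output = 1 / (1 + np.exp(-(W2 @ hidden + b2)))  # sigmoid output for binary classification
print(output)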

As previously noted, sigmoid is suitable for binary classification tasks, but for multi-class classification scenarios, a modified version called softmax is utilized.

Softmax:

Softmax can be thought of as a combination of multiple sigmoids in one function. Mathematically, it looks like:

softmax(x_i) = e^(x_i) / Σ_j e^(x_j)

Applying softmax gives the probability of the data point belonging to each class, and these probabilities always sum to one.

Python Implementation for softmax function:

def softmax_function(x):
    # Exponentiate each score and normalise so the outputs sum to one
    z = np.exp(x)
    z_ = z / z.sum()
    return z_
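
For example, applying this to a made-up vector of class scores for one data point:

scores = np.array([2.0, 1.0, 0.1])   # hypothetical scores for three classes
probs = softmax_function(scores)
print(probs)                          # ~[0.659, 0.242, 0.099]
print(probs.sum())                    # 1.0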

Tanh:

Tanh is another activation function. It is a rescaled version of the sigmoid that is symmetric around the origin.

The formula for Tanh is:

tanh(z) = 2*sigmoid(2z) - 1

or

tanh(x) = 2/(1+e^(-2x)) - 1

If you observe, both the sigmoid and tanh functions exhibit an “S” shape in their graphs. However, tanh is centered around the origin and has a range between -1 and 1, in contrast to the sigmoid function which has a range between 0 and 1.

Python implementation for tanh:

def tanh(x):
    # Rescaled sigmoid: output lies in the range (-1, 1)
    z = (2 / (1 + np.exp(-2 * x))) - 1
    return z
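
As a quick sanity check (not part of the original code), this implementation agrees with NumPy's built-in np.tanh:

x = np.linspace(-3, 3, 7)
print(np.allclose(tanh(x), np.tanh(x)))   # True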

Swish:

Researchers at Google developed Swish as a modification of the sigmoid function while seeking a computationally efficient alternative.

The formula for swish is:

f(x) = x*sigmoid(x)

or

f(x) = x/(1+e^(-x))

Python implementation for swish:

def swish_function(x):
    # x multiplied by sigmoid(x), i.e. x / (1 + e^(-x))
    return x / (1 + np.exp(-x))

This function was created to outperform ReLU (another activation function). It is an unbounded version of the sigmoid: its output grows without limit for large positive inputs and dips only slightly below zero for negative ones.
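
A few sample values from the implementation above illustrate this behaviour:

x = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
print(swish_function(x))
# ~[-0.033, -0.269, 0.0, 0.731, 4.967]: close to 0 for large negative inputs,
# close to x itself for large positive inputs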

In conclusion, you can use the sigmoid function and its variations for classification problems.
