# Training of a Neural Network

The neural network algorithm is inspired by the functioning of the human brain. Like the brain, it is composed of multiple neurons. Each neuron is made up of inputs, weights, an activation function and an output.

In every neuron, the weights act as a guiding path for the algorithm: over the course of training they are constantly tweaked to improve the output. Each weight varies according to the relative importance of the feature it belongs to.

So if we consider the output of a particular neuron, we may write it as

$$\text{output} = f\left(\sum_{i} w_i x_i + b\right)$$

where $w_i$ represents the weight corresponding to a particular feature, $x_i$ denotes the numerical input to the neuron for that feature, and $b$ is the bias of the neuron.

$f$ is a non-linear function called the activation function. Its purpose is to introduce non-linearity. Some activation functions, such as the sigmoid, also map the output to a value between 0 and 1.

In simpler words, the activation function turns the weighted sum into a non-linear output, which for the sigmoid lies between 0 and 1.
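The neuron described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the sigmoid as the activation function; the names `sigmoid` and `neuron_output` and the sample values are made up for this example.

```python
import math


def sigmoid(z):
    """Logistic function: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus the bias, passed through the activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)


# Illustrative values only: 0.5 * 1.0 + (-0.25) * 2.0 + 0.0 = 0.0
print(neuron_output([1.0, 2.0], [0.5, -0.25], 0.0))  # prints 0.5
```

A weighted sum of exactly zero lands in the middle of the sigmoid's range, which is why this example prints 0.5.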

## The training phase of ANN (Artificial neural network)

At the end of the day, we want our model to generate the best possible results. It achieves this by creating a probability-weighted association between inputs and outputs: it takes an input and generates an output, then compares that output with the expected result and calculates the error in the generated output.

### Often, we use Mean Squared Error or MSE

$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$

where $n$ denotes the total number of inputs, $y_i$ is the expected output and $\hat{y}_i$ is the neural network output.
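As a quick check of the Mean Squared Error, here is a small sketch in plain Python. The function name `mean_squared_error` and the sample numbers are assumptions chosen for illustration.

```python
def mean_squared_error(expected, predicted):
    """Average of the squared differences between expected and predicted outputs."""
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must have the same length")
    n = len(expected)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(expected, predicted)) / n


expected = [1.0, 0.0, 1.0]    # what the network should output
predicted = [0.5, 0.5, 1.0]   # what the network actually output

# (0.25 + 0.25 + 0.0) / 3
print(mean_squared_error(expected, predicted))
```

The error is zero only when every output matches the expected value, and larger misses are penalized more heavily because the differences are squared.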

The algorithm repeatedly calculates the error using a cost function and tries to minimize the cost function.

A typical neural network consists of an input layer, one or more hidden layers and an output layer.

So in a typical ANN, the input layer is just the data points used to train the algorithm. The succeeding layers are called hidden layers. The final layer is the output layer, which has one node for each class of output. As the algorithm proceeds forward, it assigns a value to each node, and the values of one layer act as the input to the next, until the output layer is reached and the input is assigned to one of the classes.
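The forward pass through the layers can be sketched as follows. This is only an illustration under some assumptions: every node uses the sigmoid, the predicted class is taken to be the output node with the largest value, and the tiny network with hand-picked weights is hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases):
    """Compute the value of every node in one layer.

    weights[j] holds the incoming weights of node j, biases[j] its bias.
    """
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + b)
        for node_weights, b in zip(weights, biases)
    ]


def predict_class(inputs, layers):
    """Feed the inputs through each layer in turn and pick the strongest output node."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations.index(max(activations)), activations


# Hypothetical network: 2 inputs -> 2 hidden nodes -> 2 output nodes (one per class).
layers = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),     # hidden layer
    ([[5.0, -5.0], [-5.0, 5.0]], [0.0, 0.0]),   # output layer
]

label, outputs = predict_class([2.0, -2.0], layers)
print(label, outputs)
```

With these hand-picked weights, an input whose first feature is larger is assigned to class 0 and one whose second feature is larger to class 1.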

## The learning process of ANN

An ANN uses an iterative process to improve its output. It takes an input and generates the corresponding output. Then the error is calculated, and this error helps the algorithm readjust the weights given to the various input features before the next data point is processed. To start the algorithm, random weights are assigned to the features.

In the iterative process, the algorithm reprocesses the same inputs again, so that with each pass the cost hopefully gets smaller and the output gets closer to the expected one.

Some networks never get the result even close to the expected output. This is not always because the algorithm was wrongly designed; many times it is due to a lack of sufficient data. Preferably there should be sufficient data for both training and validation purposes.
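The iterative process described above can be sketched for a single sigmoid neuron. This is a minimal sketch under stated assumptions, not a definitive implementation: the toy dataset (the logical OR of two features), the learning rate of 0.5, the 2000 passes and the per-data-point gradient step on the squared error are all choices made for this example.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def cost(weights, bias, data):
    """Mean squared error over the whole dataset."""
    return sum((y - predict(weights, bias, x)) ** 2 for x, y in data) / len(data)


# Hypothetical toy dataset: logical OR of two binary features.
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]

rng = random.Random(0)                                  # fixed seed for repeatability
weights = [rng.uniform(-0.5, 0.5) for _ in range(2)]    # random initial weights
bias = 0.0
learning_rate = 0.5                                     # assumed value

costs = [cost(weights, bias, data)]
for _ in range(2000):                                   # reprocess the same inputs
    for x, y in data:
        y_hat = predict(weights, bias, x)
        # Slope of the squared error with respect to the weighted sum
        # (chain rule through the sigmoid; the factor 2 is folded into the rate).
        delta = (y_hat - y) * y_hat * (1.0 - y_hat)
        weights = [w - learning_rate * delta * xi for w, xi in zip(weights, x)]
        bias -= learning_rate * delta
    costs.append(cost(weights, bias, data))

print(f"cost before training: {costs[0]:.4f}")
print(f"cost after training:  {costs[-1]:.4f}")
```

The list `costs` records the cost after every pass, so plotting it would show how the error shrinks as the weights are readjusted.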

A well-trained ANN copes better with noisy data and is able to classify patterns on which it has not yet been trained. A commonly known algorithm for training an ANN is Backpropagation.

## Understanding ANN using Perceptron Algorithm

The Perceptron algorithm is considered one of the simplest ANN algorithms. It is a linear classifier: with 2 features it separates the classes with a straight line, and in higher dimensions with the corresponding hyperplane.

It uses stochastic gradient descent, meaning the weights are updated after each data point. The inputs provided to the model are multiplied by weights that are initially random. This weighted sum is then passed through a function which maps it to an output between 0 and 1.

The data points from the dataset are fed to the model one at a time, and for each one the model generates an output. This process is known as Feed-forward. Then the corresponding error is calculated, i.e. the result given by the model is compared with the expected or original output. This error is then propagated back through the network, which causes the corresponding updates in the weights (the linear coefficients of the model, in the case of the perceptron algorithm). This is known as Back-Propagation.

In other words, Back-Propagation is a technique for training the weights of a multilayer feed-forward neural network. As a result, it requires a network topology of one or more layers, with each layer fully connected to the next. A general neural network has one input layer, one or more hidden layers, and one output layer.

The perceptron has no hidden layer, so the error is applied to its weights directly. This weight update is called the perceptron update rule. When the update rule has been applied once for every data point in the training set, one epoch has been completed.
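The perceptron update rule and the notion of an epoch can be illustrated with a short sketch. It assumes the classic formulation with a step function that outputs 0 or 1; the learning rate of 0.1, the cap of 1000 epochs and the toy dataset (logical AND) are assumptions made for this example.

```python
import random


def step(z):
    """Threshold activation: 1 if the weighted sum is positive, otherwise 0."""
    return 1 if z > 0 else 0


def predict(weights, bias, x):
    return step(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_perceptron(data, learning_rate=0.1, max_epochs=1000, seed=0):
    rng = random.Random(seed)
    weights = [rng.uniform(-0.5, 0.5) for _ in range(len(data[0][0]))]  # random start
    bias = 0.0
    data = list(data)                      # copy so the caller's list is not reordered
    for epoch in range(1, max_epochs + 1):
        rng.shuffle(data)
        mistakes = 0
        for x, target in data:             # one update attempt per data point
            error = target - predict(weights, bias, x)   # -1, 0 or +1
            if error != 0:
                # Perceptron update rule: move the weights toward the target.
                weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
                bias += learning_rate * error
                mistakes += 1
        if mistakes == 0:                  # a full epoch without errors: stop
            break
    return weights, bias, epoch


# Hypothetical, linearly separable toy dataset: logical AND.
and_data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]

weights, bias, epochs_used = train_perceptron(and_data)
print("epochs used:", epochs_used)
print([predict(weights, bias, x) for x, _ in and_data])
```

Because the AND data is linearly separable, the loop stops after an epoch in which no data point triggers an update.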

The starting values of the weights of the network are chosen randomly. In addition, the data is shuffled and split randomly into training and testing data. This is done to test the effectiveness of the algorithm: because its training set changes with every split, it produces a different classification rate each time. The performance of the algorithm must therefore be evaluated by repeating the classification several times and averaging all the classification accuracies.
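The repeated evaluation described above might look like the following sketch. Everything specific here is an assumption for illustration: the synthetic two-cluster dataset, the 70/30 split, the 10 repetitions and the helper names.

```python
import random
from statistics import mean


def predict(weights, bias, x):
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if z > 0 else 0


def train_perceptron(train, rng, learning_rate=0.1, max_epochs=200):
    weights = [rng.uniform(-0.5, 0.5) for _ in range(len(train[0][0]))]
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, target in train:
            error = target - predict(weights, bias, x)
            if error != 0:
                weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
                bias += learning_rate * error
                mistakes += 1
        if mistakes == 0:
            break
    return weights, bias


def accuracy(weights, bias, test):
    correct = sum(1 for x, target in test if predict(weights, bias, x) == target)
    return correct / len(test)


def make_dataset(rng, n_per_class=20):
    """Hypothetical data: two well-separated clusters of 2-D points."""
    data = []
    for _ in range(n_per_class):
        data.append(([rng.uniform(0, 1), rng.uniform(0, 1)], 0))
        data.append(([rng.uniform(3, 4), rng.uniform(3, 4)], 1))
    return data


def repeated_evaluation(data, runs=10, test_fraction=0.3, seed=0):
    rng = random.Random(seed)
    accuracies = []
    for _ in range(runs):
        shuffled = data[:]
        rng.shuffle(shuffled)                       # new random split every run
        n_test = int(len(shuffled) * test_fraction)
        test, train = shuffled[:n_test], shuffled[n_test:]
        weights, bias = train_perceptron(train, rng)  # new random weights every run
        accuracies.append(accuracy(weights, bias, test))
    return accuracies, mean(accuracies)


dataset = make_dataset(random.Random(42))
accuracies, mean_accuracy = repeated_evaluation(dataset)
print("per-run accuracy:", accuracies)
print("mean accuracy:", mean_accuracy)
```

On clean, well-separated data like this the per-run accuracies will be close to each other; on noisier data they would differ more between splits, which is why the average is the number to report.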