Recurrent neural network and its work


A recurrent neural network (RNN) is a type of neural network in which the output from the previous step is fed as input to the current step. In a traditional neural network, all inputs and outputs are independent of each other, but in tasks such as predicting the next word of a sentence, the previous words are needed, so the network must remember them. RNNs solve this issue with the help of a hidden layer: the main and most important feature of an RNN is its hidden state, which retains information about the sequence seen so far.
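
To make the idea of a hidden state concrete, here is a minimal sketch of one recurrent step in NumPy. The variable names (W_xh, W_hh, b_h) and the toy dimensions are illustrative assumptions, not part of the text.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One recurrence: new hidden state from the current input and the previous state."""
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

# Toy dimensions chosen purely for illustration.
input_size, hidden_size, seq_len = 4, 3, 5
rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.1, size=(input_size, hidden_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
b_h = np.zeros(hidden_size)

h = np.zeros(hidden_size)                  # initial hidden state
for t in range(seq_len):
    x_t = rng.normal(size=input_size)      # stand-in for the t-th word/feature vector
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)  # h now carries information about earlier inputs
print(h)
```

Because the same step function is applied at every position, whatever the network needs to remember about earlier inputs has to be packed into the hidden state h.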

How a recurrent neural network works:

How an RNN works can be illustrated with the following example:

Example:

Suppose a deeper network consists of one input layer, three hidden layers, and one output layer. Then, as in other feed-forward neural networks, each hidden layer has its own set of weights and biases: say (w1, b1) for the first hidden layer, (w2, b2) for the second, and (w3, b3) for the third. This means that these layers are independent of each other, i.e., they do not memorize any previous outputs. An RNN, by contrast, shares the same weights and biases across all steps and carries information forward through its hidden state, so the current output depends on previous inputs.
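
The contrast can be sketched in code. The snippet below (toy dimensions and parameter names are assumptions) shows the feed-forward view from the example, with an independent (w, b) per hidden layer, next to an RNN view where one shared set of weights is applied at every step and the hidden state threads information between steps.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 3

# Feed-forward view from the example: three hidden layers, each with its own
# independent parameters (w1, b1), (w2, b2), (w3, b3); no layer remembers
# anything about earlier outputs.
feedforward_params = [(rng.normal(size=(dim, dim)), rng.normal(size=dim)) for _ in range(3)]
x = rng.normal(size=dim)
a = x
for w, b in feedforward_params:
    a = np.tanh(a @ w + b)

# RNN view: one shared (w, b) applied at every time step, with the hidden
# state h carrying information from step to step (toy dimensions assumed).
w_in, w_rec, b_shared = rng.normal(size=(dim, dim)), rng.normal(size=(dim, dim)), np.zeros(dim)
h = np.zeros(dim)
for x_t in rng.normal(size=(3, dim)):      # a 3-step input sequence
    h = np.tanh(x_t @ w_in + h @ w_rec + b_shared)
print(a, h)
```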

Training

Gradient descent:

Gradient descent is a first-order iterative optimization algorithm for finding the minimum of a function. In neural networks, it can be used to minimize the error term by changing each weight in proportion to the derivative of the error with respect to that weight, provided the non-linear activation functions are differentiable. The standard method is called “backpropagation through time” or BPTT, and is a generalization of backpropagation for feed-forward networks. Like that method, it is an instance of automatic differentiation in the reverse accumulation mode of Pontryagin's minimum principle. A more computationally expensive online variant is called “Real-Time Recurrent Learning” or RTRL, which is an instance of automatic differentiation in the forward accumulation mode with stacked tangent vectors. Unlike BPTT, this algorithm is local in time but not local in space.
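
As an illustration of gradient descent with BPTT, here is a minimal NumPy sketch that unrolls a small vanilla RNN over a toy sequence, backpropagates a squared error through every time step, and updates each weight in proportion to its derivative. The architecture, dimensions, learning rate, and data are assumptions made for the example, not a definitive implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size, seq_len = 2, 4, 6
lr = 0.1

# Hypothetical toy data: a random input sequence and a scalar target at the end.
xs = rng.normal(size=(seq_len, input_size))
target = 1.0

W_xh = rng.normal(scale=0.1, size=(input_size, hidden_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
b_h  = np.zeros(hidden_size)
W_hy = rng.normal(scale=0.1, size=(hidden_size,))
b_y  = 0.0

for step in range(100):
    # Forward pass: unroll the recurrence and keep every hidden state.
    hs = [np.zeros(hidden_size)]
    for x_t in xs:
        hs.append(np.tanh(x_t @ W_xh + hs[-1] @ W_hh + b_h))
    y = hs[-1] @ W_hy + b_y
    loss = 0.5 * (y - target) ** 2

    # Backward pass (BPTT): push the error back through every unrolled step.
    dy = y - target
    dW_hy, db_y = dy * hs[-1], dy
    dh = dy * W_hy
    dW_xh, dW_hh, db_h = np.zeros_like(W_xh), np.zeros_like(W_hh), np.zeros_like(b_h)
    for t in reversed(range(seq_len)):
        dz = dh * (1.0 - hs[t + 1] ** 2)     # derivative through tanh
        dW_xh += np.outer(xs[t], dz)
        dW_hh += np.outer(hs[t], dz)
        db_h += dz
        dh = dz @ W_hh.T                     # error flowing to the previous time step

    # Gradient-descent update: change each weight in proportion to its derivative.
    W_xh -= lr * dW_xh; W_hh -= lr * dW_hh; b_h -= lr * db_h
    W_hy -= lr * dW_hy; b_y -= lr * db_y

print(f"final loss: {loss:.6f}")
```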

Global optimization methods:

Training the weights in a neural network can be modeled as a non-linear global optimization problem. A target function can be formed to evaluate the fitness or error of a particular weight vector as follows: First, the weights in the network are set according to the weight vector. Next, the network is evaluated against the training sequence. Typically, the sum-squared-difference between the predictions and the target values specified in the training sequence is used to represent the error of the current weight vector. Arbitrary global optimization techniques may then be used to minimize this target function.
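
A minimal sketch of such a target function, under assumed shapes and names: a flat weight vector is unpacked into the network's matrices, the network is run over a training sequence, and the sum-squared difference between its predictions and the targets is returned.

```python
import numpy as np

input_size, hidden_size = 2, 3  # assumed toy dimensions

def unpack(weights):
    """Set the network weights according to a flat weight vector."""
    i = 0
    W_xh = weights[i:i + input_size * hidden_size].reshape(input_size, hidden_size); i += input_size * hidden_size
    W_hh = weights[i:i + hidden_size * hidden_size].reshape(hidden_size, hidden_size); i += hidden_size * hidden_size
    W_hy = weights[i:i + hidden_size]
    return W_xh, W_hh, W_hy

def error(weights, xs, targets):
    """Sum-squared difference between predictions and targets over one sequence."""
    W_xh, W_hh, W_hy = unpack(weights)
    h, total = np.zeros(hidden_size), 0.0
    for x_t, t_t in zip(xs, targets):
        h = np.tanh(x_t @ W_xh + h @ W_hh)
        total += (h @ W_hy - t_t) ** 2
    return total

n_weights = input_size * hidden_size + hidden_size * hidden_size + hidden_size
rng = np.random.default_rng(0)
xs, targets = rng.normal(size=(5, input_size)), rng.normal(size=5)
print(error(rng.normal(size=n_weights), xs, targets))
```

Any global optimizer that only needs function evaluations can then be pointed at error() to search the weight space.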

The most common global optimization method for training RNNs is the genetic algorithm, especially in unstructured networks.

Initially, the neural network weights are encoded into the genetic algorithm in a predefined manner, where one gene in the chromosome represents one weight link. The whole network is represented as a single chromosome. The fitness function is evaluated as follows (a code sketch follows the list):

  • Each weight encoded in the chromosome is assigned to the respective weight link of the network.
  • The training set is presented to the network which propagates the input signals forward.
  • The mean-squared-error is returned to the fitness function.
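
The following is a hypothetical sketch of this scheme: each chromosome is a flat vector with one gene per weight link, the whole network is one chromosome, and its fitness is the mean-squared error the decoded network achieves on a stand-in training set. Population size, selection rule, mutation scale, and the data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size = 2, 3
n_genes = input_size * hidden_size + hidden_size * hidden_size + hidden_size

xs = rng.normal(size=(8, input_size))        # stand-in training sequence
targets = np.sin(xs[:, 0])                   # stand-in targets

def mse(chromosome):
    """Assign each gene to its weight link, propagate the inputs forward, return MSE."""
    i = 0
    W_xh = chromosome[i:i + input_size * hidden_size].reshape(input_size, hidden_size); i += input_size * hidden_size
    W_hh = chromosome[i:i + hidden_size * hidden_size].reshape(hidden_size, hidden_size); i += hidden_size * hidden_size
    W_hy = chromosome[i:]
    h, errs = np.zeros(hidden_size), []
    for x_t, t_t in zip(xs, targets):
        h = np.tanh(x_t @ W_xh + h @ W_hh)   # forward propagation of the input signals
        errs.append((h @ W_hy - t_t) ** 2)
    return np.mean(errs)

pop = rng.normal(scale=0.5, size=(30, n_genes))      # initial population of chromosomes
for generation in range(50):
    fitness = np.array([mse(c) for c in pop])
    parents = pop[np.argsort(fitness)[:10]]          # lower error = fitter (truncation selection)
    children = []
    while len(children) < len(pop):
        a, b = parents[rng.integers(10, size=2)]
        cut = rng.integers(1, n_genes)               # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        child += rng.normal(scale=0.02, size=n_genes)  # mutation
        children.append(child)
    pop = np.array(children)

print("best MSE:", min(mse(c) for c in pop))
```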