Posts

Showing posts from September, 2020

Mathematics of Back Propagation for Deep Learning simplified

Back Propagation Algorithm: a short history. Although without reference to neural networks, the back propagation method was perhaps first described in a 1970 master's thesis by Seppo Linnainmaa. In 1973 Stuart E. Dreyfus published a simpler derivative-based method to minimise a cost function by adapting the parameters of controllers in proportion to error gradients. The first neural-network-specific application of BPA was proposed by Paul Werbos in 1982. This was later refined by Rumelhart, Hinton and Williams in 1986 and brought a radical change in solving supervised learning problems. Their research paper experimentally demonstrated the usefulness of BPA for learning internal representations in the hidden layers and for adapting the synaptic weights of Multi-Layered Feed-Forward Neural Networks (MLFFNN). This paper significantly contributed to the popularisation of BPA. Description of BPA: BP uses ordered derivatives (as termed by Werbos in The Roots of Backpropagation: From Ordered Derivatives to N…
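The post's full derivation is cut off in this preview; as a rough illustration of the idea, the sketch below applies back propagation (ordered derivatives via the chain rule) to a small two-layer feed-forward network. The sigmoid activation, squared-error loss, layer sizes and names such as backprop_step are illustrative assumptions, not taken from the post.

```python
import numpy as np

# Minimal back propagation sketch for a 2-layer feed-forward network
# (sigmoid hidden layer, linear output, squared-error loss).
# Layer sizes, learning rate and names are illustrative assumptions.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, t, W1, W2, lr=0.1):
    """One gradient step on a single sample (x: input, t: target)."""
    # Forward pass
    h = sigmoid(W1 @ x)                    # hidden activations
    y = W2 @ h                             # linear output
    # Backward pass: ordered derivatives via the chain rule
    e = y - t                              # output error
    dW2 = np.outer(e, h)                   # gradient w.r.t. output weights
    dh = W2.T @ e                          # error propagated to hidden layer
    dW1 = np.outer(dh * h * (1 - h), x)    # gradient w.r.t. hidden weights
    # Gradient-descent update
    W2 -= lr * dW2
    W1 -= lr * dW1
    return W1, W2, 0.5 * float(e @ e)      # updated weights and sample loss

# Illustrative usage on random data (shapes are assumptions):
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(4, 3))    # hidden layer: 4 units, 3 inputs
W2 = rng.normal(scale=0.1, size=(1, 4))    # output layer: 1 unit
x, t = rng.normal(size=3), np.array([1.0])
W1, W2, loss = backprop_step(x, t, W1, W2)
```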

Gradient Descent rule and Widrow-Hoff rule for Deep Learning

(PLEASE READ THE PREVIOUS POST ON ARTIFICIAL NEURAL NETWORKS AND ITS LEARNING ALGORITHMS) Adapting the weight of a linear neuron (the adaptive filtering problem): A Perceptron network, a single neuron unit with a threshold-logic-unit activation, has a discrete two-valued output, each value representing a distinct class. The main objective of Perceptron training is to find a feasible weight vector w* that separates the two classes. The learning algorithm progressively reduces the distance from the updated weight vector to the set of feasible weight-vector solutions. However, a single neuron operating with a linear activation function can act as an adaptive filter. Figure-1: Adaptive Filtering. The Least Mean Squares (LMS) rule, also known as the Delta rule or the Widrow-Hoff rule, is the basis for implementing linear adaptive filtering. The aim is to minimize the squared error over all the training data samples, which is therefore a convex optimization problem. The LMS algorithm ensures that…
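As a rough illustration of the Widrow-Hoff (LMS / Delta) rule described above, the sketch below trains a single linear neuron as an adaptive filter. The learning rate, epoch count, synthetic data and the name lms_train are illustrative assumptions, not values from the post.

```python
import numpy as np

# Minimal LMS (Widrow-Hoff / Delta rule) sketch: a single linear neuron
# used as an adaptive filter. Data and hyperparameters are assumptions.

def lms_train(X, d, lr=0.01, epochs=50):
    """X: (n_samples, n_inputs) inputs, d: (n_samples,) desired responses."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, target in zip(X, d):
            y = w @ x          # linear neuron output
            e = target - y     # instantaneous error
            w += lr * e * x    # Widrow-Hoff update: w <- w + eta * e * x
    return w

# Example: identify a simple linear system d = 0.5*x1 - 0.2*x2 from noisy samples
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
d = X @ np.array([0.5, -0.2]) + 0.01 * rng.normal(size=200)
w_hat = lms_train(X, d)
print(w_hat)   # should approach [0.5, -0.2]
```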