Stable Adaptive Momentum for Rapid Online Learning in Nonlinear Systems

T. Graepel and N. N. Schraudolph. Stable Adaptive Momentum for Rapid Online Learning in Nonlinear Systems. In Proc. Intl. Conf. Artificial Neural Networks (ICANN), pp. 450–455, Springer Verlag, Berlin, Madrid, Spain, 2002.


Abstract

We consider the problem of developing rapid, stable, and scalable stochastic gradient descent algorithms for optimisation of very large nonlinear systems. Based on earlier work by Orr et al. on adaptive momentum, an efficient yet extremely unstable stochastic gradient descent algorithm, we develop a stabilised adaptive momentum algorithm that is suitable for noisy nonlinear optimisation problems. The stability is improved by introducing a forgetting factor that smooths the trajectory and enables adaptation in non-stationary environments. The scalability of the new algorithm follows from the fact that at each iteration the multiplication by the curvature matrix can be achieved in O(n) steps using automatic differentiation tools. We illustrate the behaviour of the new algorithm on two examples: a linear neuron with squared loss and highly correlated inputs, and a multilayer perceptron applied to the four regions benchmark task.
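
The O(n) curvature product mentioned in the abstract can be illustrated on the simplest of the two examples, the linear neuron with squared loss. For a single sample the loss is 0.5 (w·x - y)^2, its Hessian is the outer product x x^T, and so the product H v reduces to (x·v) x without ever forming the n-by-n matrix. The sketch below is only an illustration of that idea, not the paper's implementation: it uses the closed form rather than an automatic differentiation tool, and all function names and sample values are assumptions made for this example.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))

def gradient(w, x, y):
    """Gradient of 0.5 * (w.x - y)**2 with respect to w."""
    err = dot(w, x) - y
    return [err * xi for xi in x]

def curvature_vector_product(x, v):
    """H v for H = x x^T, computed as (x.v) x in O(n) time and memory."""
    s = dot(x, v)
    return [s * xi for xi in x]

def explicit_hessian(x):
    """Full n-by-n Hessian x x^T, built here only to check the fast product."""
    return [[xi * xj for xj in x] for xi in x]

# Hypothetical sample, weights, and direction vector.
x = [1.0, 2.0, -1.0]
y = 1.0
w = [0.2, -0.1, 0.4]
v = [0.5, 0.0, 2.0]

fast = curvature_vector_product(x, v)            # O(n)
slow = [dot(row, v) for row in explicit_hessian(x)]  # O(n^2) reference

# Independent check: a finite difference of gradients along v approximates H v.
eps = 1e-3
w_shifted = [wi + eps * vi for wi, vi in zip(w, v)]
fd = [(g1 - g0) / eps
      for g1, g0 in zip(gradient(w_shifted, x, y), gradient(w, x, y))]

print(fast)  # [-1.5, -3.0, 1.5]
print(slow)  # same values as fast
```

For a nonlinear model such as a multilayer perceptron no such closed form is available, which is where the automatic differentiation tools named in the abstract come in; the finite difference of gradients above is only a numerical stand-in for checking the result.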

BibTeX Entry

@inproceedings{GraSch02,
     author = {Thore Graepel and Nicol N. Schraudolph},
      title = {\href{http://nic.schraudolph.org/pubs/GraSch02.pdf}{
               Stable Adaptive Momentum for Rapid Online Learning
               in Nonlinear Systems}},
      pages = {450--455},
     editor = {Jos\'e R. Dorronsoro},
  booktitle =  icann,
    address = {Madrid, Spain},
     volume =  2415,
     series = {\href{http://www.springer.de/comp/lncs/}{
               Lecture Notes in Computer Science}},
  publisher = {\href{http://www.springer.de/}{Springer Verlag}, Berlin},
       year =  2002,
   b2h_type = {Top Conferences},
  b2h_topic = {Gradient Descent},
   abstract = {
    We consider the problem of developing rapid, stable, and scalable
    stochastic gradient descent algorithms for optimisation of very large
    nonlinear systems. Based on earlier work by Orr et al. on adaptive
    momentum\,---\,an efficient yet extremely unstable stochastic
    gradient descent algorithm\,---\,we develop a stabilised adaptive
    momentum algorithm that is suitable for noisy nonlinear optimisation
    problems. The stability is improved by introducing a forgetting
    factor that smooths the trajectory and enables adaptation in
    non-stationary environments. The scalability of the new algorithm
    follows from the fact that at each iteration the multiplication by
    the curvature matrix can be achieved in O(n) steps using automatic
    differentiation tools. We illustrate the behaviour of the new
    algorithm on two examples: a linear neuron with squared loss and
    highly correlated inputs, and a multilayer perceptron applied to
    the four regions benchmark task.
}}