Online Learning with Adaptive Local Step Sizes
N. N. Schraudolph. Online Learning with Adaptive Local Step Sizes. In M. Marinaro and R. Tagliaferri, eds., Neural Nets---WIRN Vietri-99: Proc. 11th Italian Workshop on Neural Networks (Vietri sul Mare, Salerno, Italy), Perspectives in Neural Computing, pp. 151–156, Springer Verlag, Berlin, 1999.
Abstract
Almeida et al. have recently proposed online algorithms for local step size adaptation in nonlinear systems trained by gradient descent. Here we develop an alternative to their approach by extending Sutton's work on linear systems to the general, nonlinear case. The resulting algorithms are computationally little more expensive than other acceleration techniques, do not assume statistical independence between successive training patterns, and do not require an arbitrary smoothing parameter. In our benchmark experiments, they consistently outperform other acceleration methods as well as stochastic gradient descent with fixed learning rate and momentum.
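The abstract refers to Sutton's local step-size adaptation for linear systems, i.e. the IDBD algorithm (Sutton, 1992), which the paper extends to the nonlinear case. As background only, here is a minimal Python sketch of that linear-case IDBD baseline: one adaptive log step size per weight, updated online by a meta-level gradient step. This is not the paper's nonlinear algorithm; the function name, parameter names, and default values (theta, beta0) are illustrative choices, not taken from the paper.

    import numpy as np

    def idbd(X, y, theta=0.01, beta0=np.log(0.05)):
        """Sutton's IDBD: online linear regression with one adaptive
        log step size (beta_i) per weight.  Returns the learned weights."""
        n_features = X.shape[1]
        w = np.zeros(n_features)           # weights
        beta = np.full(n_features, beta0)  # per-weight log step sizes
        h = np.zeros(n_features)           # trace of recent weight updates
        for x, target in zip(X, y):
            delta = target - w @ x         # prediction error
            beta += theta * delta * x * h  # meta-level step on log step sizes
            alpha = np.exp(beta)           # per-weight step sizes
            w += alpha * delta * x         # base-level gradient step
            # decay the trace where the step size is large, then accumulate
            h = h * np.clip(1.0 - alpha * x * x, 0.0, None) + alpha * delta * x
        return w

A quick synthetic check (again purely illustrative): fit a noisy linear target and compare the recovered weights to the true ones.

    rng = np.random.default_rng(0)
    true_w = np.array([2.0, -1.0, 0.5])
    X = rng.normal(size=(5000, 3))
    y = X @ true_w + 0.1 * rng.normal(size=5000)
    print(idbd(X, y))   # should be close to true_w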
BibTeX Entry
@inproceedings{Schraudolph99c,
  author    = {Nicol N. Schraudolph},
  title     = {\href{http://nic.schraudolph.org/pubs/Schraudolph99c.pdf}{Online Learning with Adaptive Local Step Sizes}},
  pages     = {151--156},
  editor    = {Maria Marinaro and Roberto Tagliaferri},
  booktitle = {Neural Nets\,---\,WIRN Vietri-99: Proc.\ 11$^{th}$ Italian Workshop on Neural Networks},
  series    = {Perspectives in Neural Computing},
  address   = {Vietri sul Mare, Salerno, Italy},
  publisher = {\href{http://www.springer.de/}{Springer Verlag}, Berlin},
  year      = 1999,
  b2h_type  = {Other},
  b2h_topic = {>Stochastic Meta-Descent},
  abstract  = {Almeida {\em et al.}\ have recently proposed {\em online}\/ algorithms for local step size adaptation in nonlinear systems trained by gradient descent. Here we develop an alternative to their approach by extending Sutton's work on linear systems to the general, nonlinear case. The resulting algorithms are computationally little more expensive than other acceleration techniques, do not assume statistical independence between successive training patterns, and do not require an arbitrary smoothing parameter. In our benchmark experiments, they consistently outperform other acceleration methods as well as stochastic gradient descent with fixed learning rate and momentum.}
}