Online Learning with Adaptive Local Step Sizes

N. N. Schraudolph. Online Learning with Adaptive Local Step Sizes. In Neural Nets---WIRN Vietri-99: Proc. 11th Italian Workshop on Neural Networks (Vietri sul Mare, Salerno, Italy), pp. 151–156, Springer Verlag, Berlin, 1999.

Download

pdf (159.7 kB) · djvu (62.4 kB) · ps.gz (110.8 kB)

Abstract

Almeida et al. have recently proposed online algorithms for local step size adaptation in nonlinear systems trained by gradient descent. Here we develop an alternative to their approach by extending Sutton's work on linear systems to the general, nonlinear case. The resulting algorithms are computationally little more expensive than other acceleration techniques, do not assume statistical independence between successive training patterns, and do not require an arbitrary smoothing parameter. In our benchmark experiments, they consistently outperform other acceleration methods as well as stochastic gradient descent with fixed learning rate and momentum.
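To make the idea concrete, here is a minimal, self-contained sketch of online local step size adaptation: each weight carries its own (log) step size, which is nudged up when the current gradient agrees with the trace of that weight's recent updates and down otherwise. This is only an illustration of the general meta-descent idea on a toy tanh regression task, not a transcription of the specific algorithms in the paper; the names and values (mu, lam, the clipping range, the teacher/student setup) are illustrative assumptions.

import numpy as np

# Illustrative sketch only: per-parameter ("local") step sizes adapted online
# while a small nonlinear model is trained by stochastic gradient descent.
# NOT the paper's exact algorithm; a simplified meta-descent-style update.

rng = np.random.default_rng(0)

# Toy regression target: y = tanh(teacher . x) with fixed "teacher" weights.
n_in = 5
teacher = rng.normal(size=n_in)

w = np.zeros(n_in)                      # model ("student") weights
log_eta = np.full(n_in, np.log(0.05))   # log of per-parameter step sizes
v = np.zeros(n_in)                      # decayed trace of past weight changes
mu = 0.05                               # meta step size for adapting log_eta
lam = 0.9                               # decay factor of the trace

for t in range(20000):
    x = rng.normal(size=n_in)
    y = np.tanh(teacher @ x)

    pred = np.tanh(w @ x)
    err = pred - y
    # Gradient of 0.5 * err^2 w.r.t. w for the tanh output unit.
    g = err * (1.0 - pred**2) * x

    # Meta update: grow a step size when the current descent direction (-g)
    # agrees with the way past updates moved that weight, shrink it otherwise.
    log_eta = np.clip(log_eta + mu * (-g) * v, -10.0, 2.0)
    eta = np.exp(log_eta)

    dw = -eta * g                       # per-parameter scaled gradient step
    w += dw
    v = lam * v + dw                    # update the weight-change trace

print("mean squared weight error:", float(np.mean((w - teacher) ** 2)))

Because the adaptation uses the trace of actual past weight changes rather than instantaneous gradient products alone, it does not rely on successive training patterns being statistically independent, which is one of the properties the abstract highlights.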

BibTeX Entry

@inproceedings{Schraudolph99c,
     author = {Nicol N. Schraudolph},
      title = {\href{http://nic.schraudolph.org/pubs/Schraudolph99c.pdf}{
               Online Learning with Adaptive Local Step Sizes}},
      pages = {151--156},
     editor = {Maria Marinaro and Roberto Tagliaferri},
  booktitle = {Neural Nets\,---\,WIRN Vietri-99: Proc.\ 11$^{th}$
               Italian Workshop on Neural Networks},
     series = {Perspectives in Neural Computing},
    address = {Vietri sul Mare, Salerno, Italy},
  publisher = {\href{http://www.springer.de/}{Springer Verlag}, Berlin},
       year =  1999,
   b2h_type = {Other},
  b2h_topic = {>Stochastic Meta-Descent},
   abstract = {
    Almeida {\em et al.}\ have recently proposed {\em online}\/
    algorithms for local step size adaptation in nonlinear systems
    trained by gradient descent.  Here we develop an alternative to their
    approach by extending Sutton's work on linear systems to the general,
    nonlinear case.  The resulting algorithms are computationally little
    more expensive than other acceleration techniques, do not assume
    statistical independence between successive training patterns, and
    do not require an arbitrary smoothing parameter.  In our benchmark
    experiments, they consistently outperform other acceleration methods
    as well as stochastic gradient descent with fixed learning rate
    and momentum.
}}
