Online Learning with Adaptive Local Step Sizes
N. N. Schraudolph. Online Learning with Adaptive Local Step Sizes. In M. Marinaro and R. Tagliaferri, eds., Neural Nets---WIRN Vietri-99: Proc. 11th Italian Workshop on Neural Networks (Vietri sul Mare, Salerno, Italy), Perspectives in Neural Computing, pp. 151–156, Springer Verlag, Berlin, 1999.
Abstract
Almeida et al. have recently proposed online algorithms for local step size adaptation in nonlinear systems trained by gradient descent. Here we develop an alternative to their approach by extending Sutton's work on linear systems to the general, nonlinear case. The resulting algorithms are computationally little more expensive than other acceleration techniques, do not assume statistical independence between successive training patterns, and do not require an arbitrary smoothing parameter. In our benchmark experiments, they consistently outperform other acceleration methods as well as stochastic gradient descent with fixed learning rate and momentum.
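The abstract refers to Sutton's local step-size adaptation for linear systems, i.e. the IDBD algorithm (Sutton, 1992), which the paper extends to the nonlinear case. As background only, here is a minimal Python sketch of that linear-case IDBD baseline: one adaptive log step size per weight, updated online by a meta-level gradient step. This is not the paper's nonlinear algorithm; the function name, parameter names, and default values (theta, beta0) are illustrative choices, not taken from the paper.

    import numpy as np

    def idbd(X, y, theta=0.01, beta0=np.log(0.05)):
        """Sutton's IDBD: online linear regression with one adaptive
        log step size (beta_i) per weight.  Returns the learned weights."""
        n_features = X.shape[1]
        w = np.zeros(n_features)           # weights
        beta = np.full(n_features, beta0)  # per-weight log step sizes
        h = np.zeros(n_features)           # trace of recent weight updates
        for x, target in zip(X, y):
            delta = target - w @ x         # prediction error
            beta += theta * delta * x * h  # meta-level step on log step sizes
            alpha = np.exp(beta)           # per-weight step sizes
            w += alpha * delta * x         # base-level gradient step
            # decay the trace where the step size is large, then accumulate
            h = h * np.clip(1.0 - alpha * x * x, 0.0, None) + alpha * delta * x
        return w

A quick synthetic check (again purely illustrative): fit a noisy linear target and compare the recovered weights to the true ones.

    rng = np.random.default_rng(0)
    true_w = np.array([2.0, -1.0, 0.5])
    X = rng.normal(size=(5000, 3))
    y = X @ true_w + 0.1 * rng.normal(size=5000)
    print(idbd(X, y))   # should be close to true_w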
BibTeX Entry
@inproceedings{Schraudolph99c,
  author    = {Nicol N. Schraudolph},
  title     = {\href{http://nic.schraudolph.org/pubs/Schraudolph99c.pdf}{Online Learning with Adaptive Local Step Sizes}},
  pages     = {151--156},
  editor    = {Maria Marinaro and Roberto Tagliaferri},
  booktitle = {Neural Nets\,---\,WIRN Vietri-99: Proc.\ 11$^{th}$ Italian Workshop on Neural Networks},
  series    = {Perspectives in Neural Computing},
  address   = {Vietri sul Mare, Salerno, Italy},
  publisher = {\href{http://www.springer.de/}{Springer Verlag}, Berlin},
  year      = 1999,
  b2h_type  = {Other},
  b2h_topic = {>Stochastic Meta-Descent},
  abstract  = {Almeida {\em et al.}\ have recently proposed {\em online}\/ algorithms for local step size adaptation in nonlinear systems trained by gradient descent. Here we develop an alternative to their approach by extending Sutton's work on linear systems to the general, nonlinear case. The resulting algorithms are computationally little more expensive than other acceleration techniques, do not assume statistical independence between successive training patterns, and do not require an arbitrary smoothing parameter. In our benchmark experiments, they consistently outperform other acceleration methods as well as stochastic gradient descent with fixed learning rate and momentum.}
}