Combining Conjugate Direction Methods with Stochastic Approximation of Gradients

N. N. Schraudolph and T. Graepel. Combining Conjugate Direction Methods with Stochastic Approximation of Gradients. In Proc. 9th Intl. Workshop Artificial Intelligence and Statistics (AIstats), pp. 7–13, Society for Artificial Intelligence and Statistics, Key West, Florida, 2003.
Earlier version     Related paper

Download

pdf (230.4 kB)   djvu (100.7 kB)   ps.gz (166.8 kB)

Abstract

The method of conjugate directions provides a very effective way to optimize large, deterministic systems by gradient descent. In its standard form, however, it is not amenable to stochastic approximation of the gradient. Here we explore ideas from conjugate gradient in the stochastic (online) setting, using fast Hessian-gradient products to set up low-dimensional Krylov subspaces within individual mini-batches. In our benchmark experiments the resulting online learning algorithms converge orders of magnitude faster than ordinary stochastic gradient descent.
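Below is a minimal sketch, not the authors' reference implementation, of the core idea described in the abstract: on each mini-batch, use fast Hessian-vector products to span a low-dimensional Krylov subspace {g, Hg, H²g, ...} from the stochastic gradient, then take the best quadratic-model step within that subspace. Function and parameter names (loss_fn, krylov_step, dim, ridge) are illustrative assumptions; parameters are assumed to be a flat array.

```python
import jax
import jax.numpy as jnp


def hessian_vector_product(loss_fn, params, batch, v):
    """Fast Hessian-vector product via forward-over-reverse differentiation."""
    grad_fn = lambda p: jax.grad(loss_fn)(p, batch)
    _, hv = jax.jvp(grad_fn, (params,), (v,))
    return hv


def krylov_step(loss_fn, params, batch, dim=3, ridge=1e-4):
    """Build a `dim`-dimensional Krylov subspace from the mini-batch gradient
    and solve the local quadratic model restricted to that subspace.
    (Illustrative sketch only; the ridge term is an assumed stabiliser.)"""
    g = jax.grad(loss_fn)(params, batch)
    # Krylov basis: g, Hg, H^2 g, ... (Gram-Schmidt orthonormalised for stability).
    basis = []
    v = g
    for _ in range(dim):
        for b in basis:
            v = v - jnp.vdot(b, v) * b
        v = v / (jnp.linalg.norm(v) + 1e-12)
        basis.append(v)
        v = hessian_vector_product(loss_fn, params, batch, v)
    B = jnp.stack(basis)                                   # (dim, n_params)
    # Project the gradient and Hessian into the subspace.
    Hcols = jnp.stack(
        [hessian_vector_product(loss_fn, params, batch, b) for b in basis])
    H_small = B @ Hcols.T                                  # (dim, dim) projected Hessian
    g_small = B @ g                                        # (dim,) projected gradient
    # Regularised Newton step inside the subspace, mapped back to parameter space.
    alpha = jnp.linalg.solve(H_small + ridge * jnp.eye(dim), g_small)
    return params - B.T @ alpha
```

Because the subspace is only a few dimensions per mini-batch, the extra cost over plain stochastic gradient descent is a handful of Hessian-vector products, each roughly as expensive as one gradient evaluation.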

BibTeX Entry

@inproceedings{SchGra03,
     author = {Nicol N. Schraudolph and Thore Graepel},
      title = {\href{http://nic.schraudolph.org/pubs/SchGra03.pdf}{
               Combining Conjugate Direction Methods
               with Stochastic Approximation of Gradients}},
      pages = {7--13},
     editor = {Christopher M. Bishop and Brendan J. Frey},
  booktitle = {Proc.\ 9$^{th}$ Intl.\ Workshop
               Artificial Intelligence and Statistics (AIstats)},
    address = {Key West, Florida},
  publisher = {Society for Artificial Intelligence and Statistics},
       isbn = {0-9727358-0-1},
       year =  2003,
   b2h_type = {Top Conferences},
  b2h_topic = {Gradient Descent},
   b2h_note = {<a href="b2hd-SchGra02.html">Earlier version</a> &nbsp;&nbsp;&nbsp; <a href="b2hd-SchGra02b.html">Related paper</a>},
   abstract = {
    The method of conjugate directions provides a very effective way to
    optimize large, deterministic systems by gradient descent.  In its
    standard form, however, it is not amenable to stochastic approximation
    of the gradient.  Here we explore ideas from conjugate gradient in the
    stochastic (online) setting, using fast Hessian-gradient products to set
    up low-dimensional Krylov subspaces within individual mini-batches.  In
    our benchmark experiments the resulting online learning algorithms
    converge orders of magnitude faster than ordinary stochastic gradient
    descent.
}}
