Accelerated Gradient Descent by Factor-Centering Decomposition
N. N. Schraudolph. Accelerated Gradient Descent by Factor-Centering Decomposition. Technical Report IDSIA-33-98, Istituto Dalle Molle di Studi sull'Intelligenza Artificiale, 1998.
Abstract
Gradient factor centering is a new methodology for decomposing neural networks into biased and centered subnets which are then trained in parallel. The decomposition can be applied to any pattern-dependent factor in the network's gradient, and is designed such that the subnets are more amenable to optimization by gradient descent than the original network: biased subnets because of their simplified architecture, centered subnets due to a modified gradient that improves conditioning. The architectural and algorithmic modifications mandated by this approach include both familiar and novel elements, often in prescribed combinations. The framework suggests for instance that shortcut connections---a well-known architectural feature---should work best in conjunction with slope centering, a new technique described herein. Our benchmark experiments bear out this prediction, and show that factor-centering decomposition can speed up learning significantly without adversely affecting the trained network's generalization ability.
BibTeX Entry
@techreport{facede,
  author = {Nicol N. Schraudolph},
  title = {\href{http://nic.schraudolph.org/pubs/facede.pdf}{Accelerated Gradient Descent by Factor-Centering Decomposition}},
  number = {IDSIA-33-98},
  institution = {Istituto Dalle Molle di Studi sull'Intelligenza Artificiale},
  address = {Galleria 2, CH-6928 Manno, Switzerland},
  year = 1998,
  b2h_type = {Other},
  b2h_topic = {>Preconditioning},
  abstract = {
    {\em Gradient factor centering}\/ is a new methodology for decomposing neural networks into {\em biased}\/ and {\em centered}\/ subnets which are then trained in parallel. The decomposition can be applied to any pattern-dependent factor in the network's gradient, and is designed such that the subnets are more amenable to optimization by gradient descent than the original network: biased subnets because of their simplified architecture, centered subnets due to a modified gradient that improves conditioning. The architectural and algorithmic modifications mandated by this approach include both familiar and novel elements, often in prescribed combinations. The framework suggests for instance that {\em shortcut connections}\/---\,a well-known architectural feature\,---\,should work best in conjunction with \href{b2hd-slope}{\em slope centering}, a new technique described herein. Our benchmark experiments bear out this prediction, and show that factor-centering decomposition can speed up learning significantly without adversely affecting the trained network's generalization ability.
  }
}