Pearlmutter, Barak A. (1991) Gradient Descent: Second-Order Momentum and Saturating Error. Advances in Neural Information Processing Systems. pp. 887-894. ISSN 1049-5258
Abstract
Batch gradient descent, $\Delta w(t) = -\eta\, dE/dw(t)$, converges to a minimum
of quadratic form with a time constant no better than $\frac{1}{4}\lambda_{\max}/\lambda_{\min}$, where
$\lambda_{\min}$ and $\lambda_{\max}$ are the minimum and maximum eigenvalues of the Hessian
matrix of $E$ with respect to $w$. It was recently shown that adding a
momentum term, $\Delta w(t) = -\eta\, dE/dw(t) + \alpha\, \Delta w(t-1)$, improves this to
$\frac{1}{4}\sqrt{\lambda_{\max}/\lambda_{\min}}$, although only in the batch case. Here we show that second-order
momentum, $\Delta w(t) = -\eta\, dE/dw(t) + \alpha\, \Delta w(t-1) + \beta\, \Delta w(t-2)$, can
lower this no further. We then regard gradient descent with momentum
as a dynamic system and explore a non-quadratic error surface, showing
that saturation of the error accounts for a variety of effects observed in
simulations and justifies some popular heuristics.
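
The update rules in the abstract can be tried out numerically. The following is a minimal sketch, not the paper's own experiment: it assumes a two-dimensional diagonal quadratic $E(w) = \frac{1}{2}\sum_i \lambda_i w_i^2$ with illustrative eigenvalues 1 and 100, and it uses the standard textbook step size and momentum settings for a quadratic (these values are assumptions, not taken from the abstract). The function name `steps_to_converge` and all parameter names are hypothetical.

```python
import math


def steps_to_converge(eigs, eta, alpha=0.0, beta=0.0, tol=1e-6, max_steps=100_000):
    """Count updates until every |w_i| < tol on E(w) = 0.5 * sum(lam_i * w_i**2).

    Implements dw(t) = -eta * dE/dw(t) + alpha * dw(t-1) + beta * dw(t-2).
    With alpha = beta = 0 this is plain batch gradient descent.
    """
    w = [1.0] * len(eigs)
    dw1 = [0.0] * len(eigs)  # previous update, dw(t-1)
    dw2 = [0.0] * len(eigs)  # update before that, dw(t-2)
    for step in range(1, max_steps + 1):
        # For the diagonal quadratic, dE/dw_i = lam_i * w_i.
        dw = [-eta * lam * wi + alpha * d1 + beta * d2
              for lam, wi, d1, d2 in zip(eigs, w, dw1, dw2)]
        w = [wi + d for wi, d in zip(w, dw)]
        dw2, dw1 = dw1, dw
        if max(abs(wi) for wi in w) < tol:
            return step
    return max_steps


# Illustrative eigenvalues (assumed): condition number lam_max / lam_min = 100.
lam_min, lam_max = 1.0, 100.0
eigs = [lam_min, lam_max]

# Assumed textbook settings for a quadratic, not values from the abstract.
eta_plain = 2.0 / (lam_min + lam_max)
r_min, r_max = math.sqrt(lam_min), math.sqrt(lam_max)
eta_mom = 4.0 / (r_max + r_min) ** 2
alpha_mom = ((r_max - r_min) / (r_max + r_min)) ** 2

plain = steps_to_converge(eigs, eta_plain)
momentum = steps_to_converge(eigs, eta_mom, alpha_mom)
print("plain gradient descent steps:", plain)
print("with momentum steps:", momentum)
```

Under these assumptions the momentum run should need far fewer updates than the plain run, in line with the change from a $\lambda_{\max}/\lambda_{\min}$ to a $\sqrt{\lambda_{\max}/\lambda_{\min}}$ dependence. The `beta` argument is there so that second-order momentum can be explored by hand; the sketch makes no attempt to reproduce the paper's analysis of it.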
Item Type: | Article |
---|---|
Keywords: | Gradient Descent; Second-Order Momentum; Saturating Error |
Academic Unit: | Faculty of Science and Engineering > Computer Science; Faculty of Science and Engineering > Research Institutes > Hamilton Institute |
Item ID: | 5539 |
Depositing User: | Barak Pearlmutter |
Date Deposited: | 04 Nov 2014 14:44 |
Journal or Publication Title: | Advances in Neural Information Processing Systems |
Publisher: | Massachusetts Institute of Technology Press (MIT Press) |
Refereed: | Yes |
URI: | https://mural.maynoothuniversity.ie/id/eprint/5539 |
Use Licence: | This item is available under a Creative Commons Attribution Non Commercial Share Alike Licence (CC BY-NC-SA). |