MURAL - Maynooth University Research Archive Library



    Gradient Descent: Second-Order Momentum and Saturating Error


    Pearlmutter, Barak A. (1991) Gradient Descent: Second-Order Momentum and Saturating Error. Advances in Neural Information Processing Systems. pp. 887-894. ISSN 1049-5258


    Abstract

    Batch gradient descent, $\Delta w(t) = -\eta \, dE/dw(t)$, converges to a minimum of quadratic form with a time constant no better than $\frac{1}{4}\lambda_{\max}/\lambda_{\min}$, where $\lambda_{\min}$ and $\lambda_{\max}$ are the minimum and maximum eigenvalues of the Hessian matrix of $E$ with respect to $w$. It was recently shown that adding a momentum term, $\Delta w(t) = -\eta \, dE/dw(t) + \alpha \, \Delta w(t-1)$, improves this to $\frac{1}{4}\sqrt{\lambda_{\max}/\lambda_{\min}}$, although only in the batch case. Here we show that second-order momentum, $\Delta w(t) = -\eta \, dE/dw(t) + \alpha \, \Delta w(t-1) + \beta \, \Delta w(t-2)$, can lower this no further. We then regard gradient descent with momentum as a dynamic system and explore a nonquadratic error surface, showing that saturation of the error accounts for a variety of effects observed in simulations and justifies some popular heuristics.
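
The update rules named in the abstract can be tried out on a toy problem. The following is a minimal sketch, not the paper's experimental setup: it assumes a diagonal quadratic error E(w) = 0.5 * sum(lam_i * w_i^2), so the Hessian eigenvalues are simply the entries of `eigs`. The function name `descend`, the starting point, the step count, and the parameter names `eta` (step size), `alpha` (first-order momentum) and `beta` (second-order momentum) are illustrative choices. The particular values of `eta` and `alpha` are the standard textbook tunings for a quadratic and are likewise an assumption, not taken from the abstract.

```python
import math

def descend(eigs, eta, alpha=0.0, beta=0.0, steps=200):
    """Run batch descent on E(w) = 0.5 * sum(lam_i * w_i**2); return final E.

    Update: dw(t) = -eta * dE/dw(t) + alpha * dw(t-1) + beta * dw(t-2).
    alpha = beta = 0 gives plain gradient descent; beta = 0 gives
    ordinary (first-order) momentum.
    """
    w = [1.0] * len(eigs)          # arbitrary starting point (assumption)
    dw_prev1 = [0.0] * len(eigs)   # dw(t-1)
    dw_prev2 = [0.0] * len(eigs)   # dw(t-2)
    for _ in range(steps):
        # Gradient of the diagonal quadratic: dE/dw_i = lam_i * w_i.
        grad = [lam * wi for lam, wi in zip(eigs, w)]
        dw = [-eta * g + alpha * d1 + beta * d2
              for g, d1, d2 in zip(grad, dw_prev1, dw_prev2)]
        w = [wi + d for wi, d in zip(w, dw)]
        dw_prev2, dw_prev1 = dw_prev1, dw
    return 0.5 * sum(lam * wi * wi for lam, wi in zip(eigs, w))

# Toy problem with condition number lam_max / lam_min = 100 (assumption).
lam_min, lam_max = 1.0, 100.0
eigs = [lam_min, lam_max]
s_min, s_max = math.sqrt(lam_min), math.sqrt(lam_max)

# Textbook tunings for a quadratic, used here only for illustration.
eta_plain = 2.0 / (lam_min + lam_max)
eta_mom = 4.0 / (s_max + s_min) ** 2
alpha_mom = ((s_max - s_min) / (s_max + s_min)) ** 2

err_plain = descend(eigs, eta_plain)
err_momentum = descend(eigs, eta_mom, alpha=alpha_mom)
print("plain gradient descent:", err_plain)
print("with momentum:         ", err_momentum)
```

With these settings the momentum run should end at a much smaller error than the plain run after the same number of steps, in line with the improvement from a ratio of eigenvalues to its square root. Passing a nonzero `beta` lets one experiment with the second-order term, which the abstract states cannot improve the time constant further.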

    Item Type: Article
    Keywords: Gradient Descent; Second-Order Momentum; Saturating Error
    Academic Unit: Faculty of Science and Engineering > Computer Science
    Faculty of Science and Engineering > Research Institutes > Hamilton Institute
    Item ID: 5539
    Depositing User: Barak Pearlmutter
    Date Deposited: 04 Nov 2014 14:44
    Journal or Publication Title: Advances in Neural Information Processing Systems
    Publisher: Massachusetts Institute of Technology Press (MIT Press)
    Refereed: Yes
    URI:
