L4: Practical Loss-based Stepsize Adaptation for Deep Learning

arxiv.org

The authors propose a stepsize adaptation scheme for stochastic gradient descent. It operates directly on the loss function, rescaling the gradient so that each step makes a fixed amount of predicted progress on the loss. They demonstrate its capabilities by substantially improving the performance of the Adam and Momentum optimizers: with default hyperparameters, the enhanced optimizers consistently outperform their constant-stepsize counterparts, even the best-tuned ones, without a measurable increase in computational cost.
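
To make the rescaling rule concrete, here is a minimal sketch of the idea on a toy quadratic problem. It is not the paper's exact algorithm: the plain-gradient update direction, the running minimum-loss bookkeeping, and the hyperparameter names (alpha, forget_rate) are illustrative assumptions; in the paper the direction comes from Adam or Momentum and the minimum-loss estimate follows its own schedule.

```python
# Minimal sketch of loss-based stepsize adaptation on a toy quadratic (NumPy only).
# alpha, forget_rate and the l_min bookkeeping are illustrative assumptions,
# not the paper's precise recipe; the direction here is the plain gradient,
# whereas the paper plugs in Adam or Momentum update directions.
import numpy as np

def loss_and_grad(theta, A, b):
    """Quadratic toy loss L(theta) = 0.5 * ||A theta - b||^2 and its gradient."""
    r = A @ theta - b
    return 0.5 * float(r @ r), A.T @ r

def loss_adaptive_gd(theta, A, b, steps=200, alpha=0.15, forget_rate=1e-3):
    """Gradient descent whose stepsize is chosen so the linearized update is
    predicted to close a fixed fraction alpha of the gap between the current
    loss and a running estimate of the minimum attainable loss (l_min)."""
    loss, grad = loss_and_grad(theta, A, b)
    l_min = 0.9 * loss  # crude initial underestimate of the attainable loss
    for _ in range(steps):
        loss, grad = loss_and_grad(theta, A, b)
        l_min = min(l_min, loss)   # keep l_min below every observed loss
        # Linearized progress of a step eta * grad is eta * (grad . grad);
        # pick eta so that predicted progress equals alpha * (loss - l_min).
        eta = alpha * (loss - l_min) / (float(grad @ grad) + 1e-12)
        theta = theta - eta * grad
        l_min *= 1.0 + forget_rate  # slowly forget, so l_min can adapt upward
    return theta, loss

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
theta_hat, final_loss = loss_adaptive_gd(np.zeros(5), A, b)
print(f"final loss after adaptation: {final_loss:.4f}")
```

Because the stepsize shrinks automatically as the loss approaches the running minimum-loss estimate, the scheme needs no manual stepsize tuning, which is what lets the enhanced optimizers run with default hyperparameters.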
