L4: Practical Loss-based Stepsize Adaptation for Deep Learning


The authors propose a stepsize adaptation scheme for stochastic gradient descent. It operates directly on the loss function: at each step, the update direction is rescaled so that its linearized effect makes a fixed, predicted amount of progress on the loss. They demonstrate its capabilities by substantially improving the performance of the Adam and Momentum optimizers. The enhanced optimizers with default hyperparameters consistently outperform their constant-stepsize counterparts, even carefully tuned ones, without a measurable increase in computational cost.
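To make the idea concrete, here is a minimal sketch of a loss-based stepsize rule in the spirit of the summary above: the stepsize is chosen so that the linearized loss decrease equals a fixed fraction of the gap to an estimated minimal loss. The function name `l4_step`, the fraction `alpha`, the stabilizing epsilon, and the fixed `l_min` estimate are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def l4_step(params, loss_fn, grad_fn, direction, l_min, alpha=0.15):
    """One loss-based update (illustrative sketch, not the authors' exact code).

    Rescales `direction` (e.g. the step proposed by Adam or Momentum)
    so that the first-order predicted loss decrease equals
    alpha * (current loss - l_min).
    """
    loss = loss_fn(params)
    grad = grad_fn(params)
    # Predicted decrease per unit step along `direction` (linearization).
    inner = np.dot(grad, direction)
    # Small epsilon (an assumption here) guards against division by zero.
    eta = alpha * (loss - l_min) / (inner + 1e-12)
    return params - eta * direction
```

On a simple quadratic with `direction` set to the raw gradient and `l_min = 0`, the rule contracts the parameters by a constant factor each step, so the loss decreases geometrically regardless of how the problem is scaled.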

