Why Momentum Really Works


Momentum proposes the following tweak to gradient descent. We give gradient descent a short-term memory.

This is interesting, but it probably won't change your world tomorrow. What is so novel about this paper is its format. This is one of the earliest pieces of research published on Distill, which I covered a couple of weeks ago, and its format, content, and tone are all quite novel within academic research. The writing is accessible, there are impressive interactive elements, and the design is attractive. Contrast that to the traditional academic PDF.

Data science is one of those rare fields where commercial practitioners are operating just on the edge of what has been proven in the lab. The ability to more effectively osmose new knowledge through this membrane will have very positive implications for the future.


Want to receive more content like this in your inbox?