Uber: Montezuma's Revenge Solved by Go-Explore


Today we introduce Go-Explore, a new family of algorithms capable of achieving scores over 2,000,000 on Montezuma’s Revenge and scoring over 400,000 on average! Go-Explore reliably solves the entire game(...)

Big announcement from Uber. A follow-on post, written by a Google software engineer, does an excellent job of contextualizing the results.

The controversial part of the release is that the simulator can be initialized to any desired state, and learning can begin from there. This lets Go-Explore explore the solution space far more effectively than agents that must always restart from the beginning. But is this initialization assumption realistic or useful in practice?
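To make the idea concrete, here is a minimal sketch of the "go, then explore" loop on a toy problem. Everything here is hypothetical and simplified: the environment, the `get_state`/`set_state` methods (which stand in for the simulator's controversial save/restore ability), and the use of the raw position as the archive "cell". It is an illustration of the exploration pattern, not Uber's implementation.

```python
import random

class ToyEnv:
    """Toy deterministic environment: walk a 1-D line from 0 toward 10.

    get_state()/set_state() stand in for the simulator's ability to save
    and restore arbitrary states -- the assumption discussed above.
    """
    def __init__(self):
        self.pos = 0

    def get_state(self):
        return self.pos

    def set_state(self, state):
        self.pos = state

    def step(self, action):  # action is -1 or +1
        self.pos = max(0, min(10, self.pos + action))
        return self.pos, self.pos == 10  # (observation, done?)

def go_explore(env, iterations=500, seed=0):
    rng = random.Random(seed)
    # Archive maps each discovered "cell" to a saved state that reaches it.
    # In this toy, the cell is just the position itself.
    archive = {env.get_state(): env.get_state()}
    for _ in range(iterations):
        # 1. "Go": restore an archived state instead of replaying from scratch.
        cell = rng.choice(list(archive))
        env.set_state(archive[cell])
        # 2. "Explore": take a few random actions from that state.
        for _ in range(5):
            obs, done = env.step(rng.choice([-1, 1]))
            # 3. Archive any newly reached cell with its saved state.
            if obs not in archive:
                archive[obs] = env.get_state()
            if done:
                return archive
    return archive

archive = go_explore(ToyEnv())
print(sorted(archive))
```

The key point is step 1: because the agent teleports back to a saved frontier state rather than re-learning how to reach it, exploration effort compounds instead of being spent re-traversing known territory. That is exactly what the any-state initialization buys, and exactly what a real robot or live system cannot do.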

Not sure how applicable this is to your day-to-day, but I found the combination of both posts very interesting.
