We explore building generative neural network models of popular reinforcement learning environments. Our world model can be trained quickly in an unsupervised manner to learn a compressed spatial and temporal representation of the environment.
This is a full online paper, complete with lots of interactive elements that illustrate the points that it's making. I found the entire read fascinating; here's my favorite part:
...in our initial experiments, we noticed that our agent discovered an adversarial policy to move around in such a way so that the monsters in this virtual environment never shoots a single fireball. Even when there are signs of a fireball forming, the agent will move in a way to extinguish the fireballs magically as if it has superpowers in the environment.
Or, in simpler terms: the AI discovered cheat codes.Read more...