How Transformers Work

Transformers are a neural network architecture that has been gaining popularity. Transformers were recently used by OpenAI in its language models, and by DeepMind for AlphaStar, its program that defeated a top professional StarCraft player.
Transformers were developed to solve the problem of sequence transduction: any task that transforms an input sequence into an output sequence. This includes neural machine translation, speech recognition, text-to-speech conversion, and more.

