NLP's ImageNet moment has arrived

This is absolutely the must-read post of the week. The intro:

The long reign of word vectors as NLP’s core representation technique has seen an exciting new line of challengers emerge: ELMo, ULMFiT, and the OpenAI transformer. These works made headlines by demonstrating that pretrained language models can be used to achieve state-of-the-art results on a wide range of NLP tasks. Such methods herald a watershed moment: they may have the same wide-ranging impact on NLP as pretrained ImageNet models had on computer vision.

I don't follow NLP super-closely, but apparently these breakthrough results have been piling up over the course of 2018. I also hadn't appreciated just how influential ImageNet has been:

Transfer learning via pretraining on ImageNet is in fact so effective in computer vision that not using it is now considered foolhardy.

If this transition is real, it's significant: advances of this import come along rarely. From the conclusion:

In light of the impressive empirical results of ELMo, ULMFiT, and OpenAI it only seems to be a question of time until pretrained word embeddings will be dethroned and replaced by pretrained language models in the toolbox of every NLP practitioner.
