NLP's ImageNet moment has arrived

This is absolutely the must-read post of the week. The intro:

The long reign of word vectors as NLP’s core representation technique has seen an exciting new line of challengers emerge: ELMo, ULMFiT, and the OpenAI transformer. These works made headlines by demonstrating that pretrained language models can be used to achieve state-of-the-art results on a wide range of NLP tasks. Such methods herald a watershed moment: they may have the same wide-ranging impact on NLP as pretrained ImageNet models had on computer vision.

I don't follow NLP super-closely, but apparently these breakthrough results have been piling up over the course of 2018. I also hadn't been deeply familiar with just how influential ImageNet was:

Transfer learning via pretraining on ImageNet is in fact so effective in computer vision that not using it is now considered foolhardy.

If this transition is real, it's significant: advances of this import come along rarely. From the conclusion:

In light of the impressive empirical results of ELMo, ULMFiT, and OpenAI it only seems to be a question of time until pretrained word embeddings will be dethroned and replaced by pretrained language models in the toolbox of every NLP practitioner.


