Learned in Translation: Contextualized Word Vectors


One of the recent game changers in computer vision has been the combination of ImageNet and a large pre-trained CNN. In NLP, we have so far not found a model that is as useful for transfer learning (see also my post here). The closest we have in terms of data size and model capacity is Machine Translation (MT). Researchers from Salesforce now show that we can pre-train not only the word vectors but an entire LSTM encoder on MT, and that the contextualized word vectors it produces transfer successfully to a wide range of tasks. The paper and an MIT Tech Review post can be found here and here.

