Semi-supervised sequence tagging with bidirectional language models (ACL 2017)

Transferring knowledge and applying it to new domains with limited labelled data is an active research area in NLP. Peters et al. show that, in addition to embeddings pre-trained on a large unlabelled corpus with methods like word2vec, we can use the contextual embeddings obtained by pre-training a bidirectional language model on a large corpus. Interestingly, the two kinds of embeddings carry complementary information, even though word2vec approximates a language model.
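The core idea can be sketched as concatenating each token's word embedding with the forward and backward LM hidden states at that position, then feeding the joint vector to the tagger. The snippet below is a minimal illustration with placeholder lookups and hypothetical dimensions (300-d word vectors, 512-d LM states per direction), not the authors' actual implementation:

```python
import numpy as np

# Hypothetical dimensions: 300-d word2vec vectors, 512-d LM states per direction.
WORD_DIM, LM_DIM = 300, 512

rng = np.random.default_rng(0)

def embed_tokens(tokens):
    """Placeholder lookup standing in for pre-trained word2vec vectors."""
    return rng.standard_normal((len(tokens), WORD_DIM))

def lm_states(tokens):
    """Placeholder for forward and backward LM hidden states at each token."""
    fwd = rng.standard_normal((len(tokens), LM_DIM))
    bwd = rng.standard_normal((len(tokens), LM_DIM))
    return fwd, bwd

def tagger_inputs(tokens):
    """Concatenate word embeddings with the bidirectional LM embedding;
    a sequence tagger would consume this joint representation."""
    word = embed_tokens(tokens)
    fwd, bwd = lm_states(tokens)
    return np.concatenate([word, fwd, bwd], axis=1)

x = tagger_inputs(["New", "York", "is", "large"])
print(x.shape)  # (4, 1324): one 300 + 512 + 512 vector per token
```

The LM weights stay frozen during tagger training, so the tagger simply learns to exploit whichever parts of the joint vector are predictive for its task.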

