Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm

Distant supervision on tweets containing positive and negative emoticons has been a common way to achieve state-of-the-art performance on sentiment analysis. The authors of this EMNLP 2017 paper take the idea one step further: they show that a model pre-trained to predict emojis on 1.2B tweets, each containing at least one of 64 common emojis, obtains state-of-the-art performance on 8 sentiment, emotion and sarcasm detection datasets. The paper can be found here.
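To make the idea of distant supervision concrete, here is a minimal sketch of how emoji occurrences can be turned into noisy training labels. It is an illustration under stated assumptions, not the authors' pipeline: the three-emoji `EMOJI_VOCAB` and the helper `make_distant_examples` are hypothetical, and emitting one example per distinct emoji is a choice made for this sketch.

```python
# Hypothetical emoji vocabulary for illustration; the paper uses 64 emojis.
EMOJI_VOCAB = ["😂", "😭", "😍"]


def make_distant_examples(tweet):
    """Turn one tweet into (text, emoji_label) pairs.

    The emoji acts as a noisy label and is stripped from the text, so a
    model has to predict it from the words alone. Tweets without any
    emoji from the vocabulary yield no examples.
    """
    # Distinct vocabulary emojis found in the tweet, in vocabulary order.
    found = [e for e in EMOJI_VOCAB if e in tweet]

    # Remove every vocabulary emoji from the input text.
    text = tweet
    for e in EMOJI_VOCAB:
        text = text.replace(e, "")
    text = " ".join(text.split())  # normalize leftover whitespace

    # Assumption of this sketch: one training example per distinct emoji.
    return [(text, e) for e in found]


if __name__ == "__main__":
    for pair in make_distant_examples("I love this 😍😂"):
        print(pair)
```

A classifier pre-trained on pairs like these can then be fine-tuned on a small labeled sentiment, emotion or sarcasm dataset.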

