GitHub - zphang/bert_on_stilts: BERT on STILTS

Want to beat Google’s state-of-the-art results using their own BERT model? Just fine-tune it twice. Researchers at NYU have released STILTs (Supplementary Training on Intermediate Labeled-data Tasks), a simple augmentation to the BERT recipe: between Google’s original pre-training and the final task-specific fine-tuning, the model gets an extra round of fine-tuning on a single data-rich labeled task, such as MNLI. It turns out that letting the model focus on one intermediate task before tackling the target task boosts results across multiple downstream tasks.
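The recipe is just a three-stage pipeline: pre-trained checkpoint, supplementary fine-tuning on an intermediate task, then standard fine-tuning on the target task. A minimal structural sketch (all names here are illustrative stand-ins, not the repo’s actual API):

```python
# Structural sketch of the STILTs recipe. `fine_tune` is a hypothetical
# stand-in for a full fine-tuning run; a real implementation would train
# the model on labeled data at each stage.

def fine_tune(model, task):
    """Record one fine-tuning stage on the (toy) model."""
    model["stages"].append(task)
    return model

def stilts_pipeline(intermediate_task, target_task):
    model = {"stages": ["bert-pretraining"]}      # start from the pretrained checkpoint
    model = fine_tune(model, intermediate_task)   # extra STILTs stage, e.g. MNLI
    model = fine_tune(model, target_task)         # standard task-specific fine-tuning
    return model

model = stilts_pipeline("mnli", "rte")
print(model["stages"])  # ['bert-pretraining', 'mnli', 'rte']
```

The only change from the standard BERT workflow is the middle stage; everything before and after is unchanged.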
