Noising and Denoising Natural Language: Diverse Backtranslation for Grammar Correction (NAACL-HLT 2018)

cs.stanford.edu

The authors synthesize parallel data for grammatical error correction by noising a clean monolingual corpus. They use a seq2seq model to translate clean sentences into noisy ones and propose additional beam search noising procedures to make the generated errors more diverse. Starting with models trained on roughly 1.3M sentences, they use the synthesized data to nearly match the performance of training with 3M sentences.
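The summary does not spell out the paper's exact noising schemes. The sketch below shows one generic way to inject diversity into beam search. Each expansion's score is penalized by r * beta_random, with r drawn uniformly from [0, 1), so repeated decodes of the same clean sentence can yield different noisy outputs. The names `noisy_beam_search`, `toy_reverse_model`, and `beta_random` are hypothetical. The toy model is an assumed stand-in for a trained clean-to-noisy seq2seq decoder conditioned on one fixed clean sentence.

```python
import math
import random
from typing import Callable, List, Tuple

Prefix = Tuple[str, ...]
# Hypothetical decoder interface: maps a prefix to (next_token, log_prob) options.
NextTokenFn = Callable[[Prefix], List[Tuple[str, float]]]

EOS = "</s>"


def noisy_beam_search(
    next_tokens: NextTokenFn,
    beam_size: int,
    max_len: int,
    beta_random: float,
    rng: random.Random,
) -> Prefix:
    """Beam search where each expansion is penalized by r * beta_random, r ~ U[0, 1).

    With beta_random == 0 this reduces to ordinary beam search.
    """
    beams: List[Tuple[float, Prefix]] = [(0.0, ())]  # (search score, prefix)
    finished: List[Tuple[float, Prefix]] = []
    for _ in range(max_len):
        candidates = []
        for score, prefix in beams:
            for token, log_prob in next_tokens(prefix):
                # Random penalty perturbs the ranking and accumulates in the search score.
                penalty = rng.random() * beta_random
                candidates.append((score + log_prob - penalty, prefix + (token,)))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, prefix in candidates[:beam_size]:
            if prefix[-1] == EOS:
                finished.append((score, prefix))
            else:
                beams.append((score, prefix))
        if not beams:
            break
    # Prefer completed hypotheses; fall back to live beams if none emitted EOS.
    pool = finished or beams
    return max(pool, key=lambda c: c[0])[1]


def toy_reverse_model(prefix: Prefix) -> List[Tuple[str, float]]:
    """Hypothetical stand-in for a clean-to-noisy decoder on one fixed source."""
    if len(prefix) == 0:
        return [("the", math.log(0.6)), ("a", math.log(0.4))]
    if len(prefix) == 1:
        return [("cat", math.log(0.7)), ("cats", math.log(0.3))]
    return [(EOS, 0.0)]


if __name__ == "__main__":
    plain = noisy_beam_search(toy_reverse_model, beam_size=2, max_len=5,
                              beta_random=0.0, rng=random.Random(0))
    print(plain)  # ('the', 'cat', '</s>')
    noisy = {noisy_beam_search(toy_reverse_model, 2, 5, 5.0, random.Random(s))
             for s in range(50)}
    print(sorted(noisy))  # several distinct noisy variants
```

Setting beta_random to 0 recovers the single most likely output. Raising it trades likelihood for variety. Each noisy output would then be paired with its clean source to form a synthetic (noisy input, clean target) training example for the correction model.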
