A Repository of Conversational Datasets

github.com

This repository provides tools for creating reproducible datasets for training and evaluating models of conversational response. It covers three datasets: Reddit, with 3.7 billion comments structured in threaded conversations; OpenSubtitles, with over 400 million lines from movie and television subtitles (available in English and other languages); and Amazon QA, with over 3.6 million question-response pairs in the context of Amazon products.
