PubMed 200k RCT: a Dataset for Sequential Sentence Classification in Medical Abstracts (arXiv)

arxiv.org

Sentence classification is useful for extracting information in many domains, but labeled data is often unavailable. PubMed 200k RCT is a new dataset based on PubMed, consisting of approximately 200,000 abstracts of randomized controlled trials and totaling 2.3 million sentences. Each sentence of each abstract is labeled with its role in the abstract, e.g. background, objective, or method.
