PubMed 200k RCT: a Dataset for Sequential Sentence Classification in Medical Abstracts (arXiv)

Sentence classification is useful for extracting information in many domains, but labeled data is often unavailable. PubMed 200k RCT is a new dataset based on PubMed, consisting of 200,000 abstracts of randomized controlled trials and totaling 2.3 million sentences. Each sentence of each abstract is labeled with its role in the abstract, e.g. background, objective, or method.

