2009 TheARTCorpus
- (Liakata & Soldatova, 2009) ⇒ Maria Liakata, Larisa N. Soldatova. (2009). “The ART Corpus.” Technical report, Aberystwyth University.
Subject Headings: ART Corpus.
Notes
Cited By
Quotes
Abstract
The ART corpus consists of 225 papers manually annotated with CISP labels (e.g. "Goal", "Method", "Result"). The ART Corpus contains more than 1 million words in 35,040 sentences. These papers cover topics in physical chemistry and biochemistry and were provided by the Royal Society of Chemistry (RSC) Publishing. The Corpus was developed primarily to add value to scientific papers through semantic markup that makes it easier for natural language processing and semantic web applications to automatically extract information about core scientific concepts. The ART corpus can also serve as a training set for machine learning algorithms that automate the annotation of papers with CISP metadata. The corpus is available as a collection of 225 .xml files, where each file corresponds to a separate paper whose sentences have been individually annotated with core scientific concepts.
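The abstract describes each paper as an XML file whose sentences are labelled individually, and suggests using the corpus as machine learning training data. A minimal Python sketch of reading such a file follows. It assumes a hypothetical layout (a `<paper>` root with `<sentence label="...">` children) and invented sample sentences; none of this is taken from the actual ART distribution, so the element and attribute names would need adjusting to the real files.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical layout: the element and attribute names ("paper", "sentence",
# "label") and the sample sentences are illustrative assumptions, not the ART schema.
SAMPLE = """<paper>
  <sentence label="Goal">We aim to measure the binding energy.</sentence>
  <sentence label="Method">Spectra were recorded at room temperature.</sentence>
  <sentence label="Result">The energy was lower than predicted.</sentence>
  <sentence label="Method">Samples were purified twice.</sentence>
</paper>"""


def label_counts(xml_text: str) -> Counter:
    """Count how many sentences carry each label in one paper."""
    root = ET.fromstring(xml_text)
    return Counter(s.get("label") for s in root.iter("sentence"))


def training_pairs(xml_text: str) -> list[tuple[str, str]]:
    """Return (sentence text, label) pairs usable as classifier training data."""
    root = ET.fromstring(xml_text)
    return [((s.text or "").strip(), s.get("label")) for s in root.iter("sentence")]


if __name__ == "__main__":
    print(label_counts(SAMPLE))
    print(training_pairs(SAMPLE)[0])
```

Applied to all 225 files, the per-paper pairs could be concatenated into one training set; keeping the text and label together in a tuple leaves the choice of classifier open.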
 | Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year |
---|---|---|---|---|---|---|---|---|---|---|
2009 TheARTCorpus | Maria Liakata, Larisa Soldatova | | | The ART Corpus | | | http://cadair.aber.ac.uk/dspace/handle/2160/1979 | | | 2009 |