Penn Discourse Treebank
The Penn Discourse Treebank is an Annotated Corpus that serves as a Benchmark for Discourse-level Analysis.
- AKA: PDTB.
- See: Biomedical Discourse Relation Bank, Discourse Relation, News Article.
References
- http://www.seas.upenn.edu/~pdtb/
- ABSTRACT: The goal of the PDTB project is to develop a large scale corpus annotated with information related to discourse structure. While there are many aspects of discourse that are crucial to a complete understanding of natural language, the Penn Discourse Treebank (PDTB) focuses on encoding coherence relations associated with discourse connectives. The annotations include the argument structure of the connectives, thus exposing a clearly defined level of discourse structure which will support the extraction of a range of inferences associated with discourse connectives. Some other annotated features associated with discourse connectives and their arguments include sense distinctions for discourse connectives, and attribution-related features for both connectives and their arguments.
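The annotation layers named in the abstract (a connective, its two arguments, a sense, and attribution) can be pictured as a simple record. The following is a minimal sketch only: the `DiscourseRelation` class, its field names, the `connectives_by_sense` helper, the example sentences, and the sense labels are all hypothetical illustrations, not the PDTB file format or its actual sense inventory.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DiscourseRelation:
    """One connective-anchored relation (hypothetical schema, not the PDTB format)."""
    connective: str                    # the discourse connective, e.g. "because"
    arg1: str                          # text span of the first argument
    arg2: str                          # text span of the second argument
    sense: Optional[str] = None        # sense label for the connective, if annotated
    attribution: Optional[str] = None  # source the relation is attributed to, if annotated


def connectives_by_sense(relations):
    """Group connective strings by sense label, skipping unlabeled relations."""
    grouped = {}
    for rel in relations:
        if rel.sense is None:
            continue
        grouped.setdefault(rel.sense, []).append(rel.connective)
    return grouped


# Invented sentences and placeholder sense labels, for illustration only.
relations = [
    DiscourseRelation("because", "The stock fell", "earnings were weak",
                      sense="cause", attribution="writer"),
    DiscourseRelation("but", "Sales rose", "profits declined", sense="contrast"),
    DiscourseRelation("because", "The plant closed", "demand dropped", sense="cause"),
]

print(connectives_by_sense(relations))
```

Keeping sense and attribution optional reflects the abstract's description of them as additional features layered on top of the connective and its argument structure.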
2007
- (Joshi, 2007) ⇒ Aravind K. Joshi. (2007). “Complexity of Dependencies in Natural Language.” Presentation given at Simon Fraser University, Vancouver, Feb 22.
- Annotation text source: the Wall Street Journal, the same text as the Penn Treebank (2,304 articles, ~1M words).
- PDTB first release (PDTB-1.0) appeared in March 2006.
- http://www.seas.upenn.edu/~pdtb
- PDTB final release (PDTB-2.0) is planned for April 2007.
- Collaborators: Rashmi Prasad, Alan Lee, Nikhil Dinesh, Eleni Miltsakaki, and Bonnie Webber (U. Edinburgh)