Penn Treebank Project

From GM-RKB



The Penn Treebank Project is a Research Project to Annotate a large corpus with Syntactic Relations.

Context:
- It produced the Penn Treebank Corpus.
See: Annotation Task, Part-of-Speech Annotation Task, Natural Language Parsing.

References

2017

https://spacy.io/usage/facts-figures#section-benchmarks
- QUOTE: Parse accuracy (Penn Treebank / Wall Street Journal)
  This is the "classic" evaluation, so it's the number parsing researchers are most easily able to put in context. However, it's quite far removed from actual usage: it uses sentences with gold-standard segmentation and tokenization, from a pretty specific type of text (articles from a single newspaper, 1984-1989).
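The quote does not say which metric its "parse accuracy" refers to. For dependency parsers such as spaCy's, a common choice is attachment score: the share of tokens whose predicted head (UAS), or head and dependency label (LAS), matches the gold parse. The minimal Python sketch below assumes that metric. The function name `attachment_scores`, the head encoding (1-based indices, 0 for the root) and the example labels are illustrative assumptions. It computes per-sentence scores. Corpus-level scores sum over all tokens, and some evaluations exclude punctuation. Because tokens are compared by position, the sketch also shows why this "classic" evaluation depends on gold-standard tokenization: both parses must share the same tokens.

```python
from typing import Sequence


def attachment_scores(
    gold_heads: Sequence[int],
    pred_heads: Sequence[int],
    gold_labels: Sequence[str],
    pred_labels: Sequence[str],
) -> tuple[float, float]:
    """Return (UAS, LAS) for one sentence (illustrative sketch).

    Heads are 1-based token indices, with 0 marking the root. Tokens are
    compared by position, so both parses must use the same (gold) tokenization.
    """
    n = len(gold_heads)
    if n == 0 or not (len(pred_heads) == len(gold_labels) == len(pred_labels) == n):
        raise ValueError("gold and predicted parses must cover the same non-empty token sequence")
    head_hits = 0  # correct head only (unlabeled)
    full_hits = 0  # correct head and correct dependency label (labeled)
    for g_head, p_head, g_lab, p_lab in zip(gold_heads, pred_heads, gold_labels, pred_labels):
        if g_head == p_head:
            head_hits += 1
            if g_lab == p_lab:
                full_hits += 1
    return head_hits / n, full_hits / n


# "John loves Mary ." : the prediction attaches "." to the wrong head
# and mislabels the object of "loves" (labels are hypothetical examples).
uas, las = attachment_scores(
    gold_heads=[2, 0, 2, 2],
    pred_heads=[2, 0, 2, 3],
    gold_labels=["nsubj", "root", "dobj", "punct"],
    pred_labels=["nsubj", "root", "iobj", "punct"],
)
print(uas, las)  # 0.75 0.5
```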

2009

http://www.cis.upenn.edu/~treebank/
- “The Penn Treebank Project annotates naturally-occurring text for linguistic structure. Most notably, we produce skeletal parses showing rough syntactic and semantic information -- a bank of linguistic trees. We also annotate text with [part-of-speech tags], and for the Switchboard corpus of telephone conversations, …”
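The "bank of linguistic trees" is distributed as labeled bracketings, in which each preterminal node carries a word's part-of-speech tag, e.g. `( (S (NP-SBJ (NNP John)) (VP (VBZ loves) (NP (NNP Mary))) (. .)) )`. The following minimal Python reader is an illustrative sketch, not an official Treebank tool. It assumes one bracketed sentence per string, an optional unlabeled outer bracket, and empty elements tagged `-NONE-`. The names `Tree`, `parse_bracketed` and `tagged_words` are hypothetical. It builds the tree and then recovers the (word, tag) pairs from the preterminals.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Tokens of a bracketed tree: "(", ")", or any run of non-space, non-bracket characters.
TOKEN_RE = re.compile(r"\(|\)|[^\s()]+")


@dataclass
class Tree:
    label: str  # phrase label (e.g. "NP-SBJ"), or the POS tag of a preterminal
    children: list[Tree | str] = field(default_factory=list)


def parse_bracketed(text: str) -> Tree:
    """Parse one Treebank-style bracketed sentence into a Tree (sketch)."""
    tokens = TOKEN_RE.findall(text)
    pos = 0

    def parse_node() -> Tree:
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}, got {tokens[pos]!r}")
        pos += 1
        # Sentences are often wrapped in an unlabeled bracket: "( (S ...) )".
        label = ""
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        node = Tree(label)
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.children.append(parse_node())
            else:  # a word (leaf) directly under its POS tag
                node.children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing ")"
        return node

    try:
        tree = parse_node()
    except IndexError:  # ran out of tokens before brackets closed
        raise ValueError("unbalanced brackets") from None
    if pos != len(tokens):
        raise ValueError("unexpected tokens after the tree")
    return tree


def tagged_words(tree: Tree) -> list[tuple[str, str]]:
    """Return (word, POS tag) pairs left to right, skipping -NONE- empty elements."""
    if tree.label == "-NONE-":
        return []
    pairs: list[tuple[str, str]] = []
    for child in tree.children:
        if isinstance(child, str):
            pairs.append((child, tree.label))
        else:
            pairs.extend(tagged_words(child))
    return pairs


tree = parse_bracketed("( (S (NP-SBJ (NNP John)) (VP (VBZ loves) (NP (NNP Mary))) (. .)) )")
print(tagged_words(tree))  # [('John', 'NNP'), ('loves', 'VBZ'), ('Mary', 'NNP'), ('.', '.')]
```

Keeping the words as plain strings under their tag node means the part-of-speech layer falls out of the syntactic layer with a single traversal.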

2009

http://www.cis.upenn.edu/~treebank/tokenization.html
- Our tokenization is fairly simple:
  - most punctuation is split from adjoining words
  - double quotes (") are changed to doubled single forward- and backward- quotes (`` and '')
  - verb contractions and the Anglo-Saxon genitive of nouns are split into their component morphemes, and each morpheme is tagged separately.
  - …
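The rules listed above can be approximated with a few regular expressions. The following Python function is a simplified sketch, not the project's own tokenizer. It assumes its input is one already-segmented sentence, and the name `ptb_style_tokenize` is hypothetical. It makes one extra assumption of its own: only a sentence-final period is split, so abbreviations such as "Mr." stay whole. It does not attempt the rules elided by "…" above, or corner cases such as commas inside numbers.

```python
import re


def ptb_style_tokenize(sentence: str) -> list[str]:
    """Simplified sketch of Penn Treebank-style tokenization for one sentence."""
    s = sentence
    # Opening double quotes (at the start, or after a space or opening bracket) -> ``
    s = re.sub(r'^"', " `` ", s)
    s = re.sub(r'([ (\[{<])"', r"\1 `` ", s)
    # Split most punctuation from adjoining words.
    s = s.replace("...", " ... ")
    s = re.sub(r"([,;:@#$%&?!])", r" \1 ", s)
    s = re.sub(r"([\[\](){}<>])", r" \1 ", s)
    s = s.replace("--", " -- ")
    # Assumption: split only a sentence-final period (plus any closing quotes/brackets).
    s = re.sub(r"([^.])(\.)([\])}>\"']*)\s*$", r"\1 \2\3 ", s)
    # Pad with spaces so the rules below can rely on a trailing space.
    s = f" {s} "
    # Any remaining double quote is a closing quote -> ''
    s = s.replace('"', " '' ")
    # Closing single quote / plural possessive: "dogs' " -> "dogs ' "
    s = re.sub(r"([^'])' ", r"\1 ' ", s)
    # Contractions and the genitive: "John's" -> "John 's", "don't" -> "do n't"
    s = re.sub(r"'([sSmMdD]) ", r" '\1 ", s)
    s = re.sub(r"('ll|'LL|'re|'RE|'ve|'VE|n't|N'T) ", r" \1 ", s)
    return s.split()


print(ptb_style_tokenize('He said, "I don\'t like John\'s dog."'))
# ['He', 'said', ',', '``', 'I', 'do', "n't", 'like', 'John', "'s", 'dog', '.', "''"]
```

Splitting "don't" as "do n't" (and "can't" as "ca n't") keeps each morpheme a separate token, which is what lets each one receive its own tag.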

1994

(Marcus et al., 1994) ⇒ Mitchell P. Marcus, Grace Kim, Mary Ann Marcinkiewicz, Robert MacIntyre, Ann Bies, Mark Ferguson, Karen Katz, and Britta Schasberger. (1994). “The Penn Treebank: A revised corpus design for extracting predicate argument structure.” In: Human Language Technology, ARPA March 1994 Workshop.

1993

(Marcus et al., 1993) ⇒ Mitchell Marcus, Beatrice Santorini, and Mary Ann Marcinkiewicz. (1993). “Building a large annotated corpus of English: The Penn Treebank.” In: Computational Linguistics, 19(2).

Retrieved from "http://www.gabormelli.com/RKB/index.php?title=Penn_Treebank_Project&oldid=878235"
