Penn Treebank Tag-set
A Penn Treebank Tag-set is a set of part-of-speech tags used in the Penn Treebank Project.
- AKA: Penn Treebank Format.
- Example(s):
- Counter-Example(s):
- a Reading Comprehension Dataset such as:
- a question-answer dataset such as:
- a Time Series Dataset.
- See: Tag, Part-of-Speech Tagging System, Natural Language Processing System, Training Dataset, Text Dataset.
References
2019a
- (Shanker, 2019) ⇒ Vijay K. Shanker (2019). "Penn Treebank POS Tag Set" Retrieved 2019-05-30.
- QUOTE: The Penn treebank POS tag set has 36 POS tags plus 12 others for punctuations and special symbols. These are listed below. For more details, refer to paper by Marcus, Marcinkiewicz and Santorini that appeared in Computational Linguistics, June 1993 issue 19(2), pages 313-330. (http://acl.ldc.upenn.edu/J/J93/J93-2004.pdf)
Many examples below were taken from http://www.comp.leeds.ac.uk/amalgam/tagsets/upenn.htm
2019b
- (Marcus et al., 2019) ⇒ Mitchell P. Marcus, Beatrice Santorini, Mary Ann Marcinkiewicz (2019). "Building a large annotated corpus of English: the Penn Treebank" Retrieved 2019-05-30.
- QUOTE: The Penn Treebank tagset is given in Table 2. It contains 36 POS tags and 12 other tags (for punctuation and currency symbols). A detailed description of the guidelines governing the use of the tagset is available in Santorini 1990.
Table 2: The Penn Treebank POS tagset
1. CC Coordinating conjunction
2. CD Cardinal number
3. DT Determiner
4. EX Existential there
5. FW Foreign word
6. IN Preposition/subord. conjunction
7. JJ Adjective
8. JJR Adjective, comparative
9. JJS Adjective, superlative
10. LS List item marker
11. MD Modal
12. NN Noun, singular or mass
13. NNS Noun, plural
14. NNP Proper noun, singular
15. NNPS Proper noun, plural
16. PDT Predeterminer
17. POS Possessive ending
18. PRP Personal pronoun
19. PP$ Possessive pronoun
20. RB Adverb
21. RBR Adverb, comparative
22. RBS Adverb, superlative
23. RP Particle
24. SYM Symbol (mathematical or scientific)
25. TO to
26. UH Interjection
27. VB Verb, base form
28. VBD Verb, past tense
29. VBG Verb, gerund/present participle
30. VBN Verb, past participle
31. VBP Verb, non-3rd ps. sing. present
32. VBZ Verb, 3rd ps. sing. present
33. WDT wh-determiner
34. WP wh-pronoun
35. WP$ Possessive wh-pronoun
36. WRB wh-adverb
37. # Pound sign
38. $ Dollar sign
39. . Sentence-final punctuation
40. , Comma
41. : Colon, semi-colon
42. ( Left bracket character
43. ) Right bracket character
44. " Straight double quote
45. ` Left open single quote
46. “ Left open double quote
47. ' Right close single quote
48. “ Right close double quote
Some symbols cannot be displayed in HTML format
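As an illustration only, the 36 word-level tags of Table 2 can be held in a simple lookup table. The sketch below is not part of the quoted source: the names `POS_TAGS`, `OTHER_TAG_COUNT` and `describe` are hypothetical, and the two possessive tags are written with a `$` suffix (`PP$`, `WP$`) on the assumption that the dollar sign was lost when the table was converted to HTML.

```python
# Word-level POS tags, entries 1-36 of Table 2 (Marcus et al.).
# Assumption: entries 19 and 35 carry a "$" suffix that the HTML copy dropped.
POS_TAGS = {
    "CC": "Coordinating conjunction",
    "CD": "Cardinal number",
    "DT": "Determiner",
    "EX": "Existential there",
    "FW": "Foreign word",
    "IN": "Preposition/subord. conjunction",
    "JJ": "Adjective",
    "JJR": "Adjective, comparative",
    "JJS": "Adjective, superlative",
    "LS": "List item marker",
    "MD": "Modal",
    "NN": "Noun, singular or mass",
    "NNS": "Noun, plural",
    "NNP": "Proper noun, singular",
    "NNPS": "Proper noun, plural",
    "PDT": "Predeterminer",
    "POS": "Possessive ending",
    "PRP": "Personal pronoun",
    "PP$": "Possessive pronoun",
    "RB": "Adverb",
    "RBR": "Adverb, comparative",
    "RBS": "Adverb, superlative",
    "RP": "Particle",
    "SYM": "Symbol (mathematical or scientific)",
    "TO": "to",
    "UH": "Interjection",
    "VB": "Verb, base form",
    "VBD": "Verb, past tense",
    "VBG": "Verb, gerund/present participle",
    "VBN": "Verb, past participle",
    "VBP": "Verb, non-3rd ps. sing. present",
    "VBZ": "Verb, 3rd ps. sing. present",
    "WDT": "wh-determiner",
    "WP": "wh-pronoun",
    "WP$": "Possessive wh-pronoun",
    "WRB": "wh-adverb",
}

# Entries 37-48 are punctuation and currency symbols; only their count is kept here.
OTHER_TAG_COUNT = 12


def describe(tag):
    """Return the Table 2 description for a tag, or a fallback string."""
    return POS_TAGS.get(tag, "unknown tag")


if __name__ == "__main__":
    print(len(POS_TAGS), "POS tags +", OTHER_TAG_COUNT, "other tags")
    print("VBZ:", describe("VBZ"))
```

A dictionary keeps the tag-to-description mapping explicit and makes the "36 POS tags plus 12 others" count easy to check against the table.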
2003
- (Upenn, 2003) ⇒ https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html
- QUOTE: Alphabetical list of part-of-speech tags used in the Penn Treebank Project:
- Number Tag Description
1. CC Coordinating conjunction
2. CD Cardinal number
3. DT Determiner
4. EX Existential there
5. FW Foreign word
6. IN Preposition or subordinating conjunction
7. JJ Adjective
8. JJR Adjective, comparative
9. JJS Adjective, superlative
10. LS List item marker
11. MD Modal
12. NN Noun, singular or mass
13. NNS Noun, plural
14. NNP Proper noun, singular
(...) (...) (...)
1990
- (Santorini, 1990) ⇒ Beatrice Santorini. (1990). "Part-of-speech Tagging Guidelines for the Penn Treebank Project." (https://rhetory.com/corpustool/PennTreebankTags.pdf) Technical report MS-CIS-90-47, Department of Computer and Information Science, University of Pennsylvania.
- QUOTE: This manual addresses the linguistic issues that arise in connection with annotating texts by part of speech ("tagging"). Section 2 is an alphabetical list of the parts of speech encoded in the annotation system of the Penn Treebank Project, along with their corresponding abbreviations ("tags") and some information concerning their definition.
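To make the idea of annotating text by part of speech concrete, here is a minimal sketch that reads a tagged sentence into (word, tag) pairs. It assumes the common `word/TAG` plain-text notation, which the quoted passage does not itself specify; the function name `parse_tagged` and the sample sentence are hypothetical and chosen only for illustration.

```python
def parse_tagged(text):
    """Split a whitespace-separated 'word/TAG' string into (word, tag) pairs.

    Assumption: each token is written as word/TAG. The split is made at the
    last slash so that a word which itself contains '/' stays intact.
    """
    pairs = []
    for token in text.split():
        word, sep, tag = token.rpartition("/")
        if not sep:
            raise ValueError(f"token without a tag: {token!r}")
        pairs.append((word, tag))
    return pairs


if __name__ == "__main__":
    # Invented example sentence, tagged with tags from the Penn Treebank tag set.
    sentence = "The/DT dog/NN barks/VBZ ./."
    for word, tag in parse_tagged(sentence):
        print(f"{word}\t{tag}")
```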