Part-of-Speech (POS) Prediction Task
A Part-of-Speech (POS) Prediction Task is a syntactic text token classification task whose class set is a POS tag set (i.e. one that requires each text token to be mapped to a part-of-speech role).
- Context:
- Input:
- a Text Token String (of a Linguistic Expression).
- a POS Tag Set.
- optional: a POS Tagger.
- Output: a POS Tag Sequence.
- It is a text token classification task.
- It can range from being a Heuristic POS Tagging Task to being a Data-Driven POS Tagging Task (such as unsupervised POS tagging or supervised POS tagging).
- It can be solved by a Part-of-Speech Tagging System (that implements a part-of-speech tagging algorithm).
- It can proceed after a Word Mention Segmentation Task.
- It can indicate ambiguity in the expression, e.g. entertaining/JJ|VBG.
- Example(s):
- PoSTT("Colorless green ideas sleep furiously.") ⇒ Colorless/JJ green/JJ ideas/NNS sleep/VBP furiously/RB ./.
- PoSTT("This automatic teller machine is a work in progress.") ⇒ This/DT automatic/JJ teller/NN machine/NN is/AUX a/DT work/NN in/IN progress/NN ./.
- PoSTT("This automatic_teller_machine is a work_in_progress.") ⇒ This/DT automatic_teller_machine/NN is/AUX a/DT work_in_progress/NN ./.
- PoSTT("The duchess was entertaining last night.") ⇒ The/DT duchess/NN was/AUX entertaining/JJ|VBG last/JJ night/NN ./.
- …
- Counter-Example(s):
- See: Penn Treebank Project; Text Item Shallow Parsing.
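The token/TAG notation used in the examples can be handled mechanically. The following is a minimal sketch, assuming whitespace-separated items, a slash before the tag, and a vertical bar between alternative tags for an ambiguous token. The names `parse_tagged`, `format_tagged`, and `out_of_set` are hypothetical helpers introduced for illustration only.

```python
from typing import Iterable, List, Tuple

# One tagged token: the token text plus one or more candidate tags.
TaggedToken = Tuple[str, List[str]]


def parse_tagged(text: str) -> List[TaggedToken]:
    """Parse 'token/TAG' items; 'TAG1|TAG2' marks an ambiguous token."""
    result = []
    for item in text.split():
        # Split on the last slash so that an item such as './.' parses as
        # token '.' with tag '.'.
        token, _, tags = item.rpartition("/")
        result.append((token, tags.split("|")))
    return result


def format_tagged(tagged: Iterable[TaggedToken]) -> str:
    """Render tagged tokens back into 'token/TAG' notation."""
    return " ".join(f"{token}/{'|'.join(tags)}" for token, tags in tagged)


def out_of_set(tagged: Iterable[TaggedToken], tag_set: set) -> List[str]:
    """Return the tokens that carry a tag outside the given POS tag set."""
    return [tok for tok, tags in tagged if any(t not in tag_set for t in tags)]


if __name__ == "__main__":
    seq = parse_tagged("The/DT duchess/NN was/AUX entertaining/JJ|VBG")
    print(seq)
    print(out_of_set(seq, {"DT", "NN", "AUX", "JJ"}))
```

Keeping the candidate tags as a list lets an ambiguous token be represented without a separate data type; a downstream step can treat a list of length one as a resolved tag.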
References
2015
- (Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/Part-of-speech_tagging Retrieved:2015-4-11.
- In corpus linguistics, part-of-speech tagging (POS tagging or POST), also called grammatical tagging or word-category disambiguation, is the process of marking up a word in a text (corpus) as corresponding to a particular part of speech, based on both its definition, as well as its context—i.e. relationship with adjacent and related words in a phrase, sentence, or paragraph.
A simplified form of this is commonly taught to school-age children, in the identification of words as nouns, verbs, adjectives, adverbs, etc.
Once performed by hand, POS tagging is now done in the context of computational linguistics, using algorithms which associate discrete terms, as well as hidden parts of speech, in accordance with a set of descriptive tags. POS-tagging algorithms fall into two distinctive groups: rule-based and stochastic. E. Brill's tagger, one of the first and most widely used English POS-taggers, employs rule-based algorithms.
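To make the rule-based versus stochastic distinction concrete, here is a minimal sketch that combines the simplest form of each: a most-frequent-tag lookup estimated from counts, with hand-written suffix rules as a fallback for unseen words. The training sentences, the suffix rules, and the names `train_unigram`, `rule_based_guess`, and `tag` are all assumptions made for illustration; this is not Brill's tagger or any published system.

```python
from collections import Counter, defaultdict

# Toy hand-tagged corpus (hypothetical; for illustration only).
TRAIN = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("work", "NN"), ("is", "VBZ"), ("done", "VBN")],
    [("they", "PRP"), ("work", "VBP"), ("quickly", "RB")],
    [("a", "DT"), ("work", "NN"), ("in", "IN"), ("progress", "NN")],
]


def train_unigram(corpus):
    """Data-driven part: map each word to its most frequent tag."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for word, tag_ in sentence:
            counts[word.lower()][tag_] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def rule_based_guess(word):
    """Rule-based part: suffix heuristics for words unseen in training."""
    if word.endswith("ly"):
        return "RB"
    if word.endswith("ing"):
        return "VBG"
    if word.endswith("s"):
        return "NNS"
    return "NN"


def tag(tokens, model):
    """Tag each token by lookup, falling back to the suffix rules."""
    return [(t, model.get(t.lower()) or rule_based_guess(t)) for t in tokens]


if __name__ == "__main__":
    model = train_unigram(TRAIN)
    print(tag(["the", "ideas", "sleep", "furiously"], model))
```

Because this lookup ignores context, it always assigns "work" the same tag; resolving such cases is what context-sensitive rules or sequence models are for.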
2011
- (Daelemans, 2011) ⇒ Walter Daelemans. (2011). “POS Tagging.” In: (Sammut & Webb, 2011) p.776
2008
- (Elkan, 2008) ⇒ Charles Elkan. (2008). “Log-linear models and conditional random fields.” Notes for a tutorial at CIKM-2008 (CIKM 2008).
- QUOTE: POS tagging is an example of what is called a structured prediction task. The goal is to predict a complex label (a sequence of POS tags) for a complex input (an entire sentence). The word “structured” refers to the fact that labels have internal structure, in this case being sequences.
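Predicting the tag sequence as a whole, rather than one tag at a time, can be sketched with Viterbi decoding. The example below assumes a first-order HMM-style model, which is simpler than the log-linear models and conditional random fields the tutorial covers, and every probability in it is invented for illustration. The function name `viterbi` and the `unk` penalty for unseen words are likewise assumptions.

```python
import math


def viterbi(tokens, tags, log_start, log_trans, log_emit, unk=-20.0):
    """Return the highest-scoring tag sequence under a first-order model."""
    if not tokens:
        return []
    # best[i][t]: best log-score of a tag sequence for tokens[:i+1] ending in t.
    best = [{t: log_start[t] + log_emit[t].get(tokens[0], unk) for t in tags}]
    back = []  # back[i][t]: best previous tag when tokens[i+1] is tagged t
    for tok in tokens[1:]:
        scores, pointers = {}, {}
        for t in tags:
            prev = max(tags, key=lambda p: best[-1][p] + log_trans[p][t])
            scores[t] = (best[-1][prev] + log_trans[prev][t]
                         + log_emit[t].get(tok, unk))
            pointers[t] = prev
        best.append(scores)
        back.append(pointers)
    # Recover the path by following back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return list(reversed(path))


def logs(probs):
    """Convert a dict of probabilities to log-probabilities."""
    return {k: math.log(v) for k, v in probs.items()}


# Invented toy parameters (not estimated from any corpus).
TAGS = ["DT", "NN", "VB"]
LOG_START = logs({"DT": 0.6, "NN": 0.3, "VB": 0.1})
LOG_TRANS = {
    "DT": logs({"DT": 0.05, "NN": 0.9, "VB": 0.05}),
    "NN": logs({"DT": 0.1, "NN": 0.3, "VB": 0.6}),
    "VB": logs({"DT": 0.5, "NN": 0.3, "VB": 0.2}),
}
LOG_EMIT = {
    "DT": logs({"the": 1.0}),
    "NN": logs({"dogs": 0.5, "work": 0.5}),
    "VB": logs({"work": 0.6, "sleep": 0.4}),
}

if __name__ == "__main__":
    print(viterbi(["the", "work"], TAGS, LOG_START, LOG_TRANS, LOG_EMIT))
    print(viterbi(["dogs", "work"], TAGS, LOG_START, LOG_TRANS, LOG_EMIT))
```

With these toy parameters, "work" is tagged NN after "the" and VB after "dogs", which shows the label for one token depending on the labels of its neighbors.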
1996
- (Sproat et al, 1996) ⇒ Richard Sproat, William A. Gale, Chilin Shih, and Nancy Chang. (1996). “A Stochastic Finite-state Word-Segmentation Algorithm for Chinese.” In: Computational Linguistics, 22(3).
- QUOTE: Given that part-of-speech labels are properties of words rather than morphemes, it follows that one cannot do part-of-speech assignment without having access to word-boundary information.
1995
- (Brill, 1995) ⇒ Eric D. Brill. (1995). “Transformation-based Error-Driven Learning and Natural Language Processing: A Case Study in Part of Speech Tagging.” In: Computational Linguistics, 21(4).