Part-of-Speech (POS) Prediction Task

Context:
- Input:
  - a Text Token String (of a Linguistic Expression).
  - a POS Tag Set.
  - optional: a POS Tagger.
- Output:
  - a POS Tag Sequence.
  - optional: a POS Tagging Model.
- It is a text token classification task.
- It can range from being a Heuristic POS Tagging Task to being a Data-Driven POS Tagging Task (such as unsupervised POS tagging or supervised POS tagging).
- It can be solved by a Part-of-Speech Tagging System (that implements a part-of-speech tagging algorithm).
- It can proceed after a Word Mention Segmentation Task.
- It can indicate ambiguity in the expression, e.g. entertaining/JJ|VBG.
Example(s):
- PoSTT(“Colorless green ideas sleep furiously.”) ⇒ Colorless/JJ green/JJ ideas/NNS sleep/VBP furiously/RB ./. .
- PoSTT(“This automatic teller machine is a work in progress.”) ⇒ This/DT automatic/JJ teller/NN machine/NN is/AUX a/DT work/NN in/IN progress/NN ./. .
- PoSTT(“This automatic_teller_machine is a work_in_progress.”) ⇒ This/DT automatic_teller_machine/NN is/AUX a/DT work_in_progress/NN ./. .
- PoSTT(“The duchess was entertaining last night.”) ⇒ The/DT duchess/NN was/AUX entertaining/JJ|VBG last/JJ night/NN ./. .
- …
Counter-Example(s):
- a Text Segmentation Task (such as word mention segmentation).
- a Semantic Role Labeling Task.
- a Next Word Prediction Task, or Next Letter Classification Task.
- a Text Item Categorization Task.
See: Penn Treebank Project; Text Item Shallow Parsing.
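
A Heuristic POS Tagging Task of the kind listed above can be illustrated with a minimal sketch. The lexicon, the suffix rules, and the function names (`tokenize`, `heuristic_pos_tag`, `format_tagged`) are hypothetical and chosen only for illustration; they are not taken from any existing tagger. The sketch assumes whitespace-and-punctuation tokenization, keeps underscore-joined mentions such as automatic_teller_machine as single tokens, and emits the token/TAG notation used in the examples.

```python
import re

# Hypothetical toy lexicon; a real heuristic tagger would use a far larger one.
LEXICON = {
    "the": "DT", "this": "DT", "a": "DT",
    "is": "AUX", "was": "AUX",
    "in": "IN",
    "colorless": "JJ", "green": "JJ",
    "sleep": "VBP",
}

# Hypothetical suffix rules, tried in order; the first match wins.
SUFFIX_RULES = [
    ("ly", "RB"),
    ("s", "NNS"),
]


def tokenize(text):
    """Split into word tokens (underscores kept inside words) and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def heuristic_pos_tag(tokens):
    """Tag each token by lexicon lookup, then suffix rule, then a default of NN."""
    tagged = []
    for tok in tokens:
        lower = tok.lower()
        if lower in LEXICON:
            tag = LEXICON[lower]
        elif not (tok[0].isalnum() or tok[0] == "_"):
            tag = tok  # punctuation is tagged as itself, e.g. ./.
        else:
            tag = next((t for suffix, t in SUFFIX_RULES if lower.endswith(suffix)), "NN")
        tagged.append((tok, tag))
    return tagged


def format_tagged(tagged):
    """Render (token, tag) pairs in token/TAG notation."""
    return " ".join(f"{tok}/{tag}" for tok, tag in tagged)


print(format_tagged(heuristic_pos_tag(tokenize("Colorless green ideas sleep furiously."))))
```

Because each token is tagged in isolation, such rules misfire easily (for example, the "s" rule would tag "progress" as NNS), which is one motivation for the data-driven variants of the task.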

References

(Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/Part-of-speech_tagging Retrieved:2015-4-11.
- In corpus linguistics, part-of-speech tagging (POS tagging or POST), also called grammatical tagging or word-category disambiguation, is the process of marking up a word in a text (corpus) as corresponding to a particular part of speech, based on both its definition, as well as its context—i.e. relationship with adjacent and related words in a phrase, sentence, or paragraph.
  A simplified form of this is commonly taught to school-age children, in the identification of words as nouns, verbs, adjectives, adverbs, etc.
  Once performed by hand, POS tagging is now done in the context of computational linguistics, using algorithms which associate discrete terms, as well as hidden parts of speech, in accordance with a set of descriptive tags. POS-tagging algorithms fall into two distinctive groups: rule-based and stochastic. E. Brill's tagger, one of the first and most widely used English POS-taggers, employs rule-based algorithms.

(Daelemans, 2011) ⇒ Walter Daelemans. (2011). “POS Tagging.” In: (Sammut & Webb, 2011) p.776

(Elkan, 2008) ⇒ Charles Elkan. (2008). “Log-linear models and conditional random fields.” Notes for a tutorial at CIKM-2008 (CIKM 2008).
- QUOTE: POS tagging is an example of what is called a structured prediction task. The goal is to predict a complex label (a sequence of POS tags) for a complex input (an entire sentence). The word “structured” refers to the fact that labels have internal structure, in this case being sequences.
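
The structured-prediction view in this quote can be sketched with Viterbi decoding over a toy hidden Markov model, which scores a whole tag sequence instead of each token on its own. This is an illustrative stand-in, not the log-linear or conditional random field models that the tutorial covers. The tag set and every probability below are hypothetical, hand-set values, and `UNSEEN` is an assumed floor probability for token/tag pairs missing from the table.

```python
import math

TAGS = ["DT", "NN", "VB"]

# Hypothetical, hand-set probabilities for illustration only.
START = {"DT": 0.6, "NN": 0.3, "VB": 0.1}
TRANS = {
    "DT": {"DT": 0.05, "NN": 0.9, "VB": 0.05},
    "NN": {"DT": 0.1, "NN": 0.3, "VB": 0.6},
    "VB": {"DT": 0.5, "NN": 0.4, "VB": 0.1},
}
EMIT = {
    "DT": {"the": 0.9},
    "NN": {"dog": 0.4, "barks": 0.1, "walk": 0.2},
    "VB": {"barks": 0.5, "walk": 0.4},
}
UNSEEN = 1e-6  # assumed floor for token/tag pairs absent from EMIT


def viterbi(tokens):
    """Return the highest-scoring tag sequence for tokens under the toy HMM."""
    if not tokens:
        return []

    def emit(tag, tok):
        return math.log(EMIT[tag].get(tok, UNSEEN))

    # scores[i][t]: best log-probability of any tag path ending in tag t at position i
    scores = [{t: math.log(START[t]) + emit(t, tokens[0]) for t in TAGS}]
    back = []  # back[i][t]: best previous tag for tag t at position i + 1
    for tok in tokens[1:]:
        prev = scores[-1]
        cur, ptr = {}, {}
        for t in TAGS:
            best_prev = max(TAGS, key=lambda p: prev[p] + math.log(TRANS[p][t]))
            cur[t] = prev[best_prev] + math.log(TRANS[best_prev][t]) + emit(t, tok)
            ptr[t] = best_prev
        scores.append(cur)
        back.append(ptr)

    # Follow back-pointers from the best final tag to recover the full sequence.
    path = [max(TAGS, key=lambda t: scores[-1][t])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


print(viterbi(["the", "dog", "barks"]))
print(viterbi(["the", "walk"]))
```

With these toy numbers, "walk" on its own is more probable under VB than NN, yet the decoder assigns NN after "the" because the DT-to-NN transition dominates; the label for one token depends on the labels around it, which is the sense of "structured" here.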

(Sproat et al, 1996) ⇒ Richard Sproat, William A. Gale, Chilin Shih, and Nancy Chang. (1996). “A Stochastic Finite-state Word-Segmentation Algorithm for Chinese.” In: Computational Linguistics, 22(3).
- QUOTE: Given that part-of-speech labels are properties of words rather than morphemes, it follows that one cannot do part-of-speech assignment without having access to word-boundary information.