Text Token
A text token is a grapheme string within a text item that carries simple surface-level meaning and function.
- Context:
- It can range from being a Word Token to being a Punctuation Token.
- It can be associated with a POS Tag.
- It can be associated with a Text Token Location.
- It can range from being a Single-Token Word Mention to being a Multi-Token Word Mention.
- It can be a String Member of a Text Token String.
- It can be represented by a Text Token Predictor Feature (from a text token feature space).
- It can be identified by a Text Tokenization Task.
- Example(s):
- “
sentence
” is the 4th token on “This is a sentence.
". - …
- “
- Counter-Example(s):
- See: Contiguous Substring, Text Token Window.
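The context items above (word versus punctuation tokens, token locations, and identification by a tokenization task) can be illustrated with a minimal sketch. The regular expression, the `TextToken` record, and the `tokenize` helper are illustrative assumptions, not a definition taken from this page; practical tokenizers handle contractions, hyphens, and numbers with more care.

```python
import re
from typing import List, NamedTuple


class TextToken(NamedTuple):
    """Hypothetical record for one token: surface string, location, and kind."""
    text: str
    start: int  # character offset of the first character
    end: int    # character offset one past the last character
    kind: str   # "word" or "punct"


# Assumption: a word token is a run of word characters, and a punctuation
# token is any single character that is neither a word character nor whitespace.
TOKEN_PATTERN = re.compile(r"(?P<word>\w+)|(?P<punct>[^\w\s])")


def tokenize(text: str) -> List[TextToken]:
    """Split text into word and punctuation tokens, keeping their offsets."""
    return [
        TextToken(m.group(), m.start(), m.end(), m.lastgroup)
        for m in TOKEN_PATTERN.finditer(text)
    ]


tokens = tokenize("This is a sentence.")
for position, token in enumerate(tokens, start=1):
    print(position, token)
```

Under these assumptions, “sentence” comes out as the 4th token and the final period as a separate punctuation token, matching the example above.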
References
2008
- (Manning et al., 2008) ⇒ Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. (2008). “Introduction to Information Retrieval.” Cambridge University Press. ISBN:0521865719.
- QUOTE: Given a character sequence and a defined document unit, tokenization is the task of chopping it up into pieces, called tokens, perhaps at the same time throwing away certain characters, such as punctuation. ... A token is an instance of a sequence of characters in some particular document that are grouped together as a useful semantic unit for processing. A type is the class of all tokens containing the same character sequence. A term is a (perhaps normalized) type that is included in the IR system's dictionary.
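The token, type, and term distinction in the quote can be made concrete with a small sketch. Whitespace splitting and lowercasing are assumptions chosen for brevity here; the quote itself does not prescribe a particular tokenizer or normalization.

```python
text = "To be or not to be"

# Tokens: every occurrence of a character sequence in the document.
tokens = text.split()

# Types: the distinct character sequences among those tokens.
types = set(tokens)

# Terms: normalized types (assumption: normalization is lowercasing only).
terms = {t.lower() for t in types}

print(len(tokens), len(types), len(terms))
```

Here “To” and “to” are two distinct types that collapse into a single term after normalization, so the counts shrink from tokens to types to terms.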