Word n-Gram
A Word n-Gram is a text-item n-gram composed of word forms.
- AKA: Token n-Gram.
- Context:
- It can range from being a 1-Word n-Gram to being a 2-Word n-Gram to being a 3-Word n-Gram to being ...
- It can range from being a Text Window-based Word n-Gram, to being a Sentence Window-based Word n-Gram, ...
- It can represent Adjacent Words in a Text.
- It can be a member of a Word n-Gram Set (see Word N-gram Model).
- Example(s):
- "ceramics collected by" ⇒ (52) a 3-gram Word N-gram from the Google N-gram Dataset.
- "serve as the independent" ⇒ (794) a 4-gram Word N-gram from the Google N-gram Dataset.
- Token Shingle.
- …
- Counter-Example(s):
- a Character n-Gram, such as "TEX", a 3-gram Character N-gram from (Cavnar & Trenkle, 1994).
- a Nucleotide n-Gram.
- See: Word Skip-Gram.
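
As a minimal sketch of the idea that a word n-gram represents adjacent words in a text, the following Python snippet slides a window of size n over a token list and counts the resulting n-grams. The function name `word_ngrams`, the sample sentence, and the whitespace tokenization are illustrative assumptions, not part of any dataset or tool mentioned above.

```python
from collections import Counter


def word_ngrams(tokens, n):
    """Return every run of n adjacent tokens as a tuple (hypothetical helper)."""
    if n < 1:
        raise ValueError("n must be >= 1")
    # A sequence of k tokens yields k - n + 1 n-grams (none if k < n).
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Assumption: naive whitespace tokenization; real corpora usually need
# a proper tokenizer that handles punctuation and casing.
text = "the cat sat on the mat and the cat slept"
tokens = text.split()

bigrams = word_ngrams(tokens, 2)
trigrams = word_ngrams(tokens, 3)

# Frequency counts, analogous in spirit to the counts shown in the examples.
bigram_counts = Counter(bigrams)
print(bigram_counts[("the", "cat")])
```

Restricting the token list to a single sentence before calling the function would give sentence window-based n-grams, while passing the tokens of a whole text gives text window-based ones.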