Linguistic Atomic Unit
A Linguistic Atomic Unit is a Linguistic Item that is a basic unit in a Language Model.
- AKA: Language Model Atomic Unit.
- Example(s):
- Words are atomic units in a Word-level Language Model.
- Subwords are atomic units in a Subword-level Language Model.
- Characters are atomic units in a Character-level Language Model.
- Phrases are atomic units in a Phrase-level Language Model.
- Sentences are atomic units in a Sentence-level Language Model.
- Syllables are atomic units in Natural Language.
- a Token.
- …
- Counter-Example(s):
- See: String, Text Item, Character, Token, Phrase, Sentence, Natural Language, Symbol, Unicode.
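The granularity examples listed above can be illustrated with a minimal sketch that splits the same text into different atomic units. The function name `atomic_units`, the level names, and the regular expressions are assumptions made for illustration. They are not a definition of how any particular Language Model tokenizes text, and subword, phrase, and syllable segmentation are omitted because they need a learned vocabulary or language-specific rules.

```python
import re


def atomic_units(text, level):
    """Split text into units at the requested granularity (illustrative only)."""
    if level == "character":
        # Every character, including spaces, is its own unit.
        return list(text)
    if level == "word":
        # Runs of word characters, plus each punctuation mark as its own unit.
        return re.findall(r"\w+|[^\w\s]", text)
    if level == "sentence":
        # Naive split on whitespace that follows ., ! or ? (an assumption;
        # real sentence segmentation must handle abbreviations and more).
        return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    raise ValueError(f"unsupported level: {level}")


sample = "Hi there. Bye!"
for level in ("character", "word", "sentence"):
    print(level, atomic_units(sample, level))
```

The same string yields a different sequence of atomic units at each level, which is the sense in which the choice of unit characterizes the model.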
References
2015
- (Lopes et al., 2015) ⇒ Antonio Luis Vilarinho dos Santos Lopes, David Martins de Matos, Vera Cabarrao, Ricardo Ribeiro, Helena Moniz, Isabel Trancoso, and Ana Isabel Mata (2015). "Towards Using Machine Translation Techniques to Induce Multilingual Lexica of Discourse Markers". In: Pre-Print: arXiv:1503.09144.
- QUOTE: Machine translation systems can be classified according to the atomic units to be translated: for example, while for word-based methods the atomic unit is the word, for phrase-based methods the atomic unit is the phrase. Thus, the most important knowledge sources of phrase-based methods are tables of possible phrase translations between language pairs. Phrase-based methods, due to their nature, have, at least, one interesting advantage for this specific work: they can handle non-compositional phrases.
2005
- (Marantz, 2005) ⇒ Alec Marantz (2005). "Generative Linguistics Within The Cognitive Neuroscience Of Language". In: The Linguistic Review 22, 429–445.
- QUOTE: The well-formedness of a linguistic structure is understood to be recursively defined. That is, one asks about a structure $C$ whether it contains pieces or is an atomic unit. If it is an atomic unit, one searches one's list of atomic units, and if $C$ occurs on this list, the structure is well-formed. If $C$ contains pieces, it is well-formed if each of the pieces is well-formed and the method of composing the pieces into $C$ is licensed/well-formed. Each of the pieces constituting $C$ might itself be atomic or consist of other pieces. This recursive definition of well-formedness assumes a bedrock of listed atoms for composition. It also implies a hierarchical constituent structure, with levels of embedding of complex (non-atomic) constituents.
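The recursive definition in the quote above can be sketched as a short function. The representation is an assumption made for illustration and is not taken from Marantz (2005): an atomic unit is a string, a complex structure is a `(method, pieces)` tuple, `atoms` is the list of atomic units, and `licensed_methods` is a hypothetical set of permitted composition methods.

```python
def is_well_formed(structure, atoms, licensed_methods):
    """Recursively check well-formedness of a structure C.

    Assumed encoding (hypothetical, for illustration only):
      - an atomic unit is a str
      - a complex structure is a tuple (method, [piece, piece, ...])
    """
    if isinstance(structure, str):
        # Atomic case: C is well-formed if it occurs on the list of atoms.
        return structure in atoms
    method, pieces = structure
    # Complex case: the method of composition must be licensed, and
    # every piece must itself be well-formed.
    return method in licensed_methods and all(
        is_well_formed(piece, atoms, licensed_methods) for piece in pieces
    )


atoms = {"the", "cat", "sleeps"}
licensed_methods = {"NP", "S"}

sentence = ("S", [("NP", ["the", "cat"]), "sleeps"])
print(is_well_formed(sentence, atoms, licensed_methods))
```

The nesting of tuples mirrors the hierarchical constituent structure the quote describes: each level of embedding is a complex constituent, and the recursion bottoms out at the listed atoms.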