Lexical Item
A lexical item is a language terminal within a natural language syntax.
- AKA: Lexical Unit, Lexicalized Stem, Lexical Entry, Abstracted Surface Form, Lexicon Word, NL Terminal Symbol.
- Context:
- It is the smallest unit that a Linguistic Agent can use to compose a Linguistic Expression.
- It must be associated with a Part-of-Speech Role.
- It can range from being a Lexeme (a word sense with word paradigms) to being an Interjection Word, such as "Ummmm" and "You know".
- It can range from being a Content Word (a meaning carrier) to being a Function Word (performing a grammatical function).
- It can range from being a Derived Word, to being an Inflected Word, to being a Contracted Word.
- It can range from being a Simple Word to a (Morphologically) Complex Word.
- It can range from being an Unsegmented Terminal Word, e.g. “Lebensversicherungsgesellschaftsangestellter”, to being a Segmented Terminal Word, e.g. “[Lebensversicherungs] [gesellschafts] [angestellter]” (~ life-insurance company employee).
- It can be referenced by a Lexical Item Referencer, such as a Word Mention (within a linguistic expression), or a lexical item record (e.g. in a lexical database).
- It can have:
- one or more Word Spellings.
- one or more Word Pronunciations.
- It can be movable within a Linguistic Expression, e.g. the mice-infested house is for sale <=> the house is for sale and infested with mice.
- It can be stressed and have only one primary Phonetic Stress.
- It can be inserted between two other Words, but typically not within another Word (a German Compound Word may be an exception).
- It can be a member of a Lexical Item Cluster (such as a lexicon).
- It can be represented by a Lexical Definition Item.
- It can be an input to a Lexical Mapping Model.
- Example(s):
- “the” (a Determiner and the most frequently used lexical word in the English language).
- “man” (a Common Noun and the most frequently used common noun in the English language).
- “in” (a Preposition).
- “Kanada”/Proper Noun ⇒ "Ich lebe in [Kanada]." (which has a different Spelling in English).
- “blackbirds” ⇒ “[Blackbirds] can have red spotted wings.” (as opposed to “black bird”).
- “bank”/Noun. (which is a Homonym because it has more than one meaning).
- ⇒ “The [bank] overflowed during the flood.”
- ⇒ “The [bank] overflowed with customers.”
- “runs”/Verb ⇒ “Michael runs regularly”, (which is composed of the Unbound Morpheme run and the Bound Morpheme -s).
- “running” and “runs” are two different Words that are Members of the RUN Lexeme.
- “life insurance”, a Compound Noun.
- “real time”, a Compound Adjective.
- a Loan Word, such as: “Schadenfreude”.
- a Terminological Unit, such as:
- “Canadian Swallowtail Tiger Butterfly” and “Papilio glaucas” are two lexical items that refer to the same concept.
- “antidisestablishmentarianism”.
- “semi-supervised classification algorithm”
- a German Word, such as: “Lebensversicherungsgesellschaftsangestellter”, (~life insurance company employee).
- a Chinese Word, such as: 金 (for ~golden), and 英语 (for ~English language).
- …
- Counter-Example(s):
- a Phrase, such as:
- “the black bird” which is composed of three lexical items.
- “Peter's”, which is composed of (can be tokenized into) two Words: “Peter” and “s”.
- a Morpheme, such as: “un-”, “-ing”
- a Letter Character, such as “q” (from a natural language alphabet).
- a Punctuation Mark, such as: ".", "!", and "?".
- a Phoneme.
- an Utterance.
- an Encyclopedia Item.
- See: Natural Language Word, Text Token, Lexicon, Lexeme, Seme (Semantics), Lexis (Linguistics).
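The record-like properties listed in the Context above (a part-of-speech role, one or more spellings, one or more pronunciations, membership in a lexicon) can be sketched as a small data structure. This is only an illustrative sketch: the class name `LexicalItem`, its field names, the glosses, and the extra "real-time" spelling are assumptions made for the example, not part of any particular lexical database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexicalItem:
    """Hypothetical lexical item record; field names are illustrative."""
    lemma: str                  # citation form, e.g. "bank"
    pos: str                    # part-of-speech role, e.g. "Noun"
    gloss: str                  # short sense description (assumed field)
    spellings: tuple = ()       # one or more word spellings
    pronunciations: tuple = ()  # one or more pronunciations (left empty here)


def build_lexicon(items):
    """Index records by spelling; a homonymous spelling maps to several records."""
    lexicon = {}
    for item in items:
        for spelling in item.spellings:
            lexicon.setdefault(spelling, []).append(item)
    return lexicon


ITEMS = [
    LexicalItem("the", "Determiner", "definite article", ("the",)),
    LexicalItem("bank", "Noun", "edge of a river", ("bank",)),
    LexicalItem("bank", "Noun", "financial institution", ("bank",)),
    # Assumed alternative spelling, included to show one item with two spellings.
    LexicalItem("real time", "Adjective", "without delay", ("real time", "real-time")),
]

LEXICON = build_lexicon(ITEMS)

# The homonym "bank" resolves to two distinct records.
for item in LEXICON["bank"]:
    print(item.pos, "-", item.gloss)
```

Indexing by spelling rather than by lemma keeps the homonym case explicit: a lookup returns every record that shares the surface form, and choosing among them is left to the caller.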
References
2015
- (Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/lexical_item Retrieved:2015-1-30.
- A lexical item (or lexical unit, lexical entry) is a single word, a part of a word, or a chain of words (=catena) that forms the basic elements of a language's lexicon (≈vocabulary). Examples are cat, traffic light, take care of, by the way, and it's raining cats and dogs. Lexical items can be generally understood to convey a single meaning, much as a lexeme, but are not limited to single words. Lexical items are like semes in that they are "natural units" translating between languages, or in learning a new language. In this last sense, it is sometimes said that language consists of grammaticalized lexis, and not lexicalized grammar. The entire store of lexical items in a language is called its lexis.
Lexical items composed of more than one word are also sometimes called lexical chunks, gambits, lexical phrases, lexical units, lexicalized stems, or speech formulae. The term polyword listemes is also sometimes used.
2013
- http://en.wikipedia.org/wiki/Word_form#Lexemes_and_word_forms
- The distinction between these two senses of "word" is arguably the most important one in morphology. The first sense of "word", the one in which dog and dogs are "the same word", is called a lexeme. The second sense is called word form. We thus say that dog and dogs are different forms of the same lexeme. Dog and dog catcher, on the other hand, are different lexemes, as they refer to two different kinds of entities. The form of a word that is chosen conventionally to represent the canonical form of a word is called a lemma, or citation form.
2010
- http://en.wiktionary.org/wiki/Word
- A distinct unit of language (sounds in speech or written letters) with a particular meaning, composed of one or more morphemes, and also of one …
2009a
- (Wikipedia, 2009) ⇒ http://en.wikipedia.org/wiki/Word
- A word is a unit of language that carries meaning and consists of one or more morphemes which are linked more or less tightly together, and has a phonetical value. Typically a word will consist of a root or stem and zero or more affixes. …
2009b
- (WordNet, 2009) ⇒ http://wordnetweb.princeton.edu/perl/webwn?s=word%20form
- S: (n) form, word form, signifier, descriptor (the phonological or orthographic sound or appearance of a word that can be used to describe or identify something) "the inflected forms of a word can be represented by a stem and a list of inflections to be attached"
2009c
- http://www.phon.ucl.ac.uk/home/dick/enc/morphology.htm#word-form
- As you would expect from its name, a word-form is a form that corresponds to an entire word. It may be important to generalise about all such forms - for example, word-forms in English may end in up to four consonants (e.g. sixths, twelfths).
2009d
- http://en.wiktionary.org/wiki/lexical_item
- (semantics) A term — word or a sequence of words — that acts as a unit of meaning, including words, phrases, phrasal verbs and proverbs, exemplified by "cat", "traffic light", "take care of", "by-the-way", and "don't count your chickens before they hatch".
2009e
- (Wikipedia, 2009) ⇒ http://en.wikipedia.org/wiki/Inflection
- Overt inflection typically distinguishes lexical items (such as lexemes) from functional ones (such as affixes, clitics, particles and morphemes in general) and has functional items acting as markers on lexical ones.
2009f
- (WordNet, 2009) ⇒ http://wordnetweb.princeton.edu/perl/webwn?s=word
- a unit of language that native speakers can identify; "words are the blocks from which sentences are made"; "he hardly said ten words all morning"
- a brief statement; "he didn't say a word about it"
- news: information about recent and important events; "they awaited news of the outcome"
- a verbal command for action; "when I give the word, charge!"
2009g
- http://folk.uio.no/hhasselg/terms.html
- word (ord): the smallest linguistic unit that can have a syntactic function. A word has an expression side (combination of sounds, or of letters) and a content side (an independent meaning).
2009h
- http://www.cse.unsw.edu.au/~billw/nlpdict.html#word
- Words are units of language. They are built of morphemes and are used to build phrases (which are in turn used to build sentences).
- See also lexeme
- See also terminal symbol
2009i
- (Jurafsky & Martin, 2009) ⇒ Daniel Jurafsky, and James H. Martin. (2009). “Speech and Language Processing, 2nd edition." Pearson Education.
- QUOTE: For the purposes of lexical semantics, particularly for dictionaries and thesauruses, we represent a lexeme by a lemma. A lemma or citation form is the grammatical form that is used to represent a lexeme; thus, carpet is the lemma for carpets. The lemma or citation form for sing, sang, sung is sing. In many languages the infinitive form is used as the lemma for the verb; thus in Spanish dormir "to sleep" is the lemma for the verb duermes "you sleep". The specific forms sung or carpets or sing or duermes are called wordforms.
The process of mapping from a wordform to a lemma is called lemmatization. Lemmatization is not always deterministic, since it may depend on the context. For example, the wordform found can map to the lemma find (meaning 'to locate') or the lemma found ('to create an institution').
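The wordform-to-lemma mapping described in this quote can be illustrated with a small lookup table. This is a minimal sketch, assuming a hand-written table (`LEMMA_TABLE` and `lemmatize` are hypothetical names); real lemmatizers rely on morphological analysis and context rather than a fixed dictionary. The point it shows is that a wordform such as "found" yields more than one candidate lemma.

```python
# Hypothetical table: wordform -> candidate lemmas (examples taken from the quote).
LEMMA_TABLE = {
    "carpets": ["carpet"],
    "sing": ["sing"],
    "sang": ["sing"],
    "sung": ["sing"],
    "duermes": ["dormir"],
    "found": ["find", "found"],  # ambiguous without context
}


def lemmatize(wordform):
    """Return every candidate lemma; unknown wordforms map to themselves."""
    key = wordform.lower()
    return LEMMA_TABLE.get(key, [key])


for form in ["carpets", "sung", "duermes", "found"]:
    print(form, "->", lemmatize(form))
```

Returning a list instead of a single string makes the non-determinism visible: picking between "find" and "found" would need the surrounding sentence.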
2008a
- (Masse et al., 2008) ⇒ Blondin Masse, A, G. Chicoisne, Y. Gargouri, Stevan Harnad, O. Picard, and O. Marcotte. (2008). “How Is Meaning Grounded in Dictionary Definitions?.” In: TextGraphs-3 Workshop, 22nd International Conference on Computational Linguistics (Coling 2008).
- QUOTE: We know from the 19th century philosopher-mathematician Frege that the referent and the meaning (or “sense”) of a word (or phrase) are not the same thing: two different words or phrases can refer to the very same object without having the same meaning (Frege, 1948): “George W. Bush” and “the current president of the United States of America” have the same referent but a different meaning. So do “human females” and “daughters”. And “things that are bigger than a breadbox” and “things that are not the size of a breadbox or smaller”.
A word’s “extension” is the set of things to which it refers, and its “intension” is the rule for defining what things fall within its extension. A word’s meaning is hence something closer to a rule for picking out its referent. Is the dictionary definition of a word, then, its meaning?
Clearly, if we do not know the meaning of a word, we look up its definition in a dictionary. But what if we do not know the meaning of any of the words in its dictionary definition? And what if we don’t know the meanings of the words in the definitions of the words defining those words, and so on? This is a problem of infinite regress, called the “symbol grounding problem” (Harnad, 1990; Harnad, 2003).
2008b
- (Manning et al., 2008) ⇒ Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. (2008). “Introduction to Information Retrieval." Cambridge University Press. ISBN:0521865719.
- QUOTE: A token is an instance of a sequence of characters in some particular document that are grouped together as a useful semantic unit for processing. A type is the class of all tokens containing the same character sequence. A term is a (perhaps normalized) type that is included in the IR system's dictionary.
2007
- (Kakkonen, 2007) ⇒ Tuomo Kakkonen. (2007). “Framework and Resources for Natural Language Evaluation." Academic Dissertation. University of Joensuu.
- Definition 3-1. Symbol, terminal and alphabet.
- A symbol is a distinguishable character, such as “a”, “b” or “c”.
- Any permissible sequence of symbols is called a terminal (also referred to as a word).
- A finite, nonempty set ∑ of terminals is called an alphabet.
- A lexicon is a structure that defines the terminals in a language.
- A grammar [math]\displaystyle{ G }[/math] consists of a lexicon and rules.
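The definitions above (terminals as words, a lexicon that defines the terminals, and a grammar made of a lexicon plus rules) can be sketched as a small data structure. This is an illustrative sketch only: the `Grammar` class, its method names, and the toy lexicon and rules are assumptions for the example and are not taken from Kakkonen's framework.

```python
from dataclasses import dataclass


@dataclass
class Grammar:
    """Toy grammar G: a lexicon (terminal -> categories) plus rewrite rules."""
    lexicon: dict  # terminal (word) -> set of part-of-speech categories
    rules: dict    # nonterminal -> list of right-hand sides (tuples of symbols)

    def terminals(self):
        """The set of terminals is exactly what the lexicon defines."""
        return set(self.lexicon)

    def unknown_terminals(self, tokens):
        """Tokens that the lexicon does not define as terminals."""
        return [t for t in tokens if t not in self.lexicon]


# Hypothetical three-word lexicon and three rules.
G = Grammar(
    lexicon={"the": {"Det"}, "man": {"N"}, "runs": {"V"}},
    rules={"S": [("NP", "VP")], "NP": [("Det", "N")], "VP": [("V",)]},
)

print(sorted(G.terminals()))
print(G.unknown_terminals(["the", "bank", "runs"]))
```

Keeping the lexicon separate from the rules mirrors the definition: adding a word changes only the set of terminals, while the rules continue to refer to categories.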
2004
- (Diab et al., 2004) ⇒ Mona Diab, Kadri Hacioglu, and Daniel Jurafsky. (2004). “Automatic Tagging of Arabic Text: From Raw Text to Base Phrase Chunks.” In: Proceedings of NAACL-HLT 2004.
- “Morphological analysis may be characterized as the process of segmenting a surface word form into its component derivational and inflectional morphemes."
2003a
- (Mikheev, 2003) ⇒ Andrei Mikheev. (2003). “Text Segmentation.” In: (Mitkov, 2003).
- The first step in the majority of text processing applications is to segment text into words. The term 'word', however, is ambiguous: a word from a language's vocabulary can occur many times in the text but it is still a single individual word of the language. So there is a distinction between words of vocabulary or word types and multiple occurrences of these words in the text which are called word tokens. This is why the process of segmenting words tokens in text is called tokenization. Although the distinction between word types and word tokens is important it is usual to refer to the both as 'words' whenever the context unambiguously implies the interpretation.
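The distinction between word types and word tokens can be made concrete with a short counting example. This is a minimal sketch, assuming a naive tokenizer that lowercases the text and keeps runs of letters and apostrophes; the function names `tokenize` and `type_token_counts` are hypothetical, and real tokenizers handle punctuation, clitics, and non-Latin scripts far more carefully.

```python
import re
from collections import Counter


def tokenize(text):
    """Naive tokenizer (an assumption): lowercase, keep letter/apostrophe runs."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_counts(text):
    """Return (number of word tokens, number of word types, per-type counts)."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    return len(tokens), len(counts), counts


n_tokens, n_types, counts = type_token_counts(
    "The bank overflowed. The bank closed."
)
print(n_tokens, "tokens,", n_types, "types")
print(counts.most_common(2))
```

Here "the" and "bank" each occur twice as tokens but count once as types, which is the ambiguity of "word" that the passage describes.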
2003b
- (Mitkov, 2003) ⇒ Ruslan Mitkov, editor. (2003). “The Oxford Handbook of Computational Linguistics." Oxford University Press. ISBN:019927634X
- word-type: A word in a language vocabulary, as opposed to its specific occurrence in text. Compare word-token.
2003c
- (Mitkov, 2003) ⇒ Ruslan Mitkov, editor. (2003). “The Oxford Handbook of Computational Linguistics." Oxford University Press. ISBN:019927634X
- lexical entry: A word or phrase in a lexicon used as a peg on which to hang information about part of speech, subcategorization, meaning, pronunciation, links to related terms, and/or any of various other kinds of information.
2000
- (Bauer, 2000) ⇒ Laurie Bauer. (2000). “Word.” In: "Morphology.", edited by Geert Booij, Christian Lehmann, and Joachim Mugdan. ISBN:9783110111286
- A word-form, on the other hand, is the orthographic or phonological form which represents a lexeme. The terms appear to have been used first by Matthews (1972: 41), although the notion was current much earlier. The usual notation is to mark word-forms in italics, and this will be followed here. Although Lyons himself uses at least three different notations for lexemes, that most frequently adopted in other works is the notation introduced in 1968, by which lexemes are indicated by the use of small capitals …
1998
- (Carter, 1998) ⇒ Ronald Carter. (1998). “Vocabulary: Applied Linguistic Perspectives; 2nd edition." Routledge.
- QUOTE: One theoretical notion which may help us to resolve some of the above problems is that of the lexeme. A lexeme is the abstract unit which underlies some of the variants we have observed in connection with 'words'. Thus BRING is the lexeme which underlies different grammatical variants: 'bring', 'brought', 'brings', 'bringing' which we can refer to as word-forms (note a lexeme is conventionally represented by upper-case letters and that quotation marks are used for its word-forms). Lexemes are the basic, contrasting units of vocabulary in a language. When we look up words in a dictionary we are looking up lexemes rather than words. That is, 'brought' and 'bringing' will be found under an entry for BRING. The lexeme BRING is an abstraction. It does not actually occur itself in texts. Instead, it realizes different word-forms. Thus, the word-form 'bring' is realized by the lexeme BRING; the lexeme GO realizes the word-form 'went'. In a dictionary each lexeme merits a separate entry or sub-entry.
The term lexeme also embraces items which consist of more than one word-form. Into the category come lexical items such as multi-word verbs (to catch up on), phrasal verbs (to drop in) and idioms (kick the bucket). Here, KICK THE BUCKET is a lexeme and would appear as such in a single dictionary entry even though it is a three-word form. …
An important question which also arises here concerns our own metalanguage in this book. Should we talk of words or word-forms or lexemes or lexical items? It is clear that the uses of these words word or vocabulary have a general common-sense validity and are serviceable when there is no real need to be precise. They will continue to be used for general reference. The terms lexeme and the word-forms of a lexeme are valuable theoretical concepts and will be used when theoretical distinctions are necessary. Lexical item(s) (or sometimes vocabulary items or simply items) is a useful and fairly neutral hold-all term which captures and, to some extent, helps to overcome instability in the term word, especially when it becomes limited by orthography.
In this chapter there is a distinct shift from examining lexical items at the level of the orthographic ‘word’ or in the patterns which occur in fixed expressions towards a consideration of lexis in larger units of language organization.
1994
- (Connolly & Phillips, 1994) ⇒ John F. Connolly, and Natalie A. Phillips. (1994). “Event-Related Potential Components Reflect Phonological and Semantic Processing of the Terminal Word of Spoken Sentences.” In: Journal of Cognitive Neuroscience, 6(3).
- QUOTE: An event-related brain potential (ERP) reflecting the acoustic-phonetic process in the phonological stage of word processing was recorded to the terminal words of spoken sentences. The peak latency of this negative-going response occurred between 270 and 300 msec after the onset of the terminal word.
1992
- (Seneff, 1992) ⇒ Stephanie Seneff. (1992). “TINA: a natural language system for spoken language applications.” In: Computational Linguistics, 18(1).
- QUOTE: due to the uncertainty as to the identity of the terminal word strings inherent in spoken input.