Grammar Lexicalization
Grammar Lexicalization is the process of associating each rule or elementary object in a Natural Language Grammar Theory with some terminal symbol (a lexical anchor), as in the illustrative sketch after the list below.
- AKA: Grammatical Lexicalization.
- See: Information Extraction System, Lexicon, Word Formation, Calque, Abbreviation, Lexicalization.
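As a minimal illustration of the definition above (not drawn from the cited sources), the following Python sketch annotates a context-free rule with head words in the style of lexicalized PCFGs; the rule format and head-selection convention are assumptions made for exposition:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    lhs: str                 # non-terminal, e.g. "VP"
    rhs: tuple               # child symbols, e.g. ("V", "NP")
    head_index: int = 0      # which child carries the lexical anchor

def lexicalize(rule, words):
    """Annotate the rule's LHS and each RHS symbol with a terminal word,
    so the rule VP -> V NP becomes VP(bought) -> V(bought) NP(book)."""
    head = words[rule.head_index]
    lhs = f"{rule.lhs}({head})"
    rhs = tuple(f"{sym}({w})" for sym, w in zip(rule.rhs, words))
    return lhs, rhs

rule = Rule(lhs="VP", rhs=("V", "NP"), head_index=0)
print(lexicalize(rule, ["bought", "book"]))
# -> ('VP(bought)', ('V(bought)', 'NP(book)'))
```

The key point the sketch captures is that after lexicalization every grammar object mentions a terminal symbol, so statistics or constraints can be conditioned on actual words rather than bare categories.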
References
2018
- (Wikipedia, 2018) ⇒ https://en.wikipedia.org/wiki/Lexicalization Retrieved:2018-4-29.
- Lexicalization is the process of adding words, set phrases, or word patterns to a language – that is, of adding items to a language's lexicon.
Whether or not word formation and lexicalization refer to the same process is a source of controversy within the field of linguistics. Most linguists assert that there is a distinction, but there are many ideas of what the distinction is. Lexicalization may be simple, for example borrowing a word from another language, or more involved, as in calque or loan translation, wherein a foreign phrase is translated literally, as in marché aux puces, or in English, flea market. Other mechanisms include compounding, abbreviation, and blending. Particularly interesting from the perspective of historical linguistics is the process by which ad hoc phrases become set in the language, and eventually become new words. (See lexicon for details.) Lexicalization contrasts with grammaticalization, and the relationship between the two processes is subject to some debate.
2016
- (Kaliszyk et al., 2016) ⇒ Kaliszyk, C., Urban, J., & Vyskočil, J. (2016). "Semantic Parsing of Mathematics by Context-based Learning from Aligned Corpora and Theorem Proving" (PDF). arXiv preprint arXiv:1611.09703.
- ABSTRACT: We study methods for automated parsing of informal mathematical expressions into formal ones, a main prerequisite for deep computer understanding of informal mathematical texts. We propose a context-based parsing approach that combines efficient statistical learning of deep parse trees with their semantic pruning by type checking and large-theory automated theorem proving. We show that the methods very significantly improve on previous results in parsing theorems from the Flyspeck corpus.
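As a hedged sketch of the semantic-pruning step this abstract describes (the parse representation, the type signatures, and the `type_checks` predicate are illustrative assumptions, not the paper's implementation):

```python
def type_checks(parse, signatures):
    """Accept a parse only if every (operator, argument-type) pair it
    uses is licensed by the known type signatures."""
    return all(arg in signatures.get(op, ()) for op, arg in parse["applications"])

def prune_parses(ranked_parses, signatures):
    """Keep statistically ranked parse trees that survive semantic type checking."""
    return [p for p in ranked_parses if type_checks(p, signatures)]

# Illustrative candidates for "the square root of x":
signatures = {"sqrt": ("real",)}
candidates = [
    {"tree": "sqrt(x:real)", "applications": [("sqrt", "real")]},  # well-typed
    {"tree": "sqrt(x:set)",  "applications": [("sqrt", "set")]},   # ill-typed
]
print(prune_parses(candidates, signatures))  # only the well-typed parse survives
```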
2008
- (Srihari et al., 2008) ⇒ Srihari, R. K., Li, W., Cornell, T., & Niu, C. (2008). "Infoxtract: A customizable intermediate level information extraction engine" (PDF). Natural Language Engineering, 14(1), 33-69. doi:10.1017/S1351324906004116
- ABSTRACT: Information Extraction (IE) systems assist analysts to assimilate information from electronic documents. This paper focuses on IE tasks designed to support information discovery applications. Since information discovery implies examining large volumes of heterogeneous documents for situations that cannot be anticipated a priori, they require IE systems to have breadth as well as depth. This implies the need for a domain-independent IE system that can easily be customized for specific domains: end users must be given tools to customize the system on their own. It also implies the need for defining new intermediate level IE tasks that are richer than the subject-verb-object (SVO) triples produced by shallow systems, yet not as complex as the domain-specific scenarios defined by the Message Understanding Conference (MUC). This paper describes InfoXtract, a robust, scalable, intermediate-level IE engine that can be ported to various domains. It describes new IE tasks such as synthesis of entity profiles, and extraction of concept-based general events which represent realistic near-term goals focused on deriving useful, actionable information. Entity profiles consolidate information about a person/organization/location etc. within a document and across documents into a single template; this takes into account aliases and anaphoric references as well as key relationships and events pertaining to that entity. Concept-based events attempt to normalize information such as time expressions (e.g., yesterday) as well as ambiguous location references (e.g., Buffalo). These new tasks facilitate the correlation of output from an IE engine with structured data to enable text mining. InfoXtract's hybrid architecture comprised of grammatical processing and machine learning is described in detail. Benchmarking results for the core engine and applications utilizing the engine are presented.
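A minimal sketch of the entity-profile consolidation described above (the mention format, the alias table, and the merge rule are illustrative assumptions, not InfoXtract's implementation):

```python
from collections import defaultdict

# Hypothetical precomputed alias/anaphora table mapping mentions to a canonical name.
ALIASES = {"Julia Hill": "Julia Hill", "Hill": "Julia Hill", "she": "Julia Hill"}

def build_profiles(mentions):
    """Consolidate per-mention facts into one profile per canonical entity,
    resolving aliases and anaphoric references via the alias table."""
    profiles = defaultdict(lambda: {"relations": [], "events": []})
    for m in mentions:
        canonical = ALIASES.get(m["name"], m["name"])
        profiles[canonical][m["kind"]].append(m["fact"])
    return dict(profiles)

mentions = [
    {"name": "Julia Hill", "kind": "relations", "fact": "occupation: activist"},
    {"name": "Hill",       "kind": "events",    "fact": "descend(tree)"},
    {"name": "she",        "kind": "relations", "fact": "age: 26"},
]
print(build_profiles(mentions))  # one merged profile for "Julia Hill"
```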
2000
- (Riezler et al., 2000) ⇒ Riezler, S., Kuhn, J., Prescher, D., & Johnson, M. (2000, October). "Lexicalized stochastic modeling of constraint-based grammars using log-linear measures and EM training" (PDF). In: Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics (pp. 480-487). Association for Computational Linguistics.
- ABSTRACT: We present a new approach to stochastic modeling of constraint-based grammars that is based on loglinear models and uses EM for estimation from unannotated data. The techniques are applied to an LFG grammar for German. Evaluation on an exact match task yields 86% precision for an ambiguity rate of 5.4, and 90% precision on a subcat frame match for an ambiguity rate of 25. Experimental comparison to training from a parsebank shows a 10% gain from EM training. Also, a new class-based grammar lexicalization is presented, showing a 10% gain over unlexicalized models.
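A hedged sketch of the kind of log-linear scoring over parses this abstract describes (the feature names, weights, and parse representation are illustrative assumptions; the paper's actual model, features, and EM training are not reproduced here):

```python
import math

def loglinear_score(features, weights):
    """Unnormalized log-linear score: exp of the weighted feature sum."""
    return math.exp(sum(weights.get(f, 0.0) * v for f, v in features.items()))

def parse_probability(candidate, candidates, weights):
    """Conditional probability of one parse among all parses of a sentence."""
    z = sum(loglinear_score(c, weights) for c in candidates)
    return loglinear_score(candidate, weights) / z

# Two candidate parses: one with a fully lexicalized head feature,
# one with a class-based feature in the spirit of the abstract.
parses = [
    {"rule:VP->V NP": 1.0, "head:VP=kaufen": 1.0},
    {"rule:VP->V NP": 1.0, "headclass:VP=TRANSFER": 1.0},
]
weights = {"rule:VP->V NP": 0.5, "head:VP=kaufen": 1.2,
           "headclass:VP=TRANSFER": 0.8}
print(parse_probability(parses[0], parses, weights))
```

Class-based lexicalization, as the abstract notes, replaces sparse word-level features (e.g., `head:VP=kaufen`) with coarser class features, trading specificity for better statistical estimation.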