Elementary Term Variation Operation
An Elementary Term Variation Operation is a term variation operation that corresponds to one of the basic syntactic term variations: coordination, elision, modification/substitution, or permutation.
- See: Term Variation Operation.
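The four operations named above can be illustrated with a minimal sketch. It is not FASTR's metarule formalism: the function name `elementary_variants`, the `filler` placeholder, and the surface templates (in particular the use of "and" for coordination, "of" for permutation, and word dropping for elision) are assumptions made purely for illustration.

```python
from typing import Dict, List


def elementary_variants(term: List[str], filler: str = "X") -> Dict[str, List[str]]:
    """Return toy surface templates for each elementary variation of a term.

    `term` is a list of words whose last word is taken to be the head.
    `filler` is a placeholder for any inserted or coordinated word.
    All templates are illustrative assumptions, not FASTR rules.
    """
    if len(term) < 2:
        raise ValueError("expected a multi-word term")
    *mods, head = term
    first, rest = mods[0], mods[1:]
    return {
        # Coordination: the first modifier is conjoined with another word.
        "coordination": [" ".join([first, "and", filler, *rest, head])],
        # Modification/substitution: an extra word is inserted before the head.
        "substitution": [" ".join([*mods, filler, head])],
        # Permutation: the head moves in front, linked by a preposition.
        "permutation": [" ".join([head, "of", *mods])],
        # Elision: one non-head word is dropped; skipped for two-word terms,
        # where dropping the only modifier would leave a single word.
        "elision": (
            [" ".join([*mods[:i], *mods[i + 1:], head]) for i in range(len(mods))]
            if len(mods) > 1
            else []
        ),
    }


if __name__ == "__main__":
    for kind, forms in elementary_variants(["myeloid", "cell"]).items():
        print(kind, forms)
```

With `filler` read as "any word", the templates `myeloid X cell` and `myeloid and X cell` cover the two variants of "myeloid cell" quoted in the 1994 abstract below.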
References
2001
- (Jacquemin, 2001) ⇒ Christian Jacquemin. (2001). “Spotting and Discovering Terms Through Natural Language Processing.” MIT Press. ISBN:0262100851
- Elementary variations: The four basic types of syntactic variations (coordination, elision, modification/substitution and permutation) are called elementary variations. They can be composed into complex variations (see composition of variations). (...)
5.1 Elementary Variations of Binary Terms
The approach to term spotting that is chosen in FASTR is a generative approach. All the possible variants that are likely to be encountered within corpora are generated through compositions of elementary variations on core word dependencies given by controlled terms. This section presents the four main families of elementary term variations in English: permutations, modifications/substitutions, coordinations and elisions.
Permutations and modifications were reported in Dunham (1986) and modifications/substitutions were considered in the studies on free indexing by Fagan (1987) and Metzler and Haas (1989). Some of the variations described in Dunham (1986) are not elementary and correspond to compositions of variations that will be studied in section 5.3. Other variants exemplified in Dunham (1986) are typical oral expressions (e.g., the postposition of the final adjective inflammation with mesothelial reaction, visceral). Since the discussion here focuses on the indexing of technical and scientific written documents, such oral-specific variations are not presented here.
2000
- (Daille et al., 2000) ⇒ Daille, B., Habert, B., Jacquemin, C., & Royauté, J. (2000). Empirical observation of term variations and principles for their description. Terminology. International Journal of Theoretical and Applied Issues in Specialized Communication, 3(2), 197-257.
- ABSTRACT: Terms are often supposed not to be prone to variation. Empirical observation of terms in various corpora (telecommunication, physics, medicine) shows, on the contrary, the quantitative and qualitative importance of term variation. We give a precise linguistic description of the rules relating to controlled terms and observed variants and of the constraints on these rules. This description leads to novel means of enriching terminologies via the generation of possible term variants or the simplification of nominal parse trees in order to discover potential variants.
1994
- (Jacquemin, 1994) ⇒ Jacquemin, C. (1994, October). FASTR: A unification-based front-end to automatic indexing. In Intelligent Multimedia Information Retrieval Systems and Management-Volume 1 (pp. 34-47). LE CENTRE DE HAUTES ETUDES INTERNATIONALES D'INFORMATIQUE DOCUMENTAIRE.
- ABSTRACT: Most natural language processing approaches to full-text information retrieval are based on indexing documents by the occurrences of controlled terms they contain. An important problem with this approach is that terms accept numerous variations, and can therefore cause many documents not to be retrieved although being relevant. For example, "myeloid leukaemia cells" and "myeloid and erythroid cell" are two occurrences of "myeloid cell" which cannot be detected without an account of local morpho-syntactic variations.
In this paper, we present a linguistic analysis of the observed variations and a three-tier constraint-based formalism for representing them. This technique has been implemented and results in FASTR, a natural language processing tool that extracts terms and their variants from full-text documents. We justify the choice of a unification-based formalism by its expressivity and by the addition of conceptual and computational devices which make the parser computationally tractable. Contrary to the generally accepted idea, high quality natural language processing through unification and industrial requirements can fit together, provided that the application is carefully designed in order to control and minimize data accesses and computation times.
The effectiveness of FASTR for extracting correct occurrences is supported by experiments on two English corpora of scientific abstracts and a list of 71,623 controlled terms. We report that an account of three kinds of variants (insertions, permutations and coordinations) increases recall by 16.7% without altering precision.
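The three kinds of variants named in this abstract (insertions, permutations and coordinations) can be made concrete with a small sketch. It is a regular-expression approximation, not FASTR's unification-based formalism: the function name `classify_variant`, the crude optional-"s" plural handling, and the fixed lists of conjunctions and prepositions are all assumptions chosen for illustration.

```python
import re
from typing import Optional


def classify_variant(modifier: str, head: str, text: str) -> Optional[str]:
    """Label how the binary term "<modifier> <head>" occurs in `text`.

    Returns "exact", "coordination", "insertion", "permutation", or None.
    The patterns are toy approximations (assumed for this sketch only).
    """
    m = re.escape(modifier)
    h = re.escape(head) + r"s?"  # crude plural handling (assumption)
    w = r"\w+"                   # exactly one intervening word
    patterns = [
        ("exact", rf"\b{m} {h}\b"),
        # e.g. "myeloid and erythroid cell"
        ("coordination", rf"\b{m} (?:and|or) {w} {h}\b"),
        # e.g. "myeloid leukaemia cells"
        ("insertion", rf"\b{m} {w} {h}\b"),
        # e.g. "cells of myeloid origin"
        ("permutation", rf"\b{h} (?:of|in|from) (?:the )?{m}\b"),
    ]
    for label, pattern in patterns:
        if re.search(pattern, text, flags=re.IGNORECASE):
            return label
    return None


if __name__ == "__main__":
    for sample in ("myeloid leukaemia cells", "myeloid and erythroid cell"):
        print(sample, "->", classify_variant("myeloid", "cell", sample))
```

A plain string search for "myeloid cell" finds neither of the abstract's two examples, which is the recall problem the paper addresses; the patterns here recover both, at the cost of being far less constrained than a linguistically informed rule set.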