2004 DetectingSemanticRelsBetTermsInDefs

From GM-RKB

Subject Headings: Terminology, Semantic Relation, Definition Mention, Lexico-Syntactic Pattern.

Notes

Quotes

Abstract

  • Terminology structuring aims to elicit semantic relations between the terms of a domain. We propose here to exploit definitions found in corpora to obtain such semantic relations. Definition typologies show that definitions can be introduced by different semantic relations, some of these relations being likely to structure terminologies. Our aim is therefore to mine "defining expressions" in domain-specific corpora, and to detect the semantic relations they involve between their main terms. We use lexico-syntactic markers and patterns to detect at the same time both a definition and its main semantic relation. 46 markers and 74 patterns have been designed and tuned on a first corpus in the field of anthropology. We report on their evaluation on a second corpus in the field of dietetics, where they obtained from 4% to 36% recall and from 61% to 66% precision, and discuss the relative accuracy of different subclasses of markers for this task.
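The recall and precision figures quoted above follow the usual definitions: precision is the share of extracted items that are correct, and recall is the share of reference items that were extracted. A minimal Python sketch of that arithmetic follows; the function name and the sentence identifiers are hypothetical and do not come from the paper.

```python
from typing import Set, Tuple


def precision_recall(extracted: Set[str], reference: Set[str]) -> Tuple[float, float]:
    """Return (precision, recall) for a set of extracted items.

    `extracted` holds the items a pattern returned, `reference` the items
    a human annotator marked as true definitions (hypothetical inputs).
    """
    correct = extracted & reference
    precision = len(correct) / len(extracted) if extracted else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    return precision, recall


# Illustrative data only: three extracted sentences, five reference sentences.
extracted = {"s1", "s2", "s3"}
reference = {"s1", "s2", "s4", "s5", "s6"}
precision, recall = precision_recall(extracted, reference)
```

In this made-up example two of the three extracted sentences are correct, and two of the five reference sentences are found.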

1 Introduction

  • A terminology is an artifact structuring terms according to some semantic relations. Grabar and Hamon (2004) present the different semantic relations likely to be found in terminologies. These can be divided into lexical (synonymy), vertical (hypernymy, meronymy) and transversal relations (domain-specific relations). A study of definition typologies, such as that of Auger (1997), shows that these different relations are also present in definitions. We can then hypothesise that mining definitions along with the detection of their inherent semantic relation can help to organise terms according to the relations used in structured terminologies. We focus in this paper on the detection of terms related by hypernymy and synonymy in definitions.
  • The automatic detection of definitions can rely on different types of existing work. We can, first, consider the studies describing what a definition is, and more particularly what a definition in a corpus looks like. In this respect, we can cite the work of Trimble (1985), Flowerdew (1992), Sager (2001) and Meyer (2001). Another type of relevant existing work concerns typologies of definitions: Martin (1983), Chukwu and Thoiron (1989) and Auger (1997), amongst others, provide, in their classifications of definitions, linguistic clues to find defining statements in corpora. We propose to integrate the typologies that we mention in section 2.2, along with the linguistic clues they give: the definition markers. Finally, some works have already focused on mining definitions from corpora, including Cartier (1997), Pearson (1996), Rebeyrolle (2000) and Muresan and Klavans (2002), mostly through the use of lexical definition markers. These works provide us with methodological guidelines and another set of lexical markers for our own experiment. Like Pearson (1996) and Rebeyrolle (2000), we base our method on lexico-syntactic patterns, so that we can build on the work on the French language by Rebeyrolle (2000). We extended her work in two respects: an analysis of the parenthesis as a low-level linguistic clue for definitions, and the concomitant extraction of the semantic relation involved in a “defining expression”, along with the extraction of the definition itself. Previous works have, for instance, mined definitions to find terms specific to a particular domain of knowledge (Chukwu and Thoiron, 1989), and to describe their meaning (Rebeyrolle, 2000); we focus on the detection of the semantic relations between the main terms of a definition in order to help a terminologist build a structured terminology following these relations.
  • We implemented an interface to visualise the extracted definitions and semantic relations. We tuned markers and patterns for extracting definitions and semantic relations on a first corpus about anthropology; we then tested the validity of these markers and patterns on another corpus focused on dietetics. The purpose of this test was, on the one hand, to observe whether definitions were still correctly extracted on the basis of patterns trained on a corpus differing in the domain of knowledge and in the genre of documents involved, and, on the other hand, to detect whether the semantic relation associated with each pattern was the same as the one observed in the first corpus. The markers and patterns proved to be comparable to the other experiments mentioned in terms of definition extraction: the precision ranged from 61% to 66%. As for the semantic relation associated with the patterns, it obtained different scores, depending on the marker. But, in most cases, one main semantic relation is associated with a pattern in the scope of a single domain, even though a few patterns convey the same relation across our two corpora.
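The idea of detecting a definition and its semantic relation in a single step can be illustrated with a small Python sketch. It assumes a handful of hypothetical English patterns, each tagged with one relation; the paper's own 46 markers and 74 patterns target French and are not reproduced here. The parenthesis pattern only gestures at the "low-level linguistic clue" mentioned above, and its association with synonymy is an assumption made for illustration.

```python
import re
from typing import NamedTuple, Optional


class Relation(NamedTuple):
    term: str      # the term being defined
    relation: str  # relation label attached to the matching pattern
    related: str   # the term it is related to


# Optional leading determiner, skipped so that it is not part of the term.
_DET = r"(?:\b(?:[Aa]n?|[Tt]he) )?"
# A term is approximated as a short run of word characters, spaces and hyphens.
_TERM = r"(?P<term>[\w -]+?)"
_RELATED = r"(?P<related>[\w -]+)"

# Hypothetical patterns for illustration only (not the authors' pattern set).
PATTERNS = [
    ("synonymy", re.compile(_DET + _TERM + r",? (?:also called|also known as) " + _RELATED)),
    ("hypernymy", re.compile(_DET + _TERM + r" is an? (?:kind|type|sort) of " + _RELATED)),
    ("synonymy", re.compile(_DET + _TERM + r" \((?:or|i\.e\.) " + _RELATED + r"\)")),
]


def detect(sentence: str) -> Optional[Relation]:
    """Return the first pattern match as a (term, relation, related) triple."""
    for label, pattern in PATTERNS:
        match = pattern.search(sentence)
        if match:
            return Relation(
                match.group("term").strip(),
                label,
                match.group("related").strip(),
            )
    return None  # no pattern matched: not treated as a defining expression


for text in (
    "A kayak is a kind of boat.",
    "Ascorbic acid, also called vitamin C, is found in citrus.",
    "Scurvy (or vitamin C deficiency) affects sailors.",
    "Sailors ate limes.",
):
    print(text, "->", detect(text))
```

Because every pattern carries its own relation label, one match yields both the defining expression and a candidate relation between its main terms. A real system would work on tagged or parsed text rather than raw strings, and would need a per-domain check that the label attached to a pattern still holds.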

3.2 Lexico-syntactic patterns



References

  • A. Auger. (1997). Repérage des énoncés d’intérêt définitoire dans les bases de données textuelles. Thèse de doctorat, Université de Neuchâtel.
  • E. Cartier. (1997). La définition dans les textes scientifiques et techniques : présentation d’un outil d’extraction automatique de relations définitoires. 2e Rencontres "Terminologie et Intelligence Artificielle" (TIA’97), Equipe de Recherche en Syntaxe et Sémantique. Toulouse, 3-4 avril 1997:127–140.
  • U. Chukwu and P. Thoiron. (1989). Reformulation et repérage des termes. La Banque des Mots, Numéro spécial CTN - INaLF - CNRS:23–53.
  • J.-P. Desclés. (1996). Systèmes d’exploration contextuelle. Table ronde sur le Contexte, avril 1996, Caen.
  • J. Flowerdew. (1992). Definitions in science lectures. Linguistics, vol.13 (2):202–221.
  • C. Fuchs. (1994). Paraphrase et énonciation. Paris, Ophrys.
  • N. Grabar and S. Berland. (2001). Construire un corpus web pour l’acquisition terminologique. 4e rencontres Terminologie et Intelligence Artificielle (TIA 2001), Nancy:44–54.
  • N. Grabar and T. Hamon. (2004). Les relations dans les terminologies structurées : de la théorie à la pratique. Revue d’Intelligence Artificielle (RIA), 18-1:57–85.
  • Marti Hearst. (1992). Automatic acquisition of hyponyms from large text corpora. 15th International Conference on Computational Linguistics (COLING 1992), Nantes:539–545.
  • R. Martin. (1983). Pour une logique du sens. Paris, PUF.
  • I. Meyer. (2001). Extracting knowledge-rich contexts for terminography. In D. Bourigault, editor, Recent advances in Computational Terminology, pages 279–302. John Benjamins Publishing Company, Philadelphia, PA.
  • S. Muresan and J. L. Klavans. (2002). A method for automatically building and evaluating dictionary resources. Language Resources and Evaluation Conference (LREC 2002), Las Palmas, Spain:231–234.
  • J. Pearson. (1996). The expression of definitions in specialised texts: a corpus-based analysis. In M. Gellerstam, J. Järborg, S. G. Malmgren, K. Norén, L. Rogström, and C. Papmehl, editors, 7th International Congress on Lexicography (EURALEX’96), pages 817–824. Göteborg University, Göteborg, Sweden.
  • J. Rebeyrolle. (2000). Forme et fonction de la définition en discours. Thèse de doctorat, Université de Toulouse II - Le Mirail.
  • J. C. Sager. (2001). Essays on Definition. John Benjamins, Amsterdam.
  • L. Trimble. (1985). English for Science and Technology: A Discourse Approach. Cambridge University Press, Cambridge.

  • Author: Pierre Zweigenbaum, Veronique Malaise, Bruno Bachimont
  • Title: Detecting Semantic Relations Between Terms In Definitions
  • Journal: International Workshop On Computational Terminology
  • URL: http://www.aclweb.org/anthology-new/W/W04/W04-1807.pdf
  • Year: 2004