Definition Extraction (DE) System
A Definition Extraction (DE) System is an information extraction system that implements a definition extraction algorithm to solve a definition extraction task.
- AKA: Definition Detection System.
- Context:
- It can range from (typically) being a Definitional Sentence Extraction System to being a Definitional Paragraph Extraction System.
- It can range from being a Rule-based Definition Extraction System, to being an ML-based Definition Extraction System, to being an ANN-based Definition Extraction System.
- …
- Example(s):
- a DefExt System (Espinosa-Anke et al., 2016),
- a Pattern Matching Definition Extraction System (e.g. Westerhout, 2009),
- a Semi-Structured Text Definition Extraction System (e.g. Curtotti et al., 2013),
- a Word-Class Lattices (WCLs) Definition Extraction System (Navigli & Velardi, 2010),
- a DefExplorer System (Leu & Ko, 2010),
- an ECODE System (Alarcon et al., 2009),
- an Evolutionary Definition Extraction System (e.g. Borg et al., 2009),
- an Indexed Reference Identification (IRI) Definition Extraction System (Bertin et al., 2009),
- a GlossExtractor DE System (Navigli & Velardi, 2007).
- …
- Counter-Example(s):
- See: Definitional Sentence Generation System, Automated Definitional Sentence Extraction Task, Bootstrapping Algorithm, Automatic Glossary Generation System, Taxonomy Learning System, Question-Answering System, Semantic Search System.
References
2019
- (Spala et al., 2019) ⇒ Sasha Spala, Nicholas A. Miller, Yiming Yang, Franck Dernoncourt, and Carl Dockhorn. (2019). “DEFT: A Corpus for Definition Extraction in Free- and Semi-structured Text.” In: Proceedings of the 13th Linguistic Annotation Workshop.
- QUOTE: Definition extraction has been a popular topic in NLP research for well more than a decade, but has been historically limited to well-defined, structured, and narrow conditions. In reality, natural language is messy, and messy data requires both complex solutions and data that reflects that reality. In this paper, we present a robust English corpus and annotation schema that allows us to explore the less straightforward examples of term-definition structures in free and semi-structured text. …
2018
- (Anke & Schockaert, 2018) ⇒ Luis Espinosa Anke, and Steven Schockaert. (2018). “Syntactically Aware Neural Architectures for Definition Extraction.” In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2018) Volume 2 (Short Papers).
- QUOTE: Automatically identifying definitional knowledge in text corpora (Definition Extraction or DE) is an important task with direct applications in, among others, Automatic Glossary Generation, Taxonomy Learning, Question Answering and Semantic Search. It is generally cast as a binary classification problem between definitional and non-definitional sentences. In this paper, we present a set of neural architectures combining Convolutional and Recurrent Neural Networks, which are further enriched by incorporating linguistic information via syntactic dependencies.
2016
- (Espinosa-Anke et al., 2016) ⇒ Luis Espinosa-Anke, Roberto Carlini, Horacio Saggion, and Francesco Ronzano. (2016). “DEFEXT: A Semi Supervised Definition Extraction Tool.” In: GLOBALEX 2016 Lexicographic Resources for Human Language Technology Workshop Programme.
- QUOTE: Definition Extraction (DE), i.e. the task to automatically extract definitions from naturally occurring text, can be approached by exploiting lexico-syntactic patterns (...), in a supervised machine learning setting (...), or leveraging bootstrapping algorithms (...).
2013a
- (Boella & Di Caro, 2013) ⇒ Guido Boella, and Luigi Di Caro. (2013). “Extracting Definitions and Hypernym Relations Relying on Syntactic Dependencies and Support Vector Machines.” In: Proceedings of the 51st annual meeting of the association for computational linguistics (ACL-2013).
- QUOTE: As for the task of definition extraction, most of the existing approaches use symbolic methods that are based on lexico-syntactic patterns, which are manually crafted or deduced automatically. The seminal work of (Hearst, 1992) represents the main approach based on fixed patterns like "$NP_x$ is a/an $NP_y$" and "$NP_x$ such as $NP_y$", that usually imply $< x IS-A y >$.
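The fixed-pattern approach mentioned in the quote above can be illustrated with a minimal sketch. It assumes that noun phrases can be approximated by plain regular expressions (a real system would use a chunker or parser), and the names `IS_A`, `SUCH_AS`, and `extract_is_a` are hypothetical, introduced only for this illustration. The "such as" pattern follows the usual reading in which the phrase before "such as" is the more general term.

```python
import re

# Assumption: noun phrases are approximated with plain regexes;
# a real system would rely on a chunker or a dependency parser.
IS_A = re.compile(
    r"^(?:(?:a|an|the)\s+)?(?P<x>.+?)\s+is\s+an?\s+(?P<y>[^,.;]+)",
    re.IGNORECASE,
)
SUCH_AS = re.compile(r"(?P<y>\w+)\s+such\s+as\s+(?P<x>\w+)", re.IGNORECASE)


def extract_is_a(sentence):
    """Return candidate (hyponym, hypernym) pairs found by the two patterns."""
    pairs = []
    # "X is a/an Y": X is the candidate term, Y the candidate genus.
    match = IS_A.search(sentence)
    if match:
        pairs.append((match.group("x").strip(), match.group("y").strip()))
    # "Y such as X": the single word before "such as" is taken as the hypernym.
    for match in SUCH_AS.finditer(sentence):
        pairs.append((match.group("x"), match.group("y")))
    return pairs
```

For example, `extract_is_a("A lattice is a directed acyclic graph.")` yields the pair `("lattice", "directed acyclic graph")`. The same pattern also fires on non-definitional sentences such as "The weather is a bit cold today.", which is why pattern-only approaches are usually combined with filtering.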
2013b
- (Curtotti et al., 2013) ⇒ Michael Curtotti, Eric McCreath, and Srinivas Sridharan (2013). "Software Tools for the Visualization of Definition Networks in Legal Contracts". In: Proceedings of the 14th International Conference on Artificial Intelligence and the Law (ICAIL 2013).
- QUOTE: Work on the application of natural language processing to definitions in general text is extensive, however a considerable part of this work is dedicated to extraction of definitions from unstructured general prose. It thus addresses a more complex and difficult problem than that of extraction of definitions from semi-structured texts, such as contracts.
2010a
- (Navigli et al., 2010) ⇒ Roberto Navigli, Paola Velardi, and Juana Maria Ruiz-Martínez. (2010). “An Annotated Dataset for Extracting Definitions and Hypernyms from the Web.” In: Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC-2010).
2010b
- (Navigli & Velardi, 2010) ⇒ Roberto Navigli, and Paola Velardi. (2010). “Learning Word-class Lattices for Definition and Hypernym Extraction.” In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL-2010).
- QUOTE: Definition extraction is the task of automatically identifying definitional sentences within texts ... (...)
Much of the current literature focuses on the use of lexico-syntactic patterns, inspired by Hearst’s (1992) seminal work. However, these methods suffer both from low recall and precision, as definitional sentences occur in highly variable syntactic structures, and because the most frequent definitional pattern – $X$ is a $Y$ – is inherently very noisy.
In this paper, we propose a generalized form of word lattices, called Word-Class Lattices (WCLs), as an alternative to lexico-syntactic pattern learning. A lattice is a directed acyclic graph (DAG), a subclass of non-deterministic finite state automata (NFA). The lattice structure has the purpose of preserving the salient differences among distinct sequences, while eliminating redundant information(...)
2010c
- (Leu & Ko, 2010) ⇒ Fang-Yie Leu, and Chih-Chieh Ko (2010). "An Automated Term Definition Extraction System Using the Web Corpus in the Chinese Language". In: Journal Of Information Science And Engineering 26, 505-525 (2010).
- QUOTE: DefExplorer extracts definitions or their equivalences for Chinese terms in six phases, as shown in Fig. 1, including question analysis, document retrieval, semantics selection, similarity scoring, candidate grouping (also called candidate clustering), and answer generation. The first two phases respectively retrieve a given term's corresponding patterns, and submit the patterns to search results. The third phase removes semantically inappropriate search results sentences, and identifies the key portion of a definition sentence. In the fourth and fifth phases, DefExplorer calculates similarities between each sentence and other definition sentences, and clusters semantically similar sentences into a group. The last phase selects top-ranked sentences as the final results. In the following, we will describe the six phases, and explain why they are employed.
2009a
- (Alarcon et al., 2009) ⇒ Rodrigo Alarcon, Gerardo Sierra, and Carme Bach (2009). "Description and Evaluation of a Definition Extraction System for Spanish Language". In: Proceedings of the 1st Workshop On Definition Extraction (WDE 2009).
- QUOTE: Therefore, we propose a methodology that includes not only the extraction of occurrences of definitional patterns, but also a filtering process of non-relevant contexts (i.e. non definitional contexts), the automatic identification of the possible constitutive elements of a DC: terms and definitions, and a final automatic ranking of the results. This system is called ECODE: extractor de contextos definitorios (definitional contexts extractor).
2009b
- (Bertin et al., 2009) ⇒ Marc Bertin, Iana Atanassova, and Jean-Pierre Descles (2009). "Extraction of Author's Definitions Using Indexed Reference Identification". In: Proceedings of the 1st Workshop On Definition Extraction (WDE 2009).
- QUOTE: In this paper we explore a new way to extract definitions from scientific text corpora by establishing a relation between the usage of a definition and a cited author ... (...)
... More precisely, the indexed references allow us, in the case when we identify a definition in the research scope determined by the segmentation, to link this definition to the author cited in the text. The theoretical framework as well as the experimental procedures for the indexed reference identification are described below.
2009c
- (Borg et al., 2009) ⇒ Claudia Borg, Mike Rosner, and Gordon Pace (2009). "Evolutionary Algorithms for Definition Extraction". In: Proceedings of the 1st Workshop On Definition Extraction (WDE 2009).
- QUOTE: In this paper, we explore the use of machine learning techniques, in particular evolutionary algorithms, to enable the learning of sentence classifiers, separating definitions from non-definitions.
2009d
- (Westerhout, 2009) ⇒ Eline Westerhout (2009). "Definition Extraction using Linguistic and Structural Features". In: Proceedings of the 1st Workshop On Definition Extraction (WDE 2009).
- QUOTE: Different approaches for the extraction of definitions can be distinguished. We use a sequential combination of a rule-based approach and machine learning to extract them. As a first step a grammar is used to match sentences with a definition pattern and thereafter, machine learning techniques are applied to filter out those sentences that – although they have a definition pattern – do not qualify as definitions.
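The sequential combination described in the quote above (pattern matching first, then a learned filter) can be sketched as follows. This is an illustration under stated assumptions, not Westerhout's system: the two regexes stand in for a real grammar, a small bag-of-words Naive Bayes classifier stands in for the machine learning component, and all names (`PATTERNS`, `NaiveBayesFilter`, `extract_definitions`) are hypothetical.

```python
import math
import re
from collections import Counter

# Stage 1 (rule-based): a toy "grammar" of definition patterns.
# These regexes are illustrative assumptions, not a published grammar.
PATTERNS = [
    re.compile(r"\bis\s+(?:a|an|the)\b", re.IGNORECASE),
    re.compile(r"\b(?:is|are)\s+(?:defined|known)\s+as\b", re.IGNORECASE),
]


def matches_pattern(sentence):
    return any(p.search(sentence) for p in PATTERNS)


def tokens(sentence):
    return re.findall(r"[a-z]+", sentence.lower())


class NaiveBayesFilter:
    """Stage 2 (ML): a tiny multinomial Naive Bayes over bag-of-words."""

    def fit(self, sentences, labels):
        # Training data must contain both True and False labels.
        self.counts = {True: Counter(), False: Counter()}
        self.docs = Counter(labels)
        for sentence, label in zip(sentences, labels):
            self.counts[label].update(tokens(sentence))
        self.vocab = set(self.counts[True]) | set(self.counts[False])
        return self

    def score(self, sentence, label):
        total = sum(self.counts[label].values())
        logp = math.log(self.docs[label] / sum(self.docs.values()))
        for tok in tokens(sentence):
            # Add-one smoothing so unseen tokens do not zero out the score.
            logp += math.log((self.counts[label][tok] + 1) / (total + len(self.vocab)))
        return logp

    def is_definition(self, sentence):
        return self.score(sentence, True) > self.score(sentence, False)


def extract_definitions(sentences, clf):
    # First keep sentences that match a pattern, then let the classifier
    # discard matches that do not look like definitions.
    candidates = [s for s in sentences if matches_pattern(s)]
    return [s for s in candidates if clf.is_definition(s)]
```

In use, the filter would be fitted on labelled pattern matches (definitions versus non-definitions) and then applied to new candidates through `extract_definitions`. With only a handful of training sentences the classifier is of course unreliable; the point of the sketch is the two-stage control flow.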