SDOI System
An SDOI System is a supervised document to ontology interlinking system (that can solve the Document to Ontology Interlinking Task) implemented as part of the SDOI project (by Gabor Melli).
- AKA: SDOI.
- Context:
- It implements:
- It can be supported by an SDOI Annotation System that implements an SDOI Algorithm.
- Example(s):
- Counter-Example(s):
- a CYC System.
- See: KDD-2009 Abstracts Analysis, RKB Research Project, Term Mention Recognition, Term Mention Linking, Ontology, Research Paper, Data Mining Discipline, Concept Mention Identification and Linking Task, Supervised Learning Task, Supervised Sequential Classifier, Sequential Tagger, kddo Ontology, kdd09cma1 Corpus.
References
2010a
- (Melli, 2010b) ⇒ Gabor Melli. (2010). “Supervised Ontology to Document Interlinking.” Ph.D. Thesis, Simon Fraser University.
- QUOTE: (...) Our proposed supervised solution, named SDOI, follows this three-fold decomposition: SDOICMI, SDOICML, and SDOIRMI.
SDOICMI first trains a supervised sequential classifier to identify token subsequences in a document as concept mentions. Motivations for the application of a sequential tagger include their successful use in the NLP community to the related tasks of text-chunking (Sha & Pereira, 2003) and named entity recognition (McCallum & Li, 2003), and the possibility that any future improvements in the use and training of sequential taggers in other domains can be naturally imported into our framework. A further motivation of this sequential tagging approach is that it identifies lexically varied concept mention even when the token sequence is not present in the training corpus, nor recorded as a possible alternate spelling within the ontology.
Next, due to the large number of concepts in the ontology, we propose that the SDOICML module apply a binary supervised classifier to the concept mention linking task rather than to directly train a multi-class classifier (Rifkin & Klautau, 2004). To accomplish this transformation, each mention is associated with a subset of candidate concepts by means of heuristic candidacy tests that can be used to remove cases that are very unlikely to be true (i.e. to undersample). Next, each candidate concept is associated with a rich feature vector, including recursively defined (collective) features that account for global context, and then labelled as true or false based on whether the concept is indeed the one that the mention must link to. In order to support the collective features we propose the use of an iterative supervised classifier (Neville & Jensen, 2000).
The final module of the pipeline, SDOIRMI, is another binary classifier for solving the relation mention identification task: SDOIRMI. For each permutation of two concept mentions, we build a feature vector and heuristically associate a label based on whether the relation is present in the ontology. A difference for this subtask is that we do not require that a person manually label each of the multitude of concept mention combinations within each document. Instead, we propose the use of a self-supervised approach that makes use of a labelling heuristic (Banko & Etzioni, 2008). The proposed labelling heuristic is to assign a label if the candidate mention refers to a link that exists or does not exist in the ontology.
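The two transformations described in the quote above (turning concept mention linking into binary cases through candidacy tests, and heuristically labelling relation mentions from the ontology) can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the thesis: the toy `ONTOLOGY` and `ONTOLOGY_LINKS` data, the shared-token candidacy test, and the function names are assumptions chosen only to make the idea concrete.

```python
from itertools import permutations

# Hypothetical toy ontology: concept id -> set of surface forms (assumed data).
ONTOLOGY = {
    "c_supervised_learning": {"supervised learning"},
    "c_classifier": {"classifier", "classification model"},
    "c_ontology": {"ontology"},
}
# Hypothetical directed links between concepts in the ontology.
ONTOLOGY_LINKS = {("c_classifier", "c_supervised_learning")}


def candidate_concepts(mention):
    """Assumed candidacy test: keep concepts that share a token with the mention."""
    tokens = set(mention.lower().split())
    return sorted(
        cid
        for cid, forms in ONTOLOGY.items()
        if any(tokens & set(form.split()) for form in forms)
    )


def linking_cases(mention, true_concept):
    """Emit one binary training case per (mention, candidate concept) pair."""
    return [
        (mention, cid, cid == true_concept)
        for cid in candidate_concepts(mention)
    ]


def relation_cases(linked_mentions):
    """Label each ordered pair of linked mentions by whether the ontology
    contains a link between their concepts (no manual labelling)."""
    return [
        (m1, m2, (c1, c2) in ONTOLOGY_LINKS)
        for (m1, c1), (m2, c2) in permutations(linked_mentions, 2)
    ]


if __name__ == "__main__":
    for case in linking_cases("supervised classifier", "c_classifier"):
        print(case)
    pairs = [("classifier", "c_classifier"),
             ("supervised learning", "c_supervised_learning")]
    for case in relation_cases(pairs):
        print(case)
```

In this sketch the candidacy test discards `c_ontology` before any classifier is involved, which is the undersampling effect the quote mentions; a real system would attach a feature vector to each remaining case.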
2010b
- (Melli & Ester, 2010) ⇒ Gabor Melli, and Martin Ester. (2010). “Supervised Identification and Linking of Concept Mentions to a Domain-Specific Ontology.” In: Proceedings of the 19th ACM International Conference on Information and Knowledge Management (CIKM 2010). doi:10.1145/1871437.1871712
- QUOTE: We propose a supervised learning approach, SDOI, to the task of identifying concept mentions within a document and of linking these mentions to their corresponding concept node, if it exists, in a domain-specific ontology. Concept mention identification is performed with a trained sequential tagging model. Each identified mention is then associated with a set of candidate ontology concepts along with their feature vectors. We formalize feature spaces proposed in the literature and expand it into new data sources, such as from the training corpus itself. An iterative algorithm is defined for handling collective features which assume that some of the labels are known in advance. The approach is validated against the ability to identify the concept mentions within the 139 KDD-2009 conference paper abstracts, and to link these mentions to a domain-specific ontology for the field of data mining. We show a lift in performance over existing approaches applicable to the task. Additional experiments on a separate corpus from the same domain suggest that the trained models are portable both in terms of accuracy and in their ability to reduce annotation time.
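The iterative handling of collective features mentioned in the abstract can be illustrated with a small relabelling loop. This is only a sketch under stated assumptions, not the algorithm from the paper: `base_score` stands in for a trained classifier's score on local features, `related` stands in for an ontology lookup, the additive combination of the two is an arbitrary choice, and the first pass bootstraps labels from local features alone.

```python
def iterative_link(mentions, candidates, base_score, related, max_iters=5):
    """Iteratively relabel mentions using a collective feature.

    mentions:   list of mention ids
    candidates: dict mapping mention id -> list of candidate concept ids
    base_score: function (mention, concept) -> float, local evidence only
    related:    function (concept, concept) -> bool, assumed ontology lookup
    """
    # Bootstrap: choose each mention's best candidate from local features only.
    labels = {
        m: max(candidates[m], key=lambda c: base_score(m, c)) for m in mentions
    }
    for _ in range(max_iters):
        new_labels = {}
        for m in mentions:
            # Current labels of the other mentions in the same document.
            others = [labels[o] for o in mentions if o != m]

            def score(c):
                # Collective feature: share of other mentions whose current
                # label is related to candidate c in the ontology.
                collective = (
                    sum(related(c, o) for o in others) / len(others)
                    if others else 0.0
                )
                return base_score(m, c) + collective

            new_labels[m] = max(candidates[m], key=score)
        if new_labels == labels:
            break  # labels are stable, so further passes change nothing
        labels = new_labels
    return labels


if __name__ == "__main__":
    local = {("m1", "a"): 0.5, ("m1", "b"): 0.4, ("m2", "c"): 1.0}
    result = iterative_link(
        mentions=["m1", "m2"],
        candidates={"m1": ["a", "b"], "m2": ["c"]},
        base_score=lambda m, c: local[(m, c)],
        related=lambda x, y: (x, y) == ("b", "c"),
    )
    print(result)
```

In the usage example, local evidence alone slightly prefers concept `a` for `m1`, but once `m2` is labelled `c` the collective feature favours the related concept `b`, and the loop stops when the labels no longer change.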