GMRKB-directed Wikification Task
A GMRKB-directed Wikification Task is a text wikification task whose input is a text item and whose output is GM-RKB content (text in which concept mentions are linked to GM-RKB pages).
- Context:
- Task Input: Plaintext, Hypertext, Text Document.
- Task Output: GM-RKB Page, GM-RKB WikiText.
- It can be supported by:
- It can be solved by a GM-RKB Wikification System (that implements a GM-RKB Wikification Algorithm).
- Example(s):
- Task Input: “… cascading non-homogeneous Poisson process” ⇒ Task Output: “… [[Cascading Stochastic Process|cascading]] [[Non-Homogeneous Stochastic Process|non-homogeneous]] [[Poisson Stochastic Process|Poisson process]]”
- Task Input: “… training multi-label text classifiers.” ⇒ Task Output: “… [[Machine Learning Training Task|training]] [[Multi-Label Classifier|multi-label]] [[Text Object|text]] [[classifier]]s.”
- Counter-Example(s):
- See: Natural Language Processing Task, Wikimedia, Wikitext, WikiText Error Correction Task, Semantic Wiki, Shallow WikiText error correction System, Wikify!, Document to Ontology Interlinking System.
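The input/output pairs in the examples above can be reproduced by a small rendering step. What follows is a minimal Python sketch. It assumes the concept mentions have already been identified and linked to GM-RKB page titles, for example by a mention recognizer and linker. The `Mention` class and `to_wikitext` function are hypothetical names and are not part of any GM-RKB system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """A linked concept mention: character span [start, end) plus its target page title."""
    start: int
    end: int
    page: str


def to_wikitext(text: str, mentions: list[Mention]) -> str:
    """Render text as wikitext, wrapping each mention in a [[Page|anchor]] link.

    Assumes mentions do not overlap (raises ValueError otherwise). When the
    anchor text already equals the page title, the shorter [[Page]] form is used.
    """
    out: list[str] = []
    cursor = 0  # end offset of the text already copied to `out`
    for m in sorted(mentions, key=lambda m: m.start):
        if m.start < cursor:
            raise ValueError(f"overlapping mention at offset {m.start}")
        anchor = text[m.start:m.end]
        out.append(text[cursor:m.start])  # unlinked text before this mention
        out.append(f"[[{anchor}]]" if anchor == m.page else f"[[{m.page}|{anchor}]]")
        cursor = m.end
    out.append(text[cursor:])  # trailing unlinked text
    return "".join(out)
```

For the second example, the spans (0, 8), (9, 20), (21, 25) and (26, 36) of "training multi-label text classifiers." are mapped to the four pages shown. These produce the output above, with the plural "s" left outside the last link.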
References
2019
- (Melli & Moreira, 2019) ⇒ Gabor Melli, and Olga Moreira. (2019). “The GMRKB.com Semantic Wiki (2019).” In: SLKB@AKBC Accepted Abstract 9.
- QUOTE: Most of the approximately 4,500 publication abstracts and content quotes were annotated using the following two-step process: 1. the SDOI system's mention recognizer (Melli, 2012) automatically pre-annotates abstracts by applying a trained conditional random field (CRF)-based chunker; 2. the authors review each abstract to remove remaining editing errors and to add domain-specific repairs (this step takes approximately 1 minute per abstract). This concept mention interlinking is accomplished using the popular annotation format used in Wikipedia[1] and is a continuation of the work started in (Melli, 2010).
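Because the interlinked abstracts use Wikipedia's link annotation format, the annotations can be read back into (anchor text, target page) pairs. This is useful, for instance, for comparing an automatically pre-annotated abstract against its reviewed version. The following is a minimal Python sketch under that assumption. It handles only plain `[[Target]]` and `[[Target|anchor]]` links plus a lowercase link trail (as in `[[classifier]]s`, following the default English-language MediaWiki behaviour). It does not handle templates, nested links, or namespace prefixes. `LINK_RE` and `extract_links` are hypothetical names, not part of the SDOI system.

```python
import re

# [[Target]] or [[Target|anchor]], optionally followed by a lowercase "link trail"
# such as the "s" in [[classifier]]s, which MediaWiki renders as part of the anchor.
LINK_RE = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]([a-z]*)")


def extract_links(wikitext: str) -> tuple[str, list[tuple[str, str]]]:
    """Return (plain_text, [(anchor_text, target_page), ...]) for a wikitext string."""
    links: list[tuple[str, str]] = []

    def _replace(match: re.Match) -> str:
        target, label, trail = match.group(1), match.group(2), match.group(3)
        anchor = (label if label is not None else target) + trail
        links.append((anchor, target))  # record the mention and its linked page
        return anchor  # the plain text keeps only the visible anchor

    plain = LINK_RE.sub(_replace, wikitext)
    return plain, links
```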
2016
- (Melli, 2016) ⇒ Gabor Melli. (2016). “Semantically Annotated Concepts in KDD's 2009-2015 Abstracts.” In: Proceedings of LangOnto2-TermiKS (LO2TKS) 2016 Workshop.
- QUOTE: We introduce a linguistic resource composed of a semantically annotated corpus and a lexicalized ontology that are interlinked on mentions of concepts and entities.
2012
- (Melli, 2012) ⇒ Gabor Melli. (2012). “Identifying Untyped Relation Mentions in a Corpus Given An Ontology.” In: Workshop Proceedings of TextGraphs-7 on Graph-based Methods for Natural Language Processing.
- QUOTE: In this paper we present the SDOIrmi text graph-based semi-supervised algorithm for the task for relation mention identification when the underlying concept mentions have already been identified and linked to an ontology. To overcome the lack of annotated data, we propose a labelling heuristic based on information extracted from the ontology. We evaluated the algorithm on the kdd09cma1 dataset using a leave-one-document-out framework and demonstrated an increase in F1 in performance over a co-occurrence based AllTrue baseline algorithm. An extrinsic evaluation of the predictions suggests a worthwhile precision on the more confidently predicted additions to the ontology.
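A co-occurrence baseline of the kind mentioned above can be sketched as follows: predict every pair of linked concepts that appear together as a relation mention, then score the predictions with F1 against gold pairs. This is a minimal Python sketch, not the paper's implementation. It assumes that the co-occurrence unit is the sentence and that relations are untyped and unordered. The function names and the concept names in the test are hypothetical.

```python
from itertools import combinations


def all_true_baseline(sentences: list[list[str]]) -> set[frozenset[str]]:
    """Predict every unordered pair of distinct concepts that co-occur in a sentence.

    `sentences` holds, for each sentence, the ontology concepts its mentions are linked to.
    """
    predicted: set[frozenset[str]] = set()
    for concepts in sentences:
        for a, b in combinations(sorted(set(concepts)), 2):
            predicted.add(frozenset((a, b)))
    return predicted


def f1(predicted: set, gold: set) -> float:
    """Harmonic mean of precision and recall over sets of relation pairs."""
    tp = len(predicted & gold)  # true positives
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)
```

Predicting every co-occurring pair trades precision for recall, which makes it a natural floor for a learned relation mention identifier.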
2010a
- (Melli, 2010) ⇒ Gabor Melli. (2010). “Concept Mentions within KDD-2009 Abstracts (kdd09cma1) Linked to a KDD Ontology (kddo1).” In: Proceedings of the Seventh Conference on International Language Resources and Evaluation (LREC 2010).
- ABSTRACT: We introduce the kddo1 ontology and semantically annotated kdd09cma1 corpus from the field of knowledge discovery in database (KDD) research. The corpus is based on the abstracts for the papers accepted into the KDD-2009 conference. Each abstract has its concept mentions identified and, where possible, linked to the appropriate concept in the ontology. The ontology is based on a human generated and readable semantic wiki focused on concepts and relationships for the domain along with other related topics, papers and researchers from information sciences. To our knowledge this is the first ontology and interlinked corpus for a subdiscipline within computing science. The dataset enables the evaluation of supervised approaches to semantic annotation of documents that contain a large number of high-level concepts relative the number of named entity mentions. We plan to continue to evolve the ontology based on the discovered relations within the corpus and to extend the corpus to cover other research paper abstracts from the domain. Both resources are publicly available at http://www.gabormelli.com/Projects/kdd/data/.
2010b
- (Melli & Ester, 2010) ⇒ Gabor Melli, and Martin Ester. (2010). “Supervised Identification and Linking of Concept Mentions to a Domain-Specific Ontology.” In: Proceedings of the 19th ACM International Conference on Information and Knowledge Management (CIKM 2010). doi:10.1145/1871437.1871712
- ABSTRACT: We propose a purely supervised learning approach to the task of identifying concept mentions within a document and of linking these mentions to their corresponding concept in a given ontology. Concept mention identification is performed with a trained CRF sequential model. Each mention is associated with a set of candidate ontology concepts, and binary training feature vectors are generated for these pairings. We formalize the feature space to expand on those proposed in the literature, and also propose the inclusion of features derived from the training corpus. Iterative classification is proposed as a method of handling collective decisions in a supervised manner. The approach, named SCMILO, is validated against the ability to identify the concept mentions within the 139 KDD-2009 conference paper abstracts, and to link these mentions to a domain-specific ontology for the field of data mining.
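The pairing step described in this abstract can be sketched as follows. Each mention gets a set of candidate ontology concepts, and each (mention, candidate) pair becomes a binary feature vector that a supervised classifier can score. This is a minimal Python sketch. The token-overlap candidate generator and the three features are hypothetical illustrations, not SCMILO's actual candidate or feature set. The iterative classification step is not shown.

```python
def candidate_concepts(mention: str, ontology: list[str]) -> list[str]:
    """Hypothetical candidate generator: concepts sharing at least one lowercase token."""
    tokens = set(mention.lower().split())
    return [c for c in ontology if tokens & set(c.lower().split())]


def pair_features(mention: str, concept: str, corpus_links: dict[str, set[str]]) -> list[int]:
    """Binary feature vector for one (mention, candidate concept) pairing.

    Features (illustrative only):
      0: mention text equals the concept title (case-insensitive)
      1: every mention token appears in the concept title
      2: the mention text is linked to this concept somewhere in the training corpus
    """
    m, c = mention.lower(), concept.lower()
    return [
        int(m == c),
        int(set(m.split()) <= set(c.split())),
        int(concept in corpus_links.get(m, set())),
    ]
```

During training, each pairing would be labelled 1 if it is the annotated link and 0 otherwise. At prediction time, the highest-scoring candidate would be chosen for each mention.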
2008
- (Melli & McQuinn, 2008) ⇒ Gabor Melli, and Jerre McQuinn. (2008). “Requirements Specification Using Fact-Oriented Modeling: A Case Study and Generalization.” In: Proceedings of Workshop on Object-Role Modeling (ORM 2008). doi:10.1007/978-3-540-88875-8_98
- QUOTE: Fact-oriented Modeling[1] [3], [4], [7] is a technique that assists with the conceptual modeling of an IT Solution. The approach however has not yet been fully incorporated into software requirement specification standards [8], [9], [10], [12], [13], [14], [2]. With the introduction of such standards as Structured Business Vocabulary and Rules (SBVR) [5], [7] it is now possible to consistently employ Fact-oriented Modeling in the delivery of enterprise solutions.
Fact-oriented Modeling depends upon a controlled vocabulary of Business Concepts which can be used by business and IT stakeholders to communicate in a common language, leaving little room for ambiguity. Many Microsoft legacy systems have physical data structures that do not reflect the business concepts and relationships that they support. Fact-oriented Modeling changes this paradigm by requiring that Business Concepts and the allowed actions and relationships between them are specified as Business Rules before the functional specification begins.
- Key terms are defined in the factmodels.com/PReM1/v060930 and factmodels.com/SRS1/v060930 repositories.