2008 AFrameworkForIdentityResolutionAndMerging
- (Yankova et al., 2008) ⇒ Milena Yankova, Horacio Saggion, and Hamish Cunningham. (2008). “A Framework for Identity Resolution and Merging for Multi-source Information Extraction.” In: Proceedings of LREC Conference (LREC 2008).
Subject Headings: Entity Mention Normalization Task, Ontology-based Information Extraction, OntoText Lab.
Notes
- It proposes a Framework for IE-based Ontology Population (Ontology-based Information Extraction).
- It uses the term Identity Resolution for Entity Mention Normalization.
- It defines Identity Resolution as:
- “the process of deciding whether an instance extracted from text refers to a known entity in the target domain (e.g. the ontology)”
- “the process of deciding if a particular fact extracted from text can be linked to identical/similar facts in the ontology.”
- It implements the Framework and tests it on Employment Posting Information Extraction.
- Its implementation makes use of:
- The extensible PROTON Ontology (Terziev et al., 2005).
- The KIM System (Popov et al., 2004) as its Knowledge Base System, which is based on OWLIM (Kiryakov et al., 2005) and Sesame.
Cited By
Quotes
Abstract
In the context of ontology-based information extraction, identity resolution is the process of deciding whether an instance extracted from text refers to a known entity in the target domain (e.g. the ontology). We present an ontology-based framework for identity resolution which can be customised to different application domains and extraction tasks. Rules for identity resolution, which compute similarities between target and source entities based on class information and instance properties and values, can be defined for each class in the ontology. We present a case study of the application of the framework to the problem of multi-source job vacancy extraction.
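The abstract describes per-class rules that score a source (extracted) instance against target (ontology) instances using class information and property values. The following is a minimal Python sketch of that idea under stated assumptions; it is not the authors' implementation, which runs on KIM and the PROTON ontology. The names `Instance`, `ClassRule`, `resolve`, and `merge`, the weighted-average scoring, the use of `difflib.SequenceMatcher` as the string similarity, and the weights and threshold in the demo are all hypothetical.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Dict, List, Optional


@dataclass
class Instance:
    """An ontology instance (target) or an extracted instance (source)."""
    uri: str
    cls: str                                   # ontology class, e.g. "Organization"
    props: Dict[str, str] = field(default_factory=dict)


def string_sim(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1] (difflib ratio; an assumed metric)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


@dataclass
class ClassRule:
    """Hypothetical per-class rule: property weights plus an acceptance threshold."""
    weights: Dict[str, float]
    threshold: float

    def score(self, source: Instance, target: Instance) -> float:
        """Weighted average of property similarities; a property missing on either side contributes 0."""
        total = sum(self.weights.values())
        if total == 0:
            return 0.0
        acc = 0.0
        for prop, weight in self.weights.items():
            a, b = source.props.get(prop), target.props.get(prop)
            if a is not None and b is not None:
                acc += weight * string_sim(a, b)
        return acc / total


def resolve(source: Instance, kb: List[Instance],
            rules: Dict[str, ClassRule]) -> Optional[Instance]:
    """Return the best-scoring known instance of the same class, or None if none reaches the threshold."""
    rule = rules.get(source.cls)
    if rule is None:                           # no rule defined for this class
        return None
    best, best_score = None, 0.0
    for target in kb:
        if target.cls != source.cls:           # class information restricts the candidates
            continue
        s = rule.score(source, target)
        if s > best_score:
            best, best_score = target, s
    return best if best is not None and best_score >= rule.threshold else None


def merge(source: Instance, kb: List[Instance],
          rules: Dict[str, ClassRule]) -> Instance:
    """Merge into the resolved instance, or add the source to the KB as a new instance."""
    target = resolve(source, kb, rules)
    if target is None:
        kb.append(source)
        return source
    for prop, value in source.props.items():
        target.props.setdefault(prop, value)   # keep existing values, add only missing properties
    return target


if __name__ == "__main__":
    kb = [
        Instance("kb:Org1", "Organization", {"name": "Acme Corporation", "city": "Sheffield"}),
        Instance("kb:Org2", "Organization", {"name": "Globex Ltd", "city": "Leeds"}),
    ]
    rules = {"Organization": ClassRule(weights={"name": 0.7, "city": 0.3}, threshold=0.8)}
    extracted = Instance("src:x1", "Organization",
                         {"name": "ACME Corporation", "city": "Sheffield", "sector": "Manufacturing"})
    print(merge(extracted, kb, rules).uri)     # prints kb:Org1
    print(kb[0].props["sector"])               # prints Manufacturing
```

Keeping one rule per class mirrors the abstract's point that rules "can be defined for each class in the ontology"; a fuller system would presumably plug in class-specific similarity functions (for dates, locations, or salaries, say) instead of a single string metric.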
References
- Niraj Aswani, Kalina Bontcheva, and Hamish Cunningham. (2006). Mining information for instance unification. In 5th International Semantic Web Conference (ISWC2006), Athens, Georgia.
- A. Bagga and B. Baldwin. (1998). Entity-based Cross-Document Coreferencing Using the Vector Space Model. In: Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and the 17th International Conference on Computational Linguistics (COLING-ACL’98), pages 79–85.
- A. Bagga and A. W. Biermann. (2000). A methodology for cross-document coreference. In: Proceedings of the Fifth Joint Conference on Information Sciences (JCIS 2000), pages 207–210.
- Mikhail Bilenko and Raymond Mooney. (2003). Employing trainable string similarity metrics for information integration. In IJCAI-2003, Mexico.
- Ahmed K. Elmagarmid, Panagiotis G. Ipeirotis, and Vassilios S. Verykios. (2007). Duplicate record detection: A survey. Technical report, TKDE, January.
- Adam Funk, Diana Maynard, Horacio Saggion, and Kalina Bontcheva. (2007). Ontological integration of information extraction from multiple sources. In International Workshop on Multi-source, Multi-lingual Information Extraction and Summarisation.
- F. Giunchiglia, P. Shvaiko, and M. Yatskevich. (2004). S-Match: an algorithm and an implementation of semantic matching. In ESWS, pages 61–75.
- Ralph Grishman. (1997). Information Extraction: Techniques and Challenges. In Information Extraction: a Multidisciplinary Approach to an Emerging Information Technology.
- Atanas Kiryakov, Damyan Ognyanov, and Dimitar Manov. (2005). OWLIM - a pragmatic semantic repository for OWL. In SSWS 2005, WISE, USA.
- Michel C.A. Klein, Peter Mika, and Stefan Schlobach. (2007). Approximate instance unification using RoughOWL. In Workshop on Uncertainty Reasoning for the Semantic Web (URSW).
- G. S. Mann and David Yarowsky. (2003). Unsupervised personal name disambiguation. In W. Daelemans and M. Osborne, editors, Proceedings of the 7th Conference on Natural Language Learning (CoNLL-2003), pages 33–40. Edmonton, Canada, May.
- George A. Miller. (1994). WordNet: a lexical database for English. In HLT ’94, USA.
- X.-H. Phan, L.-M. Nguyen, and S. Horiguchi. (2006). Personal name resolution crossover documents by a semantics-based approach. IEICE Trans. Inf. & Syst., Feb 2006.
- Borislav Popov, Atanas Kiryakov, Damyan Ognyanoff, Dimitar Manov, and Angel Kirilov. (2004). KIM - a semantic platform for information extraction and retrieval. In Journal of Natural Language Engineering. Cambridge University Press.
- H. Saggion. (2008). Experiments on semantic-based clustering for cross-document coreference. In International Joint Conference on Natural Language Processing, Hyderabad, India, January. AFNLP.
- Ivan Terziev, Atanas Kiryakov, and Dimitar Manov. (2005). Base upper-level ontology (BULO) guidance. Technical Report Deliverable 1.8.1, SEKT project, UK, July.
Page | Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year
---|---|---|---|---|---|---|---|---|---|---
2008 AFrameworkForIdentityResolutionAndMerging | Hamish Cunningham, Milena Yankova, Horacio Saggion | | | A Framework for Identity Resolution and Merging for Multi-source Information Extraction | | Proceedings of LREC Conference | http://www.lrec-conf.org/proceedings/lrec2008/pdf/347_paper.pdf | | | 2008