2003 EmployingTrainableStringSimilarityMetrics
- (Bilenko & Mooney, 2003b) ⇒ Mikhail Bilenko, Raymond Mooney. (2003). “Employing Trainable String Similarity Metrics for Information Integration.” In: Proceedings of the IJCAI-2003 Workshop on Information Integration.
Subject Heading(s): Duplicate Record Detection Algorithm.
Notes
- related to (Bilenko and Mooney, 2003a) ⇒ Mikhail Bilenko and Raymond Mooney. (2003). “Adaptive Duplicate Detection Using Learnable String Similarity Measures.” In: Proceedings of the ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2003).
Cited By
~13 http://scholar.google.com/scholar?num=50&cites=10298940080377151010
Quotes
Abstract
- The problem of identifying approximately duplicate objects in databases is an essential step for the information integration process. Most existing approaches have relied on generic or manually tuned distance metrics for estimating the similarity of potential duplicates. In this paper, we present a framework for improving duplicate detection using trainable measures of textual similarity. We propose to employ learnable text distance functions for each data field, and introduce an extended variant of learnable string edit distance based on an Expectation-Maximization (EM) training algorithm. Experimental results on a range of datasets show that this similarity metric is capable of adapting to the specific notions of similarity that are appropriate for different domains. Our overall system, MARLIN, utilizes support vector machines to combine multiple similarity metrics, which are shown to perform better than ensembles of decision trees, which were employed for this task in previous work.
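- Illustration (not from the paper): the abstract's idea of a string edit distance whose costs can adapt to a domain can be sketched as a standard dynamic-programming edit distance with a substitution-cost table passed in as a parameter. This is only a simplified sketch under stated assumptions: the function name `weighted_edit_distance`, its parameters, and the example cost table are hypothetical, and the costs here are set by hand rather than estimated with the EM training procedure the paper describes.

```python
def weighted_edit_distance(s, t, sub_cost=None, ins_cost=1.0, del_cost=1.0):
    """Edit distance with per-character-pair substitution costs.

    sub_cost is a hypothetical table {(a, b): cost}; pairs that are
    missing from it fall back to a unit cost. In a trainable setting
    such a table would be learned from labeled duplicate pairs.
    """
    sub_cost = sub_cost or {}
    m, n = len(s), len(t)
    # d[i][j] = cheapest cost of turning s[:i] into t[:j]
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + del_cost
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + ins_cost
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            a, b = s[i - 1], t[j - 1]
            # Matching characters are free; mismatches use the cost table.
            sub = 0.0 if a == b else sub_cost.get((a, b), 1.0)
            d[i][j] = min(
                d[i - 1][j] + del_cost,      # delete a from s
                d[i][j - 1] + ins_cost,      # insert b into s
                d[i - 1][j - 1] + sub,       # substitute a with b (or match)
            )
    return d[m][n]


# With unit costs this reduces to ordinary Levenshtein distance.
plain = weighted_edit_distance("kitten", "sitting")
# A cheaper k -> s substitution lowers the distance for this pair.
tuned = weighted_edit_distance("kitten", "sitting", sub_cost={("k", "s"): 0.2})
```

Passing a different cost table per data field is one simple way to picture "learnable text distance functions for each data field"; the paper's own metric is an extended variant and differs from this sketch.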
References
- (Yankova et al., 2008) ⇒ Milena Yankova, Horacio Saggion and Hamish Cunningham. (2008). “A Framework for Identity Resolution and Merging for Multi-source Information Extraction.” In: Proceedings of LREC 2008.
- QUOTE: (Bilenko and Mooney, 2003) present a framework for duplicate detection using trainable measure of textual similarity (a learnable text distance function).
| | Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year |
|---|---|---|---|---|---|---|---|---|---|---|
| 2003 EmployingTrainableStringSimilarityMetrics | Mikhail Bilenko, Raymond J. Mooney | | | Employing Trainable String Similarity Metrics for Information Integration | | | http://userweb.cs.utexas.edu/~ml/papers/marlin-ijcai-wkshp-03.pdf | | | 2003 |