2005 AdaptiveProductNormalization
- (Bilenko et al., 2005) ⇒ Mikhail Bilenko, Sugato Basu, Mehran Sahami. (2005). “Adaptive Product Normalization: Using Online Learning for Record Linkage in Comparison Shopping.” In: Proceedings of the 5th IEEE International Conference on Data Mining (ICDM-2005).
Subject Headings: Record Coreference Resolution Task, Record Coreference Resolution Algorithm, Supervised Algorithm, Learnable Similarity Function, Product Record.
Notes
- It proposes a Record Deduplication Algorithm.
- It focuses on disambiguating Product Records (a task the authors refer to as Product Normalization).
- It proposes a clustering algorithm for grouping merchant offers.
- It assumes that offers have structured information.
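The clustering step noted above can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes simple single-link clustering over a pairwise similarity function, and the `name` field, the `name_similarity` helper, and the `threshold` value are all hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def name_similarity(a: dict, b: dict) -> float:
    """String similarity of the hypothetical 'name' field, in [0, 1]."""
    return SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()


def cluster_offers(offers, similarity, threshold=0.8):
    """Single-link clustering (an assumed strategy): link every pair of
    offers whose similarity is at or above the threshold, then return the
    connected groups as sorted lists of offer indices."""
    parent = list(range(len(offers)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(offers)), 2):
        if similarity(offers[i], offers[j]) >= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(offers)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

Any function that maps two offer records to a score can be passed as `similarity`, so a learned similarity function could be substituted for `name_similarity` without changing the clustering code.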
Cited By
Quotes
Abstract
The problem of record linkage focuses on determining whether two object descriptions refer to the same underlying entity. Addressing this problem effectively has many practical applications, e.g., elimination of duplicate records in databases and citation matching for scholarly articles. In this paper, we consider a new domain where the record linkage problem is manifested: Internet comparison shopping. We address the resulting linkage setting that requires learning a similarity function between record pairs from streaming data. The learned similarity function is subsequently used in clustering to determine which records are co-referent and should be linked. We present an online machine learning method for addressing this problem, where a composite similarity function based on a linear combination of basis functions is learned incrementally. We illustrate the efficacy of this approach on several real-world datasets from an Internet comparison shopping site, and show that our method is able to effectively learn various distance functions for product data with differing characteristics. We also provide experimental results that show the importance of considering multiple performance measures in record linkage evaluation.
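As an illustration of the idea in the abstract (a composite similarity function, formed as a linear combination of basis functions, learned incrementally from streaming record pairs), the following is a minimal sketch. The abstract does not state the update rule, so a plain perceptron-style update is assumed here; the `name` and `price` fields, the two basis functions, and the `OnlineSimilarity` class are hypothetical and not taken from the paper.

```python
def token_overlap(a: dict, b: dict) -> float:
    """Jaccard overlap of the tokens in the hypothetical 'name' field."""
    ta = set(a["name"].lower().split())
    tb = set(b["name"].lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def price_closeness(a: dict, b: dict) -> float:
    """1.0 for equal prices, falling toward 0.0 as the relative gap grows."""
    hi = max(a["price"], b["price"])
    return 1.0 - abs(a["price"] - b["price"]) / hi if hi > 0 else 1.0


class OnlineSimilarity:
    """Weighted sum of basis similarity functions, updated one pair at a time."""

    def __init__(self, bases, lr=1.0):
        self.bases = list(bases)
        self.w = [0.0] * len(self.bases)
        self.bias = 0.0
        self.lr = lr

    def features(self, a, b):
        return [f(a, b) for f in self.bases]

    def score(self, a, b):
        x = self.features(a, b)
        return self.bias + sum(wi * xi for wi, xi in zip(self.w, x))

    def update(self, a, b, same: bool) -> bool:
        """Perceptron-style step (assumed rule): change the weights only when
        the current score is on the wrong side of zero for this labeled pair.
        Returns True if the weights were changed."""
        y = 1.0 if same else -1.0
        x = self.features(a, b)
        s = self.bias + sum(wi * xi for wi, xi in zip(self.w, x))
        if y * s <= 0:
            self.w = [wi + self.lr * y * xi for wi, xi in zip(self.w, x)]
            self.bias += self.lr * y
            return True
        return False
```

In use, each labeled pair from the stream is passed to `update` once, and `score` can be queried at any point; a positive score is read as "co-referent" under this sketch's convention.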
1 Introduction
As we show in this paper, record linkage is a key component of on-line comparison shopping systems. When many different web sites sell the same product, they provide different textual descriptions of the product (which we refer to as “offers”). Thus, a comparison shopping engine is faced with the task of determining which offers are referring to the same true underlying product. Solving this product normalization problem allows the shopping engine to display multiple offers for the same product to a user who is trying to determine from which vendor to purchase the product. Accurate product normalization is also critical for data mining tasks such as analysis of pricing trends.
References
Page | Author | title | titleUrl | year
---|---|---|---|---
2005 AdaptiveProductNormalization | Sugato Basu, Mikhail Bilenko, Mehran Sahami | Adaptive Product Normalization: Using Online Learning for Record Linkage in Comparison Shopping | http://research.microsoft.com/en-us/um/people/mbilenko/papers/05-icdm.pdf | 2005