2013 ExploitingUserClicksforAutomati
- (Bai et al., 2013) ⇒ Xiao Bai, Flavio P. Junqueira, and Srinivasan H. Sengamedu. (2013). “Exploiting User Clicks for Automatic Seed Set Generation for Entity Matching.” In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ISBN:978-1-4503-2174-7 doi:10.1145/2487575.2487662
Subject Headings:
Notes
Cited By
- http://scholar.google.com/scholar?q=%222013%22+Exploiting+User+Clicks+for+Automatic+Seed+Set+Generation+for+Entity+Matching
- http://dl.acm.org/citation.cfm?id=2487575.2487662&preflayout=flat#citedby
Quotes
Author Keywords
Abstract
Matching entities from different information sources is a very important problem in data analysis and data integration. It is, however, challenging due to the number and diversity of information sources involved, and the significant editorial efforts required to collect sufficient training data. In this paper, we present an approach that leverages user clicks during Web search to automatically generate training data for entity matching. The key insight of our approach is that Web pages clicked for a given query are likely to be about the same entity. We use random walk with restart to reduce data sparseness, rely on co-clustering to group queries and Web pages, and exploit page similarity to improve matching precision. Experimental results show that: (i) With 360K pages from 6 major travel websites, we obtain 84K matchings (of 179K pages) that refer to the same entities, with an average precision of 0.826; (ii) The quality of matching obtained from a classifier trained on the resulted seed data is promising: the performance matches that of editorial data at small size and improves with size.
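The abstract names random walk with restart as the step that reduces data sparseness, but this page does not give the paper's formulation. The following is a minimal, generic sketch of random walk with restart on a bipartite query-page click graph, not the authors' implementation. The click counts, site names, the restart probability of 0.15, and the function names `build_transitions` and `random_walk_with_restart` are all assumptions made for illustration.

```python
from collections import defaultdict


def build_transitions(clicks):
    """Turn {query: {page: click_count}} into row-normalized transition
    probabilities over an undirected bipartite query-page graph."""
    weights = defaultdict(dict)
    for query, pages in clicks.items():
        for page, count in pages.items():
            q_node, p_node = ("q", query), ("p", page)
            weights[q_node][p_node] = count
            weights[p_node][q_node] = count
    transitions = {}
    for node, neighbors in weights.items():
        total = sum(neighbors.values())
        transitions[node] = {n: w / total for n, w in neighbors.items()}
    return transitions


def random_walk_with_restart(transitions, start, restart=0.15,
                             tol=1e-10, max_iter=1000):
    """Iterate r <- (1 - restart) * P^T r + restart * e_start until the
    L1 change between successive score vectors drops below tol."""
    scores = {node: 0.0 for node in transitions}
    scores[start] = 1.0
    for _ in range(max_iter):
        updated = {node: 0.0 for node in transitions}
        for node, mass in scores.items():
            for neighbor, prob in transitions[node].items():
                updated[neighbor] += (1.0 - restart) * mass * prob
        updated[start] += restart  # teleport back to the start node
        delta = sum(abs(updated[n] - scores[n]) for n in transitions)
        scores = updated
        if delta < tol:
            break
    return scores


# Hypothetical click log: query -> {clicked page: number of clicks}.
clicks = {
    "hilton paris": {"siteA/hilton-paris": 5, "siteB/paris-hilton-hotel": 3},
    "paris hilton hotel": {"siteB/paris-hilton-hotel": 4},
    "ritz london": {"siteC/ritz-london": 2},
}

transitions = build_transitions(clicks)
scores = random_walk_with_restart(transitions, ("q", "paris hilton hotel"))
```

In this toy graph the query "paris hilton hotel" was only ever clicked through to the siteB page, yet the walk also assigns a nonzero score to the siteA page through the shared query "hilton paris". The unrelated siteC page receives no score at all. This is the sense in which such a walk can densify a sparse click graph before queries and pages are grouped.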
References
| | Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year |
|---|---|---|---|---|---|---|---|---|---|---|
| 2013 ExploitingUserClicksforAutomati | Xiao Bai, Flavio P. Junqueira, Srinivasan H. Sengamedu | | | Exploiting User Clicks for Automatic Seed Set Generation for Entity Matching | | | | 10.1145/2487575.2487662 | | 2013 |