2015 SetCoveratWebScale
- (Stergiou & Tsioutsiouliklis, 2015) ⇒ Stergios Stergiou, and Kostas Tsioutsiouliklis. (2015). “Set Cover at Web Scale.” In: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2015). ISBN:978-1-4503-3664-2 doi:10.1145/2783258.2783315
Subject Headings:
Notes
Cited By
- http://scholar.google.com/scholar?q=%222015%22+Set+Cover+at+Web+Scale
- http://dl.acm.org/citation.cfm?id=2783258.2783315&preflayout=flat#citedby
Quotes
Author Keywords
Abstract
The classic Set Cover problem requires selecting a minimum size subset A ⊆ F from a family of finite subsets F Of U such that the elements covered by A are the ones covered by F. It naturally occurs in many settings in web search, web mining and web advertising. The greedy algorithm that iteratively selects a set in F that covers the most uncovered elements, yields an optimum (1 + U|) - approximation but is inherently sequential. In this work we give the first MapReduce Set Cover algorithm that scales to problem sizes of â¼ 1 trillion elements and runs in logp Δ iterations for a nearly optimum approximation ratio of p ln Δ, where Δ is the cardinality of the largest set in F p ln Δ
A web crawler is a system for bulk downloading of web pages. Given a set of seed URLs, the crawler downloads and extracts the hyperlinks embedded in them and schedules the crawling of the pages addressed by those hyperlinks for a subsequent iteration. While the average page out-degree is ∼ 50, the crawled corpus grows at a much smaller rate, implying a significant outlink overlap. Using our MapReduce Set Cover heuristic as a building block, we present the first large-scale seed generation algorithm that scales to â¼ 20 billion nodes and discovers new pages at a rate â¼ 4x faster than that obtained by prior art heuristics.
References
;
Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year | |
---|---|---|---|---|---|---|---|---|---|---|
2015 SetCoveratWebScale | Kostas Tsioutsiouliklis Stergios Stergiou | Set Cover at Web Scale | 10.1145/2783258.2783315 | 2015 |