2005 DiscoveringLargeDenseSubgraphsi
- (Gibson et al., 2005) ⇒ David Gibson, Ravi Kumar, and Andrew Tomkins. (2005). “Discovering Large Dense Subgraphs in Massive Graphs.” In: Proceedings of the 31st International Conference on Very large data bases. ISBN:1-59593-154-6
Subject Headings:
Notes
Cited By
- http://scholar.google.com/scholar?q=%222005%22+Discovering+Large+Dense+Subgraphs+in+Massive+Graphs
- http://dl.acm.org/citation.cfm?id=1083592.1083676&preflayout=flat#citedby
Quotes
Abstract
We present a new algorithm for finding large, dense subgraphs in massive graphs. Our algorithm is based on a recursive application of fingerprinting via shingles, and is extremely efficient, capable of handling graphs with tens of billions of edges on a single machine with modest resources. We apply our algorithm to characterize the large, dense subgraphs of a graph showing connections between hosts on the World Wide Web; this graph contains over 50M hosts and 11B edges, gathered from 2.1B web pages. We measure the distribution of these dense subgraphs and their evolution over time. We show that more than half of these hosts participate in some dense subgraph found by the analysis. There are several hundred giant dense subgraphs of at least ten thousand hosts; two thousand dense subgraphs at least a thousand hosts; and almost 64K dense subgraphs of at least a hundred hosts. Upon examination, many of the dense subgraphs output by our algorithm are link spam, i.e., websites that attempt to manipulate search engine rankings through aggressive interlinking to simulate popular content. We therefore propose dense subgraph extraction as a useful primitive for spam detection, and discuss its incorporation into the workflow of web search engines.
References
;
Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year | |
---|---|---|---|---|---|---|---|---|---|---|
2005 DiscoveringLargeDenseSubgraphsi | Andrew Tomkins Ravi Kumar David Gibson | Discovering Large Dense Subgraphs in Massive Graphs | 2005 |