2010 TheCommunitySearchProblemandhow: Difference between revisions

Latest revision as of 07:27, 22 August 2024

(Sozio et al., 2010) ⇒ Mauro Sozio, and Aristides Gionis. (2010). “The Community-search Problem and how to plan a Successful Cocktail Party.” In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2010). doi:10.1145/1835804.1835923

Subject Headings:

Notes

Categories and Subject Descriptors: H.2.8 Database Management: Database Applications — Data mining; G.2.2 Discrete Mathematics: Graph Theory — Graph algorithms
General Terms: Algorithms, Experimentation, Theory

Cited By

Quotes

Author Keywords

Graph mining, community detection, social networks, graph algorithms

Abstract

A lot of research in graph mining has been devoted in the discovery of communities. Most of the work has focused in the scenario where communities need to be discovered with only reference to the input graph. However, for many interesting applications one is interested in finding the community formed by a given set of nodes. In this paper we study a query-dependent variant of the community-detection problem, which we call the community-search problem: given a graph [math]\displaystyle{ G }[/math], and a set of query nodes in the graph, we seek to find a subgraph of [math]\displaystyle{ G }[/math] that contains the query nodes and it is densely connected.

We motivate a measure of density based on minimum degree and distance constraints, and we develop an optimum greedy algorithm for this measure. We proceed by characterizing a class of monotone constraints and we generalize our algorithm to compute optimum solutions satisfying any set of monotone constraints. Finally we modify the greedy algorithm and we present two heuristic algorithms that find communities of size no greater than a specified upper bound. experimental evaluation on real datasets demonstrates the efficiency of the proposed algorithms and the quality of the solutions we obtain.

References

,

	Author	volume	Date Value	title	type	journal	titleUrl	doi	note	year
2010 TheCommunitySearchProblemandhow	Mauro Sozio Aristides Gionis			The Community-search Problem and how to plan a Successful Cocktail Party		KDD-2010 Proceedings	http://research.yahoo.com/files/kdd2010.pdf	10.1145/1835804.1835923		2010

@@ Line 1: / Line 1: @@
-* ([[2010_TheCommunitySearchProblemandhow|Sozio & al, 2010]]) &rArr; [[author::Mauro Sozio]], and [[author::Aristides Gionis]]. ([[year::2010]]). "[http://research.yahoo.com/files/kdd2010.pdf The Community-search Problem and how to plan a Successful Cocktail Party]." In: [[journal::Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining]] ([[KDD 2010]]). [http://dx.doi.org/10.1145/1835804.1835923 doi:10.1145/1835804.1835923]
+* ([[2010_TheCommunitySearchProblemandhow|Sozio et al., 2010]]) ⇒ [[author::Mauro Sozio]], and [[author::Aristides Gionis]]. ([[year::2010]]). “[http://research.yahoo.com/files/kdd2010.pdf The Community-search Problem and how to plan a Successful Cocktail Party].” In: [[journal::Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining]] ([[KDD-2010]]). [http://dx.doi.org/10.1145/1835804.1835923 doi:10.1145/1835804.1835923]
 <B>Subject Headings:</B>
-==Notes==
+== Notes ==
+* <B>Categories and Subject Descriptors:</B> H.2.8 [[Database Management]]: [[Database Application]]s — [[Data mining]]; G.2.2 [[Discrete Mathematics]]: [[Graph Theory]] — [[Graph algorithm]]s
+* <B>General Terms:</B> [[Algorithm]]s, [[Experimentation]], [[Theory]]
 ==Cited By==
@@ Line 9: / Line 11: @@
 * http://portal.acm.org/citation.cfm?id=1835923&preflayout=flat#citedby
-==Quotes==
+== Quotes ==
-===Author Keywords===
-===Abstract===
+=== Author Keywords ===
+[[Graph mining]], [[community detection]], [[social network]]s, [[graph algorithm]]s
-A lot of [[research|research]] in [[graph mining|graph mining]] has been devoted in the [[discovery of community|discovery of communities]]. </s> Most of the [[research work|work]] has focused in the scenario where [[community|communities]] need to be [[discover|discovered]] with only reference to the [[input graph|input graph]]. </s> However, for many interesting [[social mining task|applications]] one is interested in [[finding the community|finding the community]] formed by a given [[set of node|set of nodes]]. </s> In [[2010_TheCommunitySearchProblemandhow|this paper]] we study a [[query-dependent|query-dependent]] [[variant|variant]] of the [[community-detection task|community-detection problem]], which we call the <i>[[community-search task|community-search problem]]</i>: given a [[graph|graph]] <math>G</math>, and a [[node set|set]] of <i>[[query node|query nodes]]</i> in the [[graph|graph]], we seek to find a [[subgraph|subgraph]] of <math>G</math> that contains the [[query node|query]] [[node set|nodes]] and it is [[densely connected graph|densely connected]]. </s> <P> We motivate a [[measure of density|measure of density]] based on [[minimum degree|minimum degree]] and [[distance constraint|distance constraints]], and we develop an [[optimal algorithm|<i>optimum</i>]] [[greedy algorithm|greedy algorithm]] for [[this measure|this measure]]. </s> We proceed by [[characterizing|characterizing]] a [[class|class]] of [[monotone constraint|<i>monotone</i> constraints]] and we generalize [[2010_TheCommunitySearchProblemandhow algorithm|our algorithm]] to [[compute|compute]] [[optimum solution|optimum solutions]] satisfying any [[set|set]] of [[monotone constraint|monotone constraints]]. </s> Finally we modify the [[greedy algorithm|greedy algorithm]] and we present two [[heuristic algorithm|heuristic algorithms]] that [[find community|find communities]] of [[size|size]] no [[greater than|greater than]] a [[specified|specified]] [[upper bound|upper bound]]. </s> Our [[experimental evaluation|experimental evaluation]] on [[real dataset|real datasets]] demonstrates the [[efficiency|efficiency]] of [[2010_TheCommunitySearchProblemandhow algorithm|the proposed algorithms]] and the [[quality of the solution|quality of the solutions]] we obtain. </s>
+=== Abstract ===
-==References==
+A lot of [[research|research]] in [[graph mining|graph mining]] has been devoted in the [[community discovery|discovery of communities]].</s> Most of the [[research work|work]] has focused in the scenario where [[community|communities]] need to be [[discover|discovered]] with only reference to the [[input graph|input graph]].</s> However, for many interesting [[social mining task|application]]s one is interested in [[finding the community|finding the community]] formed by a given [[set of node|set of nodes]].</s> In [[2010_TheCommunitySearchProblemandhow|this paper we]] study a [[query-dependent|query-dependent]] [[variant|variant]] of the [[community-detection task|community-detection problem]], which [[2010_TheCommunitySearchProblemandhow|we]] call the <i>[[community-search task|community-search problem]]</i>: given a [[graph|graph]] <math>G</math>, and a [[node set|set]] of <i>[[query node|query nodes]]</i> in the [[graph|graph]], [[2010_TheCommunitySearchProblemandhow|we]] seek to find a [[subgraph|subgraph]] of <math>G</math> that contains the [[query node|query]] [[node set|nodes]] and it is [[densely connected graph|densely connected]].</s>         <P>          [[2010_TheCommunitySearchProblemandhow|We]] motivate a [[graph nodes density measure|measure of density]] based on [[minimum degree|minimum degree]] and [[distance constraint|distance constraints]], and [[2010_TheCommunitySearchProblemandhow|we]] develop an [[optimal algorithm|<i>optimum</i>]] [[greedy algorithm|greedy algorithm]] for [[graph nodes density measure|this measure]]. </s> [[2010_TheCommunitySearchProblemandhow|We]] proceed by [[characterizing|characterizing]] a [[class|class]] of [[monotone constraint|<i>monotone</i> constraints]] and we generalize [[2010_TheCommunitySearchProblemandhow algorithm|our algorithm]] to [[compute|compute]] [[optimum solution|optimum solution]]s satisfying any [[set|set]] of [[monotone constraint|monotone constraints]].</s> Finally [[2010_TheCommunitySearchProblemandhow|we]] modify the [[greedy algorithm|greedy algorithm]] and [[2010_TheCommunitySearchProblemandhow|we]] present two [[heuristic algorithm|heuristic algorithm]]s that [[community discovery|find communities]] of [[size|size]] no [[greater than|greater than]] a [[specified|specified]] [[upper bound|upper bound]].</s> [[Our experimental evaluation|experimental evaluation]] on [[real dataset|real datasets]] demonstrates the [[efficiency|efficiency]] of [[2010_TheCommunitySearchProblemandhow algorithm|the proposed algorithm]]s and the [[quality of the solution|quality of the solution]]s [[2010_TheCommunitySearchProblemandhow|we]] obtain.</s>
+== References ==
 {{#ifanon:|
 * 1. G. Agarwal and D. Kempe. Modularity-maximizing Network Communities via Mathematical Programming. European Physics Journal B, 66(3), 2008.
-* 2. Reid Andersen, Kumar Chellapilla, Finding Dense Subgraphs with Size Bounds, Proceedings of the 6th International Workshop on Algorithms and Models for the Web-Graph, February 12-13, 2009, Barcelona, Spain [http://dx.doi.org/10.1007/978-3-540-95995-3_3 doi:10.1007/978-3-540-95995-3_3]
+* 2. Reid Andersen, [[Kumar Chellapilla]], Finding Dense Subgraphs with Size Bounds, Proceedings of the 6th International Workshop on Algorithms and Models for the Web-Graph, February 12-13, 2009, Barcelona, Spain [http://dx.doi.org/10.1007/978-3-540-95995-3_3 doi:10.1007/978-3-540-95995-3_3]
 * 3. Reid Andersen, Fan Chung, Kevin Lang, Local Graph Partitioning Using PageRank Vectors, Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science, p.475-486, October 21-24, 2006 [http://dx.doi.org/10.1109/FOCS.2006.44 doi:10.1109/FOCS.2006.44]
 * 4. Yuichi Asahiro, Kazuo Iwama, Hisao Tamaki, Takeshi Tokuyama, Greedily Finding a Dense Subgraph, Proceedings of the 5th Scandinavian Workshop on Algorithm Theory, p.136-148, July 03-05, 1996
@@ Line 35: / Line 39: @@
 * 16. M. Girvan and M. E. J. Newman. Community Structure in Social and Biological Networks. Proceedings of the National Academy of Sciences of the USA, 99(12):7821--7826, 2002.
 * 17. J. Hastad. Clique is Hard to Approximate Within N<sup>1--&#181;</sup>. Electronic Colloquium on Computational Complexity (ECCC), 4(38), 1997.
-* 18. George Karypis, Vipin Kumar, A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs, SIAM Journal on Scientific Computing, v.20 n.1, p.359-392, Aug. 1998 [http://dx.doi.org/10.1137/S1064827595287997 doi:10.1137/S1064827595287997]
+* 18. [[George Karypis]], [[Vipin Kumar]], A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs, SIAM Journal on Scientific Computing, v.20 n.1, p.359-392, Aug. 1998 [http://dx.doi.org/10.1137/S1064827595287997 doi:10.1137/S1064827595287997]
 * 19. Gjergji Kasneci, Shady Elbassuoni, Gerhard Weikum, MING: Mining Informative Entity Relationship Subgraphs, Proceeding of the 18th ACM Conference on Information and Knowledge Management, November 02-06, 2009, Hong Kong, China [http://dx.doi.org/10.1145/1645953.1646196 doi:10.1145/1645953.1646196]
 * 20. Samir Khuller, Barna Saha, On Finding Dense Subgraphs, Proceedings of the 36th International Colloquium on Automata, Languages and Programming: Part I, July 05-12, 2009, Rhodes, Greece [http://dx.doi.org/10.1007/978-3-642-02927-1_50 doi:10.1007/978-3-642-02927-1_50]
-* 21. Yehuda Koren, Stephen C. North, Chris Volinsky, Measuring and Extracting Proximity Graphs in Networks, ACM Transactions on Knowledge Discovery from Data (TKDD), v.1 n.3, p.12-es, December 2007 [http://dx.doi.org/10.1145/1297332.1297336 doi:10.1145/1297332.1297336]
+* 21. Yehuda Koren, Stephen C. North, [[Chris Volinsky]], Measuring and Extracting Proximity Graphs in Networks, ACM Transactions on Knowledge Discovery from Data (TKDD), v.1 n.3, p.12-es, December 2007 [http://dx.doi.org/10.1145/1297332.1297336 doi:10.1145/1297332.1297336]
 * 22. Bernhard Korte, Jens Vygen, Combinatorial Optimization: Theory and Algorithms, Springer Publishing Company, Incorporated, 2007
 * 23. L. Kou, G. Markowsky, and L. Berman. A Fast Algorithm for Steiner Trees. Acta Informatica, 15(2):141--145, 1981
 * 24. Theodoros Lappas, Kun Liu, Evimaria Terzi, Finding a Team of Experts in Social Networks, Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, June 28-July 01, 2009, Paris, France [http://dx.doi.org/10.1145/1557019.1557074 doi:10.1145/1557019.1557074]
-* 25. Jure Leskovec, Kevin J. Lang, Anirban Dasgupta, Michael W. Mahoney, Statistical Properties of Community Structure in Large Social and Information Networks, Proceeding of the 17th International Conference on World Wide Web, April 21-25, 2008, Beijing, China [http://dx.doi.org/10.1145/1367497.1367591 doi:10.1145/1367497.1367591]
+* 25. [[Jure Leskovec]], Kevin J. Lang, Anirban Dasgupta, Michael W. Mahoney, Statistical Properties of Community Structure in Large Social and Information Networks, Proceeding of the 17th International Conference on World Wide Web, April 21-25, 2008, Beijing, China [http://dx.doi.org/10.1145/1367497.1367591 doi:10.1145/1367497.1367591]
 * 26. M. Newman. Fast Algorithm for Detecting Community Structure in Networks. Physical Review E, 69, 2003.
 * 27. P. Sevon, L. Eronen, P. Hintsanen, K. Kulovesi, and H. Toivonen. Link Discovery in Graphs Derived from Biological Databases. In DILS, 2006.
@@ Line 64: / Line 68: @@
 | ?year
 | format=bibtex
-}}{{Publication|doi=10.1145/1835804.1835923|title=The Community-search Problem and how to plan a Successful Cocktail Party|titleUrl=http://research.yahoo.com/files/kdd2010.pdf|abstract=A lot of research in graph mining has been devoted in the discovery of communities. Most of the work has focused in the scenario where communities need to be discovered with only reference to the input graph. However, for many interesting applications one is interested in finding the community formed by a given set of nodes. In this paper we study a query-dependent variant of the community-detection problem, which we call the <i>community-search problem</i>: given a graph <i>G</i>, and a set of <i>query nodes</i> in the graph, we seek to find a subgraph of <i>G</i> that contains the query nodes and it is densely connected.</p> <p>We motivate a measure of density based on minimum degree and distance constraints, and we develop an <i>optimum greedy</i> algorithm for this measure. We proceed by characterizing a class of <i>monotone</i> constraints and we generalize our algorithm to compute optimum solutions satisfying any set of monotone constraints. Finally we modify the greedy algorithm and we present two heuristic algorithms that find communities of size no greater than a specified upper bound. Our experimental evaluation on real datasets demonstrates the efficiency of the proposed algorithms and the quality of the solutions we obtain.}}
+}}{{Publication|doi=10.1145/1835804.1835923|title=The Community-search Problem and how to plan a Successful Cocktail Party|titleUrl=http://research.yahoo.com/files/kdd2010.pdf|abstract=A lot of research in graph mining has been devoted in the discovery of communities. Most of the work has focused in the scenario where communities need to be discovered with only reference to the input graph. However, for many interesting applications one is interested in finding the community formed by a given set of nodes. [[In this paper we]] study a query-dependent variant of the community-detection problem, which we call the <i>community-search problem</i>: given a graph <i>G</i>, and a set of <i>query nodes</i> in the graph, [[we]] seek to find a subgraph of <i>G</i> that contains the query nodes and it is densely connected.</p> <p>We motivate a measure of density based on minimum degree and distance constraints, and [[we]] develop an <i>optimum greedy</i> algorithm for this measure. [[We]] proceed by characterizing a class of <i>monotone</i> constraints and we generalize [[our algorithm]] to compute optimum solutions satisfying any set of monotone constraints. Finally [[we]] modify the greedy algorithm and we present two heuristic algorithms that find communities of size no greater than a specified [[upper bound]]. Our experimental evaluation on real datasets demonstrates the efficiency of the proposed algorithms and the quality of the solutions we obtain.}}