2000 GraphClusterByFlowSimulation
- (van Dongen, 2000) ⇒ Stijn Marinus van Dongen. (2000). “Graph Clustering by Flow Simulation.” PhD thesis, University of Utrecht.
Subject Headings: Cluster Analysis, Markov Clustering.
Notes
Cited By
Quotes
Introduction
Cluster Analysis is the mathematical study of methods for recognizing natural groups within a class of entities. This is a common type of definition for the discipline of cluster analysis, and it is unsatisfactory in at least one respect. However, this cause for dissatisfaction must be welcomed, since it points directly to some of the essential questions and problems in cluster analysis. These essential aspects will be briefly discussed in this introduction, followed by an account of the main contribution of the thesis, the Markov Cluster Process, a cluster process designed within the setting of graphs. This setting defines a relatively young area in cluster analysis referred to as graph clustering, which has connections to the clearly scoped field of graph partitioning. Clustering and graph-clustering methods are also studied in the large research area labelled pattern recognition. These disciplines and the applications studied therein form the natural habitat for the Markov Cluster Algorithm. Their relative positions are sketched, anticipating a more thorough topography in the first part of this thesis. The Markov Cluster Process (abbreviated MCL process) defines a sequence of stochastic matrices by alternation of two operators on a generating matrix. It is basically all that is needed for the clustering of graphs, but it is useful to distinguish between the algorithm and the algebraic process employed by the algorithm. The study of mathematical properties thus belongs to the process proper, and aspects such as interpretation and scaling belong to the algorithm or a particular implementation of the algorithm. The second part of this thesis covers the conceptual and theoretical foundation of the Markov Cluster (abbreviated MCL)Algorithm and Process. Other proposals for graph clustering are discussed, and several classic mathematical results are presented which are relevant in the broad perspective of graph clustering and graph partitioning, in particular the relationship between graph spectra and connectivity properties. The third part of the thesis is concerned with algorithmic issues such as complexity, scalability, and implementation, and various types of experiments and benchmarks establishing the particular strengths and weaknesses of the algorithm. The thesis aims at an audience with a mathematical background. A marked exception is the appendix A cluster miscellany on page 149, which gives a bird's eye view of various aspects of cluster analysis. Among them are the history of the discipline, the role of the computer, and the etymology of some of the words naming concepts central to cluster analysis and the Markov Cluster Algorithm. The organization of this introduction largely mirrors the structure of the remainder of the thesis, consisting of the three parts Cluster Analysis and Graph Clustering, The Markov Cluster Process, and Markov Cluster Experiments respectively. The introduction concludes with a detailed account of the structure and contents of the thesis.