Top-Down Clustering Algorithm
A Top-Down Clustering Algorithm is a clustering algorithm that decomposes a data set into a set of disjoint clusters.
- AKA: Partitioning Relocation Clustering Algorithm.
- Context:
- It can be implemented in a Partitional Clustering System (that solves a partitional clustering task).
- Example(s):
- a k-Means Clustering Algorithm,
- a k-Medoids Clustering Algorithm.
- Counter-Example(s):
- an Agglomerative Clustering Algorithm,
- a Hierarchical Clustering Algorithm.
- See: Partition Relation, Disjoint Subsets, Categorical Data Clustering.
References
2019a
- (Johnson, 2019) ⇒ Reid Johnson. (2019). “Hierarchical Clustering.” Retrieved: 2017-05-17.
- QUOTE: Hierarchical vs. Partitional Clustering
- A distinction among different types of clusterings is whether the set of clusters is nested or unnested.
- A partitional clustering is simply a division of the set of data objects into non-overlapping subsets (clusters) such that each data object is in exactly one subset.
- A hierarchical clustering is a set of nested clusters that are organized as a tree.
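The nested/unnested distinction above can be made concrete: a partitional (unnested) clustering must place each data object in exactly one cluster. Below is a minimal sketch of such a check in Python; the helper name is_valid_partition and the toy objects are illustrative, not taken from the quoted source.

```python
def is_valid_partition(objects, clusters):
    """Check that `clusters` is a partitional clustering of `objects`:
    every object appears in exactly one cluster (non-overlapping, exhaustive)."""
    seen = []
    for cluster in clusters:
        seen.extend(cluster)
    # Exactly one membership per object: nothing duplicated across clusters,
    # nothing missing.
    return sorted(seen) == sorted(objects)

# Toy example: five objects split into two disjoint clusters.
objects = ["a", "b", "c", "d", "e"]
partitional = [["a", "b"], ["c", "d", "e"]]        # valid partition
overlapping = [["a", "b", "c"], ["c", "d", "e"]]   # "c" belongs to two clusters

print(is_valid_partition(objects, partitional))    # True
print(is_valid_partition(objects, overlapping))    # False
```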
2019b
- (Gao, 2019) ⇒ Jing Gao. (2019). “Clustering Lecture 2: Partitional Methods.” Retrieved: 2017-05-17.
2017
- (Jin & Han, 2017) ⇒ Xin Jin, Jiawei Han. (2017). “Partitional Clustering”. In: (Sammut & Webb, 2017) DOI: 10.1007/978-1-4899-7687-1_637
- QUOTE: Partitional clustering is a type of clustering algorithms that divide a set of data points into disjoint subsets. Each data point is in exactly one subset (...)
Partitional clustering (Han et al. 2011) decomposes a data set into a set of disjoint clusters. Given a data set of [math]\displaystyle{ N }[/math] points, a partitioning method constructs [math]\displaystyle{ K (N \geq K) }[/math] partitions of the data with each partition representing a cluster. That is, it classifies the data into [math]\displaystyle{ K }[/math] groups by satisfying the following requirements: (1) each group contains at least one point, and (2) each point belongs to exactly one group. For fuzzy partitioning, a point can belong to more than one group. The quality of the solution is measured by clustering criteria.
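As a concrete instance of the scheme Jin & Han describe (K disjoint groups, each point in exactly one group, no empty group), the sketch below runs a plain k-means-style relocation loop in pure Python. k-means is only one representative partitional method, and the function, data, and parameter names are illustrative assumptions rather than anything from the cited entry.

```python
import random
from math import dist

def kmeans(points, k, iters=20, seed=0):
    """A minimal k-means-style partitional clustering: decompose `points`
    into k disjoint clusters, with every point assigned to exactly one cluster."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)             # pick k points as initial seeds
    for _ in range(iters):
        # Assignment step: each point goes to its single nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Relocation step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        for i, cluster in enumerate(clusters):
            if cluster:
                centroids[i] = tuple(sum(c) / len(cluster) for c in zip(*cluster))
    return clusters, centroids

points = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.1), (5.2, 4.9), (9.0, 1.0), (8.8, 1.2)]
clusters, centroids = kmeans(points, k=3)
print(clusters)   # three disjoint groups; each input point appears exactly once
```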
2016
- (J & Pandya, 2016) ⇒ Swarndeep Saket J, and Sharnil Pandya. (2016). “An Overview of Partitioning Algorithms in Clustering Techniques.” In: International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), 5(6). ISSN:2278-1323
- QUOTE: Given a database of n objects, it constructs k partitions of the data. Each object must belong to exactly one group. Each group must contain at least one object. Partition techniques can improve on iterative relocation techniques by moving objects from one group to another. The main objective of a partition clustering algorithm is to divide the data points into K partitions. Each partition will reflect one cluster. The technique of partition depends upon certain objective functions, for example ‘minimizing the square error criterion’. The weakness of such an algorithm is that whenever points near the center of one cluster are also close to another cluster, the result becomes poor or misleading due to overlapping of the data points.
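The ‘square error criterion’ mentioned in the quote above is usually the within-cluster sum of squared distances from each point to its cluster centroid. A minimal sketch, assuming Euclidean distance and toy 2-D clusters chosen purely for illustration:

```python
from math import dist

def square_error(clusters):
    """Within-cluster sum of squared distances to each cluster's centroid,
    a common objective function for partitional clustering."""
    total = 0.0
    for cluster in clusters:
        # Centroid = coordinate-wise mean of the cluster's points.
        centroid = tuple(sum(c) / len(cluster) for c in zip(*cluster))
        total += sum(dist(p, centroid) ** 2 for p in cluster)
    return total

clusters = [[(1.0, 1.0), (1.2, 0.8)], [(5.0, 5.1), (5.2, 4.9)]]
print(square_error(clusters))  # lower values indicate tighter clusters
```

A partitional method such as k-means can be read as a search for the K-way partition that (locally) minimizes this value.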
2014
- (Celebi, 2014) ⇒ M. Emre Celebi. (2014). “Partitional Clustering Algorithms.” Springer Publishing Company, Incorporated. ISBN:3319092588, 9783319092584
- QUOTE: During the last four decades, many researchers have focused on designing clustering methods, resulting in a large number of methods proposed in the literature that are based on different approaches. Partitional clustering [29], also referred to as partitioning relocation clustering [7], constitutes an important approach on which several clustering methods are based. Partitional clustering attempts to directly decompose the data set into a set of disjoint clusters, leading to an integer number of clusters that optimizes a given criterion function. The criterion function may emphasize a local or a global structure of the data, and its optimization is an iterative relocation procedure. Such clustering methods assume that clusters are disjoint and do not support interactions between them. However, for many applications of clustering, it would be preferable to tolerate overlaps between clusters to better fit hidden structures in the observed data. This research issue is referred to as overlapping clustering.
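To illustrate the contrast Celebi draws between disjoint and overlapping clusterings, the sketch below assigns each point either to its single nearest centroid (a disjoint, partitional assignment) or to every centroid within a fixed radius (an overlapping assignment). The radius rule, names, and data are illustrative devices, not methods from the cited book.

```python
from math import dist

centroids = [(0.0, 0.0), (4.0, 0.0)]
points = [(0.5, 0.2), (2.0, 0.1), (3.8, -0.3)]

# Disjoint (partitional) assignment: exactly one cluster per point.
hard = {p: min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        for p in points}

# Overlapping assignment: every cluster whose centroid lies within a radius.
radius = 2.5
soft = {p: [i for i, c in enumerate(centroids) if dist(p, c) <= radius]
        for p in points}

print(hard)  # the middle point belongs to exactly one cluster
print(soft)  # the middle point belongs to both clusters
```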
2012
- (Han et al., 2012) ⇒ Jiawei Han, Micheline Kamber, and Jian Pei. (2012). “Data Mining: Concepts and Techniques (3rd Edition).” ISBN:978-0-12-381479-1 doi:10.1016/C2009-0-61819-5
2010
- (Xiao & Xiao, 2010) ⇒ Jing-zhong Xiao, and Li Xiao. (2010). “A Research of the Partition Clustering Algorithm.” In: Proceedings of the 2010 International Symposium on Intelligence Information Processing and Trusted Computing. ISBN:978-0-7695-4196-9 doi:10.1109/IPTC.2010.148
2006
- (Davidson et al., 2006) ⇒ Ian Davidson, Kiri L. Wagstaff, and Sugato Basu. (2006). “Measuring Constraint-Set Utility for Partitional Clustering Algorithms.” In: Proceedings of European Conference on Principles of Data Mining and Knowledge Discovery. Knowledge Discovery in Databases: PKDD 2006. ISBN:978-3-540-45374-1 doi:10.1007/11871637_15
2002
- (Zhao & Karypis, 2002) ⇒ Ying Zhao, and George Karypis. (2002). “Comparison of Agglomerative and Partitional Document Clustering Algorithms.”
- QUOTE: Partitional clustering algorithms compute a k-way clustering of a set of documents either directly or via a sequence of repeated bisections. A direct k-way clustering is commonly computed as follows. Initially, a set of [math]\displaystyle{ \kappa }[/math] documents is selected from the collection to act as the seeds of the [math]\displaystyle{ \kappa }[/math] clusters. Then, for each document, its similarity to these [math]\displaystyle{ \kappa }[/math] seeds is computed, and it is assigned to the cluster corresponding to its most similar seed. This forms the initial k-way clustering. This clustering is then repeatedly refined so that it optimizes the desired clustering criterion function. A k-way partitioning via repeated bisections is obtained by recursively applying the above algorithm to compute 2-way clusterings (i.e., bisections). Initially, the documents are partitioned into two clusters, then one of these clusters is selected and is further bisected, and so on. This process continues [math]\displaystyle{ \kappa -1 }[/math] times, leading to [math]\displaystyle{ \kappa }[/math] clusters. Each of these bisections is performed so that the resulting two-way clustering solution optimizes the particular criterion function. However, the overall k-way clustering solution will not necessarily be at a local minimum with respect to the criterion function. The key step in this algorithm is the method used to select which cluster to bisect next. In all of our experiments, we chose to select the largest cluster, as this approach leads to reasonably good and balanced clustering solutions [23]. Extensive experiments presented in [26] show that the clustering solutions obtained via repeated bisections are comparable or better than those produced via direct clustering. Furthermore, their computational requirements are much smaller, as they have to solve a simpler optimization problem at each step. For this reason, in all of our experiments, we use this approach to compute partitional clustering solutions.
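The repeated-bisection procedure described above can be sketched as follows: start with all objects in one cluster, then κ−1 times select the largest cluster and split it with a 2-way partitional step. In the sketch below, a tiny 2-means bisection on toy numeric vectors stands in for the document-similarity computation; all names, data, and parameters are illustrative.

```python
import random
from math import dist

def two_way_split(points, iters=10, seed=0):
    """A tiny 2-means bisection: split `points` into two disjoint halves."""
    rng = random.Random(seed)
    centroids = rng.sample(points, 2)
    for _ in range(iters):
        halves = [[], []]
        for p in points:
            halves[0 if dist(p, centroids[0]) <= dist(p, centroids[1]) else 1].append(p)
        for i, half in enumerate(halves):
            if half:
                centroids[i] = tuple(sum(c) / len(half) for c in zip(*half))
    return halves

def repeated_bisection(points, k):
    """k-way clustering via repeated bisections: k-1 times, select the
    largest cluster and bisect it with a 2-way partitional step."""
    clusters = [list(points)]
    while len(clusters) < k:
        largest = max(clusters, key=len)   # selection rule used in the quoted paper
        if len(largest) < 2:               # nothing left to bisect
            break
        clusters.remove(largest)
        clusters.extend(two_way_split(largest))
    return clusters

points = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.1), (5.2, 4.9), (9.0, 1.0), (8.8, 1.2)]
print(repeated_bisection(points, k=3))
```

As the quote notes, each bisection solves a simpler optimization problem than a direct k-way clustering, which is why the repeated-bisection route is computationally cheaper.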