Subspace Clustering Algorithm
A Subspace Clustering Algorithm is a Clustering Algorithm that detects all clusters in all subspaces
- Counter-Example(s):
- See: High-Dimensional Data Clustering, Association Rule Mining, Subspace, Soft Clustering.
References
2014
- (Wikipedia, 2014) ⇒ http://en.wikipedia.org/wiki/Clustering_high-dimensional_data#Subspace_clustering Retrieved:2014-7-27.
- Subspace clustering is the task of detecting all clusters in all subspaces. This means that a point might be a member of multiple clusters, each existing in a different subspace. Subspaces can either be axis-parallel or affine. The term is often used synonymous with general clustering in high-dimensional data.
The image on the right shows a mere two-dimensional space where a number of clusters can be identified. In the one-dimensional subspaces, the clusters [math]\displaystyle{ c_a }[/math] (in subspace [math]\displaystyle{ \{x\} }[/math]) and [math]\displaystyle{ c_b }[/math], [math]\displaystyle{ c_c }[/math], [math]\displaystyle{ c_d }[/math] (in subspace [math]\displaystyle{ \{y\} }[/math]) can be found. [math]\displaystyle{ c_c }[/math] cannot be considered a cluster in a two-dimensional (sub-)space, since it is too sparsely distributed in the [math]\displaystyle{ x }[/math] axis. In two dimensions, the two clusters [math]\displaystyle{ c_{ab} }[/math] and [math]\displaystyle{ c_{ad} }[/math] can be identified.
The problem of subspace clustering is given by the fact that there are [math]\displaystyle{ 2^d }[/math] different subspaces of a space with [math]\displaystyle{ d }[/math] dimensions. If the subspaces are not axis-parallel, an infinite number of subspaces is possible. Hence, subspace clustering algorithm utilize some kind of heuristic to remain computationally feasible, at the risk of producing inferior results. For example, the downward-closure property (cf. association rules) can be used to build higher-dimensional subspaces only by combining lower-dimensional ones, as any subspace T containing a cluster, will result in a full space S also to contain that cluster (i.e. S ⊆ T), an approach taken by most of the traditional algorithms such as CLIQUE , SUBCLU . It is also possible to define a subspace using different degrees of relevance for each dimension, an approach taken by iMWK-Means .
- Subspace clustering is the task of detecting all clusters in all subspaces. This means that a point might be a member of multiple clusters, each existing in a different subspace. Subspaces can either be axis-parallel or affine. The term is often used synonymous with general clustering in high-dimensional data.
- (Kriegel & Ntoutsi, 2014) ⇒ Hans-Peter Kriegel, and Eirini Ntoutsi. (2014). “Clustering High Dimensional Data: Examining Differences and Commonalities Between Subspace Clustering and Text Clustering - a Position Paper.” In: ACM SIGKDD Explorations Newsletter Journal, 15(2). doi:10.1145/2641190.2641192
2002
- (Beil et al., 2002) ⇒ Florian Beil, Martin Ester, and Xiaowei Xu. (2002). “Frequent Term-based Text Clustering.” In: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2002). doi:10.1145/775047.775110
1999
- (Agrawal et al., 1999) ⇒ Rakesh Agrawal, Johannes Ernst Gehrke, Dimitrios Gunopulos, Prabhakar Raghavan. (1999). “Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications." US Patent 6,003,029,
- … The requirement for high dimensionality in a data mining application is conventionally addressed by requiring a user to specify the subspace for cluster analysis.