2015 AnEfficientSemiSupervisedCluste
- (Yi et al., 2015) ⇒ Jinfeng Yi, Lijun Zhang, Tianbao Yang, Wei Liu, and Jun Wang. (2015). “An Efficient Semi-Supervised Clustering Algorithm with Sequential Constraints.” In: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2015). ISBN:978-1-4503-3664-2 doi:10.1145/2783258.2783389
Subject Headings:
Notes
Cited By
- http://scholar.google.com/scholar?q=%222015%22+An+Efficient+Semi-Supervised+Clustering+Algorithm+with+Sequential+Constraints
- http://dl.acm.org/citation.cfm?id=2783258.2783389&preflayout=flat#citedby
Quotes
Author Keywords
- Algorithms; convex hull; data mining; probability simplex; semi-supervised clustering; sequential constraints
Abstract
Semi-supervised clustering leverages side information such as pairwise constraints to guide clustering procedures. Despite promising progress, existing semi-supervised clustering approaches overlook the condition of side information being generated sequentially, which is a natural setting arising in numerous real-world applications such as social network and e-commerce system analysis. Given emerged new constraints, classical semi-supervised clustering algorithms need to re-optimize their objectives over all data samples and constraints in availability, which prevents them from efficiently updating the obtained data partitions. To address this challenge, we propose an efficient dynamic semi-supervised clustering framework that casts the clustering problem into a search problem over a feasible convex set, i.e., a convex hull with its extreme points being an ensemble of m data partitions. According to the principle of ensemble clustering, the optimal partition lies in the convex hull, and can thus be uniquely represented by an m-dimensional probability simplex vector. As such, the dynamic semi-supervised clustering problem is simplified to the problem of updating a probability simplex vector subject to the newly received pairwise constraints. We then develop a computationally efficient updating procedure to update the probability simplex vector in O (m 2) time, irrespective of the data size n. Our empirical studies on several real-world benchmark datasets show that the proposed algorithm outperforms the state-of-the-art semi-supervised clustering algorithms with visible performance gain and significantly reduced running time.
References
;
Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year | |
---|---|---|---|---|---|---|---|---|---|---|
2015 AnEfficientSemiSupervisedCluste | Wei Liu Tianbao Yang Jun Wang Jinfeng Yi Lijun Zhang | An Efficient Semi-Supervised Clustering Algorithm with Sequential Constraints | 10.1145/2783258.2783389 | 2015 |