- (Yan et al., 2005) ⇒ Xifeng Yan, Hong Cheng, Jiawei Han, Dong Xin. (2005). “Summarizing Itemset Patterns: a profile-based approach.” In: Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining. doi:10.1145/1081870.1081907
Subject Headings: Frequent Itemset Summarization Task, Frequent Itemset Mining Task, Pattern Summarization Task, Hierarchical Agglomerative Clustering Task, K-Means Clustering Task.
- Google Scholar: ~100 Citations.
- ACM DL: ~79 Citations.
- Frequent-pattern mining has been studied extensively on scalable methods for mining various kinds of patterns including itemsets, sequences, and graphs. However, the bottleneck of frequent-pattern mining lies not in efficiency but in interpretability, due to the huge number of patterns generated by the mining process. In this paper, we examine how to summarize a collection of itemset patterns using only K representatives, a small number of patterns that a user can handle easily. The K representatives should not only cover most of the frequent patterns but also approximate their supports. A generative model is built to extract and profile these representatives, under which the supports of the patterns can be easily recovered without consulting the original dataset. Based on the restoration error, we propose a quality measure function to determine the optimal value of parameter K. Polynomial time algorithms are developed together with several optimization heuristics for efficiency improvement. Empirical studies indicate that we can obtain compact summarization in real datasets.
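The abstract's core idea (group patterns, describe each group with a profile, then recover supports from the profile alone and measure the restoration error) can be illustrated with a minimal sketch. It assumes a simplified profile made of a master itemset (the union of the group's itemsets), per-item frequencies over the transactions covered by the group, and a coverage ratio, with supports re-estimated under an item-independence assumption. All function names are hypothetical, and the greedy merge criterion (lowest restoration error) is a stand-in chosen for brevity, not necessarily the distance measure used by the paper's hierarchical agglomerative or K-means procedures.

```python
from itertools import combinations

# Hypothetical helper names; a simplified sketch, not the authors' code.

def relative_support(itemset, transactions):
    """Fraction of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def build_profile(patterns, transactions):
    """Assumed simplified profile of a group of itemsets:
    master = union of the itemsets, p = per-item frequency over the
    transactions supporting at least one itemset, rho = covered share."""
    covered = [t for t in transactions if any(pat <= t for pat in patterns)]
    if not covered:
        raise ValueError("group has no supporting transactions")
    master = frozenset().union(*patterns)
    p = {item: sum(1 for t in covered if item in t) / len(covered) for item in master}
    rho = len(covered) / len(transactions)
    return master, p, rho

def estimate_support(itemset, profile):
    """Recover relative support from the profile alone, assuming items are
    independent: rho * product of p(item). Items outside master give 0."""
    master, p, rho = profile
    if not itemset <= master:
        return 0.0
    est = rho
    for item in itemset:
        est *= p[item]
    return est

def restoration_error(groups, transactions):
    """Mean relative error |s - s_hat| / s over every pattern in every group."""
    errors = []
    for group in groups:
        prof = build_profile(group, transactions)
        for pat in group:
            s = relative_support(pat, transactions)
            errors.append(abs(s - estimate_support(pat, prof)) / s)
    return sum(errors) / len(errors)

def summarize(patterns, transactions, k):
    """Greedy agglomerative merging down to k groups: start with one group
    per pattern and repeatedly apply the pairwise merge that yields the
    lowest overall restoration error (a stand-in criterion)."""
    groups = [[pat] for pat in patterns]
    while len(groups) > k:
        best = None
        for i, j in combinations(range(len(groups)), 2):
            candidate = [g for idx, g in enumerate(groups) if idx not in (i, j)]
            candidate.append(groups[i] + groups[j])
            err = restoration_error(candidate, transactions)
            if best is None or err < best[0]:
                best = (err, candidate)
        groups = best[1]
    return groups
```

For example, with transactions `abc, abc, abd, ab, cd` (each a `frozenset` of single-letter items) and patterns `ab, abc, c, d`, calling `summarize(patterns, transactions, 2)` groups `ab` with `abc` and `c` with `d`, because those merges reproduce the original supports (almost) exactly from the profiles. Recomputing every candidate merge from scratch is brute force and only suitable for toy inputs; the abstract indicates the paper relies on polynomial-time algorithms plus optimization heuristics for real datasets.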
1. Introduction
2. Pattern Profile
3. Pattern Summarization
3.1 Hierarchical Agglomerative Clustering
3.2 K-means Clustering
3.3 Optimization Heuristics
3.3.1 Closed Itemsets vs. Frequent Itemsets
3.3.2 Approximate Profiles
3.4 Quality Evaluation
4. Empirical Study
4.1 Real Datasets
4.2 Synthetic Datasets
5. Related Work
6. Conclusions
