2010 DocumentClusteringviaDirichletP


Subject Headings:

Notes

Cited By

Quotes

Author Keywords

Abstract

One essential issue in document clustering is estimating the appropriate number of clusters into which a document collection should be partitioned. In this paper, we propose a novel approach, namely DPMFS, to address this issue. The proposed approach is designed 1) to group documents into a set of clusters, with the number of clusters determined automatically by the Dirichlet process mixture model; and 2) to identify the discriminative words and separate them from irrelevant noise words via a stochastic search variable selection technique. We explore the performance of our proposed approach on both a synthetic dataset and several realistic document datasets. The comparison between our proposed approach and state-of-the-art document clustering approaches indicates that our approach is robust and effective for document clustering.
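To illustrate how a Dirichlet process prior lets the number of clusters emerge from the data rather than being fixed in advance, the following is a minimal sketch of the Chinese restaurant process, a standard way of sampling partitions under that prior. It is not the paper's DPMFS algorithm: it omits the document likelihood and the feature selection step, and the function name `crp_assignments` and the concentration parameter `alpha` are illustrative assumptions.

```python
import random


def crp_assignments(n_docs, alpha, seed=0):
    """Sample cluster labels for n_docs items from a Chinese restaurant process.

    Illustrative only: the prior over partitions, with no document likelihood.
    `alpha` (assumed > 0) is the concentration parameter; larger values tend
    to produce more clusters.
    """
    rng = random.Random(seed)
    labels = []  # labels[i] = cluster index assigned to document i
    sizes = []   # sizes[k] = number of documents currently in cluster k
    for _ in range(n_docs):
        # An existing cluster k is chosen with weight sizes[k];
        # a brand-new cluster is opened with weight alpha.
        weights = sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)   # open a new cluster
        else:
            sizes[k] += 1     # join an existing cluster
        labels.append(k)
    return labels, sizes


labels, sizes = crp_assignments(n_docs=200, alpha=1.0)
print(len(sizes), "clusters; sizes:", sizes)
```

In a full Dirichlet process mixture model, these prior weights would be multiplied by the likelihood of each document under each cluster before sampling, so the data as well as `alpha` would shape the final number of clusters.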

References


Key: 2010 DocumentClusteringviaDirichletP
Author: Guan Yu, Ruizhang Huang, Zhaojun Wang
Title: Document Clustering via Dirichlet Process Mixture Model with Feature Selection
Journal: KDD-2010 Proceedings
DOI: 10.1145/1835804.1835901
Year: 2010