1999 ProbabilisticLatentSemanticIndexing
- (Hofmann, 1999a) ⇒ Thomas Hofmann. (1999). “Probabilistic Latent Semantic Indexing.” In: Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 1999). doi:10.1145/312624.312649
Subject Headings: Probabilistic LSI Model, Probabilistic Generative Model, Latent Semantic Analysis, Topic Modeling.
Notes
Cited By
2003
- (Blei, Ng & Jordan, 2003) ⇒ David M. Blei, Andrew Y. Ng, and Michael I. Jordan. (2003). “Latent Dirichlet Allocation.” In: The Journal of Machine Learning Research, 3.
- (Hotho et al., 2003) ⇒ Andreas Hotho, Steffen Staab, and Gerd Stumme. (2003). “Wordnet Improves Text Document Clustering.” In: Proceedings of the SIGIR Workshop on Semantic Web Workshop.
Quotes
Abstract
Probabilistic Latent Semantic Indexing is a novel approach to automated document indexing which is based on a statistical latent class model for factor analysis of count data. Fitted from a training corpus of text documents by a generalization of the Expectation Maximization algorithm, the utilized model is able to deal with domain-specific synonymy as well as with polysemous words. In contrast to standard Latent Semantic Indexing (LSI) by Singular Value Decomposition, the probabilistic variant has a solid statistical foundation and defines a proper generative data model. Retrieval experiments on a number of test collections indicate substantial performance gains over direct term matching methods as well as over LSI. In particular, the combination of models with different dimensionalities has proven to be advantageous.
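The latent class model and EM fitting described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `plsa`, its parameters, and the toy count matrix are hypothetical, and the exponent `beta` reflects an assumption about how the "generalization of the Expectation Maximization algorithm" tempers the posterior (with `beta=1.0` the loop reduces to standard EM). The sketch assumes a symmetric parameterization with P(z), P(d|z), and P(w|z), and a non-empty count matrix.

```python
import random

def plsa(counts, n_topics, n_iter=50, beta=1.0, seed=0):
    """Fit P(z), P(d|z), P(w|z) to a document-term count matrix by EM.

    counts[d][w] is the number of times word w occurs in document d.
    beta < 1 tempers the posterior (an assumed form); beta = 1 is plain EM.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def random_distribution(n):
        # Strictly positive random values, normalized to sum to 1.
        values = [rng.random() + 1e-3 for _ in range(n)]
        total = sum(values)
        return [v / total for v in values]

    p_z = random_distribution(n_topics)
    p_d_z = [random_distribution(n_docs) for _ in range(n_topics)]   # p_d_z[z][d]
    p_w_z = [random_distribution(n_words) for _ in range(n_topics)]  # p_w_z[z][w]

    for _ in range(n_iter):
        acc_z = [0.0] * n_topics
        acc_d = [[0.0] * n_docs for _ in range(n_topics)]
        acc_w = [[0.0] * n_words for _ in range(n_topics)]

        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z|d,w) proportional to P(z) * [P(d|z) P(w|z)]^beta
                post = [p_z[z] * (p_d_z[z][d] * p_w_z[z][w]) ** beta
                        for z in range(n_topics)]
                norm = sum(post)
                if norm == 0.0:
                    continue
                for z in range(n_topics):
                    weight = n * post[z] / norm  # expected count for topic z
                    acc_z[z] += weight
                    acc_d[z][d] += weight
                    acc_w[z][w] += weight

        # M-step: renormalize the expected counts into probabilities.
        total = sum(acc_z)
        for z in range(n_topics):
            if acc_z[z] > 0.0:
                p_d_z[z] = [x / acc_z[z] for x in acc_d[z]]
                p_w_z[z] = [x / acc_z[z] for x in acc_w[z]]
            p_z[z] = acc_z[z] / total

    return p_z, p_d_z, p_w_z

# Toy corpus: documents 0-1 use words 0-1, documents 2-3 use words 2-3.
toy_counts = [[4, 3, 0, 0], [5, 2, 0, 0], [0, 0, 3, 4], [0, 0, 2, 5]]
p_z, p_d_z, p_w_z = plsa(toy_counts, n_topics=2)
print(p_z)
```

Because EM only finds a local optimum, the fitted topics depend on the random initialization; running with several seeds and comparing likelihoods is a common precaution.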