Probabilistic Topic Model
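A Probabilistic Topic Model is a topic model that is a statistical model, in which documents are treated as mixtures of topics and each topic is a probability distribution over words.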
- AKA: Statistical Topic Model.
- Context:
- It can be produced by a Topic Model Learning Task.
- Example(s):
- Latent Dirichlet Allocation Model (defines a generative probabilistic model at the document level).
- Probabilistic Latent Semantic Indexing Model (does not define a generative probabilistic model at the document level).
- …
- Counter-Example(s):
- See: Dimensionality Reduction, Semantic Latent Space, Singular Value Decomposition.
References
2010
- (Wei, Barnaghi & Bargiela, 2010) ⇒ Wang Wei, Payam Barnaghi, and Andrzej Bargiela. (2010). “Probabilistic Topic Models for Learning Terminological Ontologies.” In: IEEE Transactions on Knowledge and Data Engineering (TKDE), 22(7). doi:10.1109/TKDE.2009.122
- QUOTE: Probabilistic topic models were originally developed and utilized for document modeling and topic extraction in Information Retrieval.
- …
- The probabilistic topic models have been primarily used in document modeling, topic extraction, and classification purposes in Information Retrieval [11], [12], [13], [14].
2009
- (Blei & Lafferty, 2009) ⇒ David M. Blei, and John D. Lafferty. (2009). “Topic Models.” In: A. Srivastava and M. Sahami, editors, Text Mining: Classification, Clustering, and Applications. Chapman & Hall/CRC Data Mining and Knowledge Discovery Series.
- QUOTE: In this chapter, we describe topic models, probabilistic models for uncovering the underlying semantic structure of a document collection based on a hierarchical Bayesian analysis of the original texts Blei et al. (2003); Griffiths and Steyvers (2004); Buntine and Jakulin (2004); Hofmann (1999); Deerwester et al. (1990).
2007
- (Steyvers & Griffiths, 2007) ⇒ Mark Steyvers, and Thomas L. Griffiths. (2007). “Probabilistic Topic Models.” In: (Landauer et al., 2007).
- QUOTE: Topic models (e.g., Blei, Ng, & Jordan, 2003; Griffiths & Steyvers, 2002; 2003; 2004; Hofmann, 1999; 2001) are based upon the idea that documents are mixtures of topics, where a topic is a probability distribution over words. A topic model is a generative model for documents: it specifies a simple probabilistic procedure by which documents can be generated. To make a new document, one chooses a distribution over topics. Then, for each word in that document, one chooses a topic at random according to this distribution, and draws a word from that topic. Standard statistical techniques can be used to invert this process, inferring the set of topics that were responsible for generating a collection of documents.
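- NOTE: The generative procedure described in the quote above can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the cited chapter. The toy topics in `TOPICS`, the function names, and the use of a Dirichlet prior with parameters `alpha` for the document's topic distribution (an LDA-style choice) are all assumptions made for the example.

```python
import random


def sample_dirichlet(rng, alpha):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(rng, topics, alpha, n_words):
    """Generate one document as a mixture of topics.

    topics: list of dicts mapping word -> probability, one dict per topic.
    alpha:  Dirichlet parameters for the document's topic distribution
            (assumed here; the quoted text only says "one chooses a
            distribution over topics").
    Returns (theta, topic_assignments, words).
    """
    # Step 1: choose a distribution over topics for this document.
    theta = sample_dirichlet(rng, alpha)
    assignments, words = [], []
    for _ in range(n_words):
        # Step 2: choose a topic at random according to that distribution.
        z = rng.choices(range(len(topics)), weights=theta, k=1)[0]
        # Step 3: draw a word from the chosen topic's word distribution.
        vocab = list(topics[z].keys())
        probs = list(topics[z].values())
        w = rng.choices(vocab, weights=probs, k=1)[0]
        assignments.append(z)
        words.append(w)
    return theta, assignments, words


# Hypothetical toy topics, invented purely for illustration.
TOPICS = [
    {"gene": 0.5, "dna": 0.3, "cell": 0.2},
    {"bank": 0.4, "money": 0.4, "loan": 0.2},
]

if __name__ == "__main__":
    rng = random.Random(0)
    theta, z, doc = generate_document(rng, TOPICS, alpha=[0.5, 0.5], n_words=10)
    print("topic mixture:", [round(t, 3) for t in theta])
    print("document:", " ".join(doc))
```

Inference runs this process in reverse: given only the words of many documents, it estimates the topics and each document's topic mixture.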
2005
- (Steyvers & Griffiths, 2005) ⇒ Mark Steyvers, and Thomas L. Griffiths. (2005). “Probabilistic Topic Models.” In: “Latent Semantic Analysis: A Road to Meaning.” Thomas K. Landauer (editor), D. McNamara (editor), S. Dennis (editor), and W. Kintsch (editor). Lawrence Erlbaum.
2004
- (Griffiths & Steyvers, 2004) ⇒ Thomas L. Griffiths, and Mark Steyvers. (2004). “Finding Scientific Topics.” In: Proceedings of the National Academy of Sciences (PNAS), 101(Suppl. 1). doi:10.1073/pnas.0307752101
- QUOTE: A first step in identifying the content of a document is determining which topics that document addresses.
2003
- (Blei, Ng & Jordan, 2003) ⇒ David M. Blei, Andrew Y. Ng, and Michael I. Jordan. (2003). “Latent Dirichlet Allocation.” In: The Journal of Machine Learning Research, 3. doi:10.1162/jmlr.2003.3.4-5.993
1999
- (Hofmann, 1999b) ⇒ Thomas Hofmann. (1999). “Probabilistic Latent Semantic Analysis.” In: Proceedings of UAI Conference (UAI 1999).