2008 TopicModelsConditionedOnArbFeat
- (Mimno & McCallum, 2008) ⇒ David Mimno and Andrew McCallum. (2008). “Topic Models Conditioned on Arbitrary Features with Dirichlet-multinomial Regression.” In: Proceedings of UAI.
Subject Headings: Topic Modeling Algorithm, Document Metadata.
Notes
- http://www.cs.umass.edu/~mimno/publications.html
- Text documents are usually accompanied by metadata, such as the authors, the publication venue, the date, and any references. Work in topic modeling that has taken such information into account, such as Author-Topic, Citation-Topic, and Topic-over-Time models, has generally focused on constructing specific models that are suited only for one particular type of metadata. This paper presents a simple, unified model for learning topics from documents given arbitrary non-textual features, which can be discrete, categorical, or continuous.
Cited By
Quotes
Abstract
Although fully generative models have been successfully used to model the contents of text documents, they are often awkward to apply to combinations of text data and document metadata. In this paper we propose a Dirichlet-multinomial regression (DMR) topic model that includes a log-linear prior on document-topic distributions that is a function of observed features of the document, such as author, publication venue, references, and dates. We show that by selecting appropriate features, DMR topic models can meet or exceed the performance of several previously published topic models designed for specific data.
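The log-linear prior described in the abstract can be illustrated with a small sketch. It assumes the usual DMR formulation, in which each document's Dirichlet parameter for topic t is the exponential of the dot product between the document's feature vector and a per-topic weight vector. The function names, the toy feature matrix, and the Gaussian draw for the weights are illustrative assumptions, not code from the paper, and the sketch covers only the generative side, not the paper's training procedure.

```python
import numpy as np


def dmr_document_prior(features, lambdas):
    """Log-linear prior: alpha[d, t] = exp(x_d . lambda_t).

    features: (n_docs, n_features) observed document metadata.
    lambdas:  (n_topics, n_features) per-topic feature weights.
    """
    return np.exp(features @ lambdas.T)


def sample_document(alpha_d, topic_word, n_tokens, rng):
    """Draw one document given its feature-dependent Dirichlet prior."""
    theta = rng.dirichlet(alpha_d)  # document-topic distribution
    topics = rng.choice(len(theta), size=n_tokens, p=theta)
    vocab_size = topic_word.shape[1]
    words = np.array([rng.choice(vocab_size, p=topic_word[z]) for z in topics])
    return theta, topics, words


rng = np.random.default_rng(0)
n_topics, n_features, vocab_size = 3, 4, 10

# Hypothetical metadata: two indicator features, one continuous feature,
# and a constant 1.0 column acting as an intercept (an assumption here).
features = np.array([
    [1.0, 0.0, 0.5, 1.0],
    [0.0, 1.0, -0.5, 1.0],
])

# Toy weights and topic-word distributions; in practice these are learned.
lambdas = rng.normal(0.0, 1.0, size=(n_topics, n_features))
topic_word = rng.dirichlet(np.full(vocab_size, 0.5), size=n_topics)

alpha = dmr_document_prior(features, lambdas)
theta, topics, words = sample_document(alpha[0], topic_word, n_tokens=20, rng=rng)
```

Because the metadata enters only through the feature vector, switching from author indicators to dates or citation indicators changes the columns of `features` while leaving the rest of the sketch untouched.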
| | Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year |
|---|---|---|---|---|---|---|---|---|---|---|
| 2008 TopicModelsConditionedOnArbFeat | David Mimno, Andrew McCallum | | | Topic Models Conditioned on Arbitrary Features with Dirichlet-multinomial Regression | | Proceedings of UAI | http://www.cs.umass.edu/~mimno/papers/dmr-uai.pdf | | | 2008 |