Generative Statistical Model Family
A generative statistical model family is a statistical model family that can represent a joint distribution [math]\displaystyle{ p(\mathbf{x}_{indep}, \mathbf{y}_{dep}) }[/math] over the independent variables [math]\displaystyle{ \mathbf{x}_{indep} }[/math] and the dependent variables [math]\displaystyle{ \mathbf{y}_{dep} }[/math].
- AKA: Generative Metamodel.
- Context:
- It can (typically) be in a Generative-Discriminative Relation with a Discriminative Statistical Model Family.
- It can be instantiated as a Generative Model Instance (by a generative model training system that implements a generative learning algorithm).
- …
- Example(s):
- a Linear Generative Model.
- a Graphical Generative Model.
- a Naive-Bayes Model.
- a Hidden Markov Model.
- a Linear Discriminant Analysis.
- a Latent Dirichlet Allocation Model.
- a Stochastic Context-Free Grammar.
- an Averaged One-Dependence Estimator.
- a Restricted Boltzmann Machine.
- a Mixture Model, such as a Gaussian mixture model.
- a Text-to-Image Model.
- …
- Counter-Example(s):
- a Discriminative Statistical Model Family.
- See: Joint Probability, Inductive Logic System, Recognition Model, Parametric Model.
References
2014
- (Wikipedia, 2014) ⇒ http://en.wikipedia.org/wiki/Generative_model
- In probability and statistics, a generative model is a model for randomly generating observable data, typically given some hidden parameters. It specifies a joint probability distribution over observation and label sequences. Generative models are used in machine learning for either modeling data directly (i.e., modeling observations drawn from a probability density function), or as an intermediate step to forming a conditional probability density function. A conditional distribution can be formed from a generative model through Bayes' rule.
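Written out in the notation used above, this Bayes'-rule step is the standard identity (not specific to any particular model):

[math]\displaystyle{ p(y \mid x) = \frac{p(x, y)}{p(x)} = \frac{p(x \mid y)\,p(y)}{\sum_{y'} p(x \mid y')\,p(y')} }[/math]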
Shannon (1948) gives an example in which a table of frequencies of English word pairs is used to generate a sentence beginning with "representing and speedily is an good", which is not proper English but which will increasingly approximate it as the table is moved from word pairs to word triplets, etc.
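A minimal Python sketch of the word-pair scheme Shannon describes; the toy corpus, function names, and sentence length are illustrative only:

```python
from collections import defaultdict
import random

def build_bigram_table(words):
    """Count how often each word follows each other word."""
    table = defaultdict(lambda: defaultdict(int))
    for prev, nxt in zip(words, words[1:]):
        table[prev][nxt] += 1
    return table

def generate(table, start_word, length=8):
    """Grow a sentence, drawing each next word in proportion to its
    observed frequency after the current word."""
    out = [start_word]
    for _ in range(length - 1):
        followers = table[out[-1]]
        if not followers:
            break
        out.append(random.choices(list(followers),
                                  weights=list(followers.values()))[0])
    return " ".join(out)

corpus = "the cat sat on the mat and the cat ran to the mat".split()
print(generate(build_bigram_table(corpus), "the"))
```

Moving from word pairs to word triplets amounts to keying the table on the previous two words instead of one.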
Generative models contrast with discriminative models, in that a generative model is a full probabilistic model of all variables, whereas a discriminative model provides a model only for the target variable(s) conditional on the observed variables. Thus a generative model can be used, for example, to simulate (i.e. generate) values of any variable in the model, whereas a discriminative model allows only sampling of the target variables conditional on the observed quantities. Despite the fact that discriminative models do not need to model the distribution of the observed variables, they cannot generally express more complex relationships between the observed and target variables. They don't necessarily perform better than generative models at classification and regression tasks. In modern applications the two classes are seen as complementary or as different views of the same procedure.[1]
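As a concrete illustration of this contrast, the sketch below fits a tiny tabular joint model of [math]\displaystyle{ p(x, y) }[/math]: because the full joint is modelled, one can both simulate complete (x, y) pairs and recover the conditional [math]\displaystyle{ p(y \mid x) }[/math]; a purely discriminative model would supply only the latter. The data and names are illustrative, not drawn from any cited source:

```python
from collections import Counter
import random

# Toy labelled observations (x = colour, y = fruit); illustrative only.
data = [("red", "apple"), ("red", "apple"), ("green", "apple"),
        ("yellow", "banana"), ("green", "pear"), ("yellow", "banana")]

# A generative (joint) model: empirical estimate of p(x, y).
joint = Counter(data)

def sample_pair():
    """Simulate a full (x, y) observation from the joint model."""
    return random.choices(list(joint), weights=list(joint.values()))[0]

def p_y_given_x(x):
    """Conditional over labels, obtained from the joint via Bayes' rule."""
    p_x = sum(c for (xi, _), c in joint.items() if xi == x)
    return {y: c / p_x for (xi, y), c in joint.items() if xi == x}

print(sample_pair())         # e.g. ('yellow', 'banana')
print(p_y_given_x("green"))  # {'apple': 0.5, 'pear': 0.5}
```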
Examples of generative models include:
- Gaussian mixture model (and other types of mixture model)
- Hidden Markov model
- Probabilistic context-free grammar
- Naive Bayes
- Averaged one-dependence estimators
- Latent Dirichlet allocation
- Restricted Boltzmann machine
- If the observed data are truly sampled from the generative model, then fitting the parameters of the generative model to maximize the data likelihood is a common method. However, since most statistical models are only approximations to the true distribution, if the model's application is to infer about a subset of variables conditional on known values of others, then it can be argued that the approximation makes more assumptions than are necessary to solve the problem at hand. In such cases, it can be more accurate to model the conditional density functions directly using a discriminative model (see above), although application-specific details will ultimately dictate which approach is most suitable in any particular case.
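The likelihood-maximizing fit mentioned here has a closed form for simple generative models. A hedged sketch, assuming a 1-D Gaussian class-conditional model [math]\displaystyle{ p(x, y) = p(y)\,\mathcal{N}(x; \mu_y, \sigma_y^2) }[/math] and made-up data:

```python
import numpy as np

# Toy 1-D inputs with binary labels; values are illustrative.
x = np.array([1.0, 1.2, 0.8, 3.9, 4.1, 4.0])
y = np.array([0, 0, 0, 1, 1, 1])

# The data likelihood is maximized by the empirical class frequencies,
# per-class sample means, and per-class (biased) sample variances.
params = {}
for label in np.unique(y):
    xs = x[y == label]
    params[int(label)] = {
        "prior": len(xs) / len(x),
        "mu": xs.mean(),
        "var": xs.var(),  # MLE variance divides by n, not n - 1
    }
print(params)
```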
- ↑ C. M. Bishop, and J. Lasserre. (2007). “Generative or Discriminative? Getting the Best of Both Worlds.” In: Bayesian Statistics 8, Bernardo, J. M. et al. (Eds.), Oxford University Press, pp. 3–23.
2010
- (Rosen-Zvi et al., 2010) ⇒ Michal Rosen-Zvi, Chaitanya Chemudugunta, Thomas Griffiths, Padhraic Smyth, and Mark Steyvers. (2010). “Learning Author-topic Models from Text Corpora.” In: ACM Transactions on Information Systems (TOIS) Journal, 28(1). doi:10.1145/1658377.1658381
- QUOTE: Statistical approaches based upon generative models have proven effective in addressing these problems, providing efficient methods for extracting structured representations from large document collections. … In this article we describe a generative model for document collections, the author-topic (AT) model, which simultaneously models the content of documents and the interests of authors. … The generative model at the heart of our approach is based upon the idea that a document can be represented as a mixture of topics. … There are several generative models for document collections that model individual documents as mixtures of topics. … The author topic model belongs to a family of generative models for text where words are viewed as discrete random variables, a document contains a fixed number of words, and each word takes one value from a predefined vocabulary.
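A toy sketch of the document-as-mixture-of-topics generative process the quote describes (an LDA-style process rather than the full author-topic model); the vocabulary, topic matrix, and Dirichlet parameter are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

vocab = ["gene", "cell", "league", "goal", "market", "stock"]
# Each topic is a distribution over the vocabulary (rows sum to 1).
topics = np.array([
    [0.45, 0.45, 0.02, 0.02, 0.03, 0.03],  # "biology"
    [0.02, 0.02, 0.45, 0.45, 0.03, 0.03],  # "sports"
    [0.03, 0.03, 0.02, 0.02, 0.45, 0.45],  # "finance"
])

def generate_document(n_words=10, alpha=0.5):
    """Draw topic proportions from a Dirichlet, then for each word
    draw a topic assignment and a word from that topic."""
    theta = rng.dirichlet(alpha * np.ones(len(topics)))
    words = []
    for _ in range(n_words):
        z = rng.choice(len(topics), p=theta)     # topic for this word
        w = rng.choice(len(vocab), p=topics[z])  # word from topic z
        words.append(vocab[w])
    return words

print(generate_document())
```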
2007
- (Bishop & Lasserre, 2007) ⇒ Christopher M. Bishop, and Julia Lasserre. (2007). “Generative Or Discriminative? Getting the Best of Both Worlds.” In: Bayesian Statistics, 8.
- QUOTE: An alternative approach is to find the joint distribution [math]\displaystyle{ p(\mathbf{x}, \mathbf{c}) }[/math], expressed for instance as a parametric model, and then subsequently to use this joint distribution to evaluate the conditional [math]\displaystyle{ p(\mathbf{c} \mid \mathbf{x}) }[/math] in order to make predictions of [math]\displaystyle{ \mathbf{c} }[/math] for new values of [math]\displaystyle{ \mathbf{x} }[/math]. This is known as a generative approach since by sampling from the joint distribution it is possible to generate synthetic examples of the feature vector [math]\displaystyle{ \mathbf{x} }[/math]. In practice, the generalization performance of generative models is often found to be poorer than that of discriminative models due to differences between the model and the true distribution of the data.
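The prediction step the quote refers to can be written out explicitly; since [math]\displaystyle{ p(\mathbf{x}) }[/math] does not depend on the class, maximizing the conditional is equivalent to maximizing the joint:

[math]\displaystyle{ \hat{c}(\mathbf{x}) = \arg\max_{c} p(c \mid \mathbf{x}) = \arg\max_{c} p(\mathbf{x}, c) }[/math]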
2004
- (Bouchard & Triggs, 2004) ⇒ Guillaume Bouchard, and Bill Triggs. (2004). “The Trade-off Between Generative and Discriminative Classifiers.” In: Proceedings of COMPSTAT 2004.
- QUOTE: In supervised classification, inputs [math]\displaystyle{ x }[/math] and their labels [math]\displaystyle{ y }[/math] arise from an unknown joint probability [math]\displaystyle{ p(x, y) }[/math]. If we can approximate [math]\displaystyle{ p(x, y) }[/math] using a parametric family of models [math]\displaystyle{ G = \{p_\theta(x, y), \theta \in \Theta\} }[/math], then a natural classifier is obtained by first estimating the class-conditional densities, then classifying each new data point to the class with highest posterior probability. This approach is called generative classification. … Conversely, the generative approach converges to the best model for the joint distribution [math]\displaystyle{ p(x, y) }[/math], but the resulting conditional density is usually a biased classifier unless its [math]\displaystyle{ p_\theta(x) }[/math] part is an accurate model for [math]\displaystyle{ p(x) }[/math]. In real world problems the assumed generative model is rarely exact, and asymptotically, a discriminative classifier should typically be preferred [9, 5]. The key argument is that the discriminative estimator converges to the conditional density that minimizes the negative log-likelihood classification loss against the true density [math]\displaystyle{ p(x, y) }[/math] [2]. For finite sample sizes, there is a bias-variance tradeoff and it is less obvious how to choose between generative and discriminative classifiers.
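The trade-off described here can be probed empirically. The sketch below compares a generative classifier (Gaussian Naive Bayes) against a discriminative one (logistic regression) on synthetic data using scikit-learn; it is a rough illustration under arbitrary settings, not a reproduction of the paper's experiments:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic data; the generative model's assumptions need not hold here.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for model in (GaussianNB(), LogisticRegression(max_iter=1000)):
    model.fit(X_tr, y_tr)
    print(type(model).__name__, model.score(X_te, y_te))
```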
2000
- (Valpola, 2000) ⇒ Harri Valpola. (2000). “Bayesian Ensemble Learning for Nonlinear Factor Analysis.” PhD Dissertation, Helsinki University of Technology.
- QUOTE: generative model: A model which explicitly states how the observations are assumed to have been generated. See recognition model.