Generative and Discriminative Learning
See: Generative Learning, Discriminative Learning, Evolutionary Feature Selection and Construction.
References
2011
- (Liu & Webb, 2011) ⇒ Bin Liu, and Geoffrey I. Webb. (2011). “Generative and Discriminative Learning.” In: (Sammut & Webb, 2011) p.454
- Generative learning refers alternatively to any classification learning process that classifies by using an estimate of the joint probability [math]\displaystyle{ P(y,\bf{x}) }[/math] or to any classification learning process that classifies by using estimates of the prior probability [math]\displaystyle{ P(y) }[/math] and the conditional probability [math]\displaystyle{ P(\bf{x}|y) }[/math] (Bishop, 2007; Jaakkola & Haussler, 1999; Jaakkola, Meila & Jebara, 1999; Lasserre, Bishop & Minka, 2006; Ng & Jordan, 2002), where [math]\displaystyle{ y }[/math] is a class and [math]\displaystyle{ \bf{x} }[/math] is a description of an object to be classified. Generative learning contrasts with ''discriminative learning'', in which a model or estimate of [math]\displaystyle{ P(y|\bf{x}) }[/math] is formed without reference to an explicit estimate of any of [math]\displaystyle{ P(y,\bf{x}) }[/math], [math]\displaystyle{ P(\bf{x}) }[/math], or [math]\displaystyle{ P(\bf{x}|y) }[/math].
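The generative decision rule described above can be sketched in a few lines: classify [math]\displaystyle{ \bf{x} }[/math] by combining the class prior [math]\displaystyle{ P(y) }[/math] with the class-conditional [math]\displaystyle{ P(\bf{x}|y) }[/math] via Bayes' rule. The class names and probability values below are invented for illustration, not taken from the text.

```python
# A minimal sketch of the generative decision rule: classify x by
# combining the class prior P(y) with the class-conditional P(x | y)
# via Bayes' rule, P(y | x) = P(y) P(x | y) / P(x).
# All numbers here are illustrative assumptions.

priors = {"spam": 0.4, "ham": 0.6}                  # P(y)
conditionals = {                                    # P(x | y) for one word-feature x
    "spam": {"offer": 0.30, "meeting": 0.05},
    "ham":  {"offer": 0.04, "meeting": 0.25},
}

def posterior(x):
    """Return P(y | x) for each class, normalised over classes."""
    joint = {y: priors[y] * conditionals[y][x] for y in priors}  # P(y, x)
    z = sum(joint.values())                                      # P(x)
    return {y: joint[y] / z for y in joint}

print(posterior("offer"))
```

A discriminative learner would instead fit [math]\displaystyle{ P(y|\bf{x}) }[/math] directly and never represent the prior or the class-conditional at all.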
It is also common to categorize as discriminative those approaches based on a decision function that directly maps from the input [math]\displaystyle{ \bf{x} }[/math] onto the output [math]\displaystyle{ y }[/math] (such as support vector machines, neural networks, and decision trees), where the decision risk is minimized without estimation of [math]\displaystyle{ P(y,\bf{x}) }[/math], [math]\displaystyle{ P(\bf{x}|y) }[/math], or [math]\displaystyle{ P(y|\bf{x}) }[/math] (Jaakkola & Haussler, 1999).
The standard exemplar of generative learning is naïve Bayes and of discriminative learning, logistic regression. Another important contrasting pair is the generative hidden Markov model and the discriminative conditional random field. It is widely accepted that generative learning works well when training samples are scarce, while discriminative learning has better asymptotic error performance (Ng & Jordan, 2002).
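This standard contrast can be made concrete with a toy sketch, assuming binary features: a Bernoulli naïve Bayes (generative: estimates [math]\displaystyle{ P(y) }[/math] and [math]\displaystyle{ P(x_j|y) }[/math]) versus a logistic regression fit by batch gradient descent (discriminative: models [math]\displaystyle{ P(y|\bf{x}) }[/math] directly). The dataset and hyperparameters are invented for illustration.

```python
import math

# Toy data: four binary features per example; the label equals feature 0,
# so both learners can fit the training set.  All values are illustrative.
X = [[1, 0, 1, 0], [1, 1, 1, 0], [1, 0, 0, 1], [0, 1, 0, 1],
     [0, 0, 0, 1], [0, 1, 1, 1], [1, 1, 0, 0], [0, 0, 1, 0]]
y = [1, 1, 1, 0, 0, 0, 1, 0]

# --- Generative exemplar: Bernoulli naive Bayes with Laplace smoothing ---
def nb_train(X, y):
    n, d = len(X), len(X[0])
    prior = {c: sum(1 for t in y if t == c) / n for c in (0, 1)}   # P(y)
    cond = {c: [(sum(x[j] for x, t in zip(X, y) if t == c) + 1)
                / (sum(1 for t in y if t == c) + 2)                # P(x_j=1 | y)
                for j in range(d)] for c in (0, 1)}
    return prior, cond

def nb_predict(model, x):
    prior, cond = model
    def log_joint(c):                                              # log P(y=c, x)
        return math.log(prior[c]) + sum(
            math.log(p if xi else 1 - p) for xi, p in zip(x, cond[c]))
    return 1 if log_joint(1) > log_joint(0) else 0

# --- Discriminative exemplar: logistic regression via gradient descent ---
def lr_train(X, y, lr=0.5, steps=2000):
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        gw, gb = [0.0] * d, 0.0
        for x, t in zip(X, y):
            p = 1 / (1 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
            for j in range(d):
                gw[j] += (p - t) * x[j]                            # dLoss/dw_j
            gb += p - t
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def lr_predict(model, x):
    w, b = model
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

nb, lr_m = nb_train(X, y), lr_train(X, y)
print([nb_predict(nb, x) for x in X])   # training-set predictions
print([lr_predict(lr_m, x) for x in X])
```

Note how the naïve Bayes code never touches [math]\displaystyle{ P(y|\bf{x}) }[/math] explicitly, while the logistic regression code never estimates a prior or class-conditional.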
Efron (1975) provides an early examination of the generative/discriminative distinction. Efron performs an empirical comparison of the efficiency of generative linear discriminant analysis (LDA) and discriminative logistic regression. His results show that logistic regression is about 30% less efficient than LDA, which means that the discriminative approach is about 30% slower to reach the asymptotic error than the generative approach.
Ng and Jordan (2002) give a theoretical discussion of the efficiency of generative naïve Bayes and discriminative logistic regression. Their result shows that logistic regression converges towards its asymptotic error in order [math]\displaystyle{ n }[/math] samples while naïve Bayes converges in order [math]\displaystyle{ \log n }[/math] samples. While logistic regression converges much more slowly than naïve Bayes, it has lower asymptotic error than naïve Bayes. These results suggest that it is desirable to use a generative approach when training data is scarce and to use a discriminative approach when there is [[large training dataset|enough training data]].
Recent research into the generative/discriminative learning distinction has concentrated on hybrids of generative and discriminative learning, as well as on generative and discriminative learning in structured data learning and semi-supervised learning contexts.
In hybrid approaches, researchers seek to obtain the merits of both generative learning and discriminative learning. Some examples include the Fisher kernel for discriminative learning (Jaakkola & Haussler, 1999), max-ent discriminative learning (Jaakkola, Meila & Jebara, 1999), and principled hybrids of generative and discriminative models (Lasserre, Bishop & Minka, 2006). In structured data learning, the output data have dependent relationships. As an example of generative learning, hidden Markov models are used in structured data problems which need sequential decisions. The discriminative analog is the conditional random field model. Another example of discriminatively structured learning is the max-margin Markov network (Taskar, Guestrin & Koller, 2004).
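The Fisher kernel hybrid mentioned above can be sketched roughly as follows: map each example to the gradient (Fisher score) of a generative model's log-likelihood at its fitted parameters, then use an inner product of those scores as a kernel inside a discriminative classifier. The generative model here is a product of Bernoullis chosen for simplicity, and the identity matrix stands in for the inverse Fisher information, a common practical simplification; data and model are assumptions for illustration, not from Jaakkola and Haussler's paper.

```python
# Sketch of the Fisher kernel idea: score vectors U_x = d/dtheta log P(x|theta)
# from a fitted generative model, compared with an inner product.
# Generative model: independent Bernoulli features (an illustrative choice).

def fit_bernoulli(X):
    """Smoothed MLE of theta_j = P(x_j = 1) for each feature."""
    n, d = len(X), len(X[0])
    return [(sum(x[j] for x in X) + 1) / (n + 2) for j in range(d)]

def fisher_score(x, theta):
    """Gradient of log P(x | theta) w.r.t. theta for a Bernoulli product."""
    return [xi / t - (1 - xi) / (1 - t) for xi, t in zip(x, theta)]

def fisher_kernel(x1, x2, theta):
    """Inner product of Fisher scores (identity in place of the
    inverse Fisher information matrix)."""
    u1, u2 = fisher_score(x1, theta), fisher_score(x2, theta)
    return sum(a * b for a, b in zip(u1, u2))

X = [[1, 0, 1], [1, 1, 1], [0, 0, 1], [0, 1, 0]]   # toy binary data
theta = fit_bernoulli(X)
print(fisher_kernel(X[0], X[1], theta))
```

The resulting kernel can then be plugged into any kernel-based discriminative learner such as an SVM, which is what gives the hybrid its discriminative half.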
In semi-supervised learning, co-training and multiview learning are usually applied to generative learning (Blum & Mitchell, 1998). It is less straightforward to apply semi-supervised learning in traditional discriminative learning, since [math]\displaystyle{ P(y|\bf{x}) }[/math] is estimated by ignoring [math]\displaystyle{ P(\bf{x}) }[/math]. Examples of semi-supervised learning methods in discriminative learning include transductive SVMs, Gaussian processes, information regularization, and graph-based methods (Chapelle, Scholkopf & Zien, 2006).
1999
- (Jaakkola & Haussler, 1999) ⇒ Tommi Jaakkola, and David Haussler. (1999). “Exploiting Generative Models in Discriminative Classifiers.” In: Advances in Neural Information Processing Systems (NIPS 1999).