Marginal Likelihood Function
A Marginal Likelihood Function is a likelihood function in which some parameter variables have been marginalized.
- AKA: Marginalized Likelihood, Integrated Likelihood, Model Evidence.
- Context:
- It can be a normalizing constant in Bayes' Rule.
- It can be computed as [math]\displaystyle{ P(e) = P(R=0, e) + P(R=1, e) + … = \sum_r P(e | R=r) P(R=r) }[/math] (since we marginalize out over R), and gives the prior probability of the evidence (see the worked sketch below).
- …
- Counter-Example(s):
- See: Marginal Probability, Bayes' Rule, Standardized Likelihood, Multiplicative Inverse, Mixed Model, Residual Maximum Likelihood.
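The following minimal sketch (in Python) illustrates the discrete case from the Context items above: the marginal likelihood [math]\displaystyle{ P(e) }[/math] is obtained by summing the joint [math]\displaystyle{ P(R=r, e) }[/math] over all values of [math]\displaystyle{ R }[/math], and then serves as the normalizing constant of the posterior. The prior and likelihood values are hypothetical, chosen only for illustration.

```python
# A minimal sketch of the discrete marginal likelihood P(e) = sum_r P(e | R=r) P(R=r).
# The prior and likelihood values are hypothetical and chosen only for illustration.

prior = {0: 0.8, 1: 0.2}        # P(R=r): prior over the binary variable R
likelihood = {0: 0.1, 1: 0.7}   # P(e | R=r): likelihood of the evidence e for each value of R

# Marginal likelihood of the evidence: sum the joint P(R=r, e) over all values of R.
marginal_likelihood = sum(likelihood[r] * prior[r] for r in prior)

# The same quantity is the normalizing constant of the posterior P(R=r | e) in Bayes' Rule.
posterior = {r: likelihood[r] * prior[r] / marginal_likelihood for r in prior}

print(marginal_likelihood)  # 0.8 * 0.1 + 0.2 * 0.7 = 0.22
print(posterior)            # {0: 0.3636..., 1: 0.6363...}; sums to 1
```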
References
2014a
- (Wikipedia, 2014) ⇒ http://en.wikipedia.org/wiki/Likelihood_function#Marginal_likelihood Retrieved:2014-12-10.
- Sometimes we can remove the nuisance parameters by considering a likelihood based on only part of the information in the data, for example by using the set of ranks rather than the numerical values. Another example occurs in linear mixed models, where considering a likelihood for the residuals only after fitting the fixed effects leads to residual maximum likelihood estimation of the variance components.
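As a hedged illustration of the linear mixed model case described above, the sketch below uses statsmodels' MixedLM, which estimates the variance components by residual (restricted) maximum likelihood when fitted with reml=True (its default); the random-intercept data are synthetic and purely illustrative.

```python
# A hedged sketch of the linear mixed model case above: statsmodels' MixedLM estimates the
# variance components by residual (restricted) maximum likelihood when fit with reml=True
# (its default). The random-intercept data below are synthetic and purely illustrative.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_groups, n_per_group = 20, 10
group = np.repeat(np.arange(n_groups), n_per_group)
x = rng.normal(size=n_groups * n_per_group)
u = rng.normal(scale=1.0, size=n_groups)                  # one random intercept per group
y = 2.0 + 0.5 * x + u[group] + rng.normal(scale=0.7, size=n_groups * n_per_group)
df = pd.DataFrame({"y": y, "x": x, "group": group})

# REML: the likelihood is based on residual contrasts after fitting out the fixed effects.
result = smf.mixedlm("y ~ x", df, groups=df["group"]).fit(reml=True)
print(result.cov_re)  # estimated random-intercept variance
print(result.scale)   # estimated residual variance
```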
2014b
- (Wikipedia, 2014) ⇒ http://en.wikipedia.org/wiki/Normalizing_constant#Bayes Retrieved:2014-1-15.
- Bayes' theorem says that the posterior probability measure is proportional to the product of the prior probability measure and the likelihood function. Proportional to implies that one must multiply or divide by a normalizing constant to assign measure 1 to the whole space, i.e., to get a probability measure. In a simple discrete case we have :[math]\displaystyle{ P(H_0|D) = \frac{P(D|H_0)P(H_0)}{P(D)} }[/math]
where P(H0) is the prior probability that the hypothesis is true; P(D|H0) is the conditional probability of the data given that the hypothesis is true, but given that the data are known it is the likelihood of the hypothesis (or its parameters) given the data; P(H0|D) is the posterior probability that the hypothesis is true given the data. P(D) should be the probability of producing the data, but on its own is difficult to calculate, so an alternative way to describe this relationship is as one of proportionality: :[math]\displaystyle{ P(H_0|D) \propto P(D|H_0)P(H_0). }[/math]
Since P(H|D) is a probability, the sum over all possible (mutually exclusive) hypotheses should be 1, leading to the conclusion that :[math]\displaystyle{ P(H_0|D) = \frac{P(D|H_0)P(H_0)}{\displaystyle\sum_i P(D|H_i)P(H_i)} . }[/math]
In this case, the reciprocal of the value :[math]\displaystyle{ P(D)=\sum_i P(D|H_i)P(H_i) \; }[/math]
is the normalizing constant. [1] It can be extended from countably many hypotheses to uncountably many by replacing the sum by an integral.
- ↑ Feller, 1968, p. 124.
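The passage above notes that the sum over hypotheses can be replaced by an integral over a continuous parameter. The sketch below illustrates this with an assumed Gaussian model (unit observation variance, a Normal(0, 2) prior on the mean, and made-up data), computing the normalizing constant [math]\displaystyle{ P(D) }[/math] by numerical quadrature.

```python
# A sketch of replacing the sum over hypotheses with an integral over a continuous
# parameter mu: P(D) = integral of P(D | mu) P(mu) d mu. The Gaussian model (unit
# observation variance, Normal(0, 2) prior) and the data are illustrative assumptions.
import numpy as np
from scipy import stats
from scipy.integrate import quad

data = np.array([1.2, 0.7, 1.9])         # hypothetical observations
prior = stats.norm(loc=0.0, scale=2.0)   # prior P(mu) on the unknown mean

def joint(mu):
    # P(D | mu) P(mu): likelihood of the data under N(mu, 1), times the prior density
    return np.prod(stats.norm.pdf(data, loc=mu, scale=1.0)) * prior.pdf(mu)

# The normalizing constant P(D), computed by quadrature instead of a finite sum.
marginal, _ = quad(joint, -np.inf, np.inf)
print(marginal)
```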
2013
- (Wikipedia, 2013) ⇒ http://en.wikipedia.org/wiki/Marginal_likelihood
- In statistics, a marginal likelihood function, or integrated likelihood, is a likelihood function in which some parameter variables have been marginalized. In the context of Bayesian statistics, it may also be referred to as the evidence or model evidence.
Given a set of independent identically distributed data points [math]\displaystyle{ \mathbb{X}=(x_1,\ldots,x_n), }[/math] where [math]\displaystyle{ x_i \sim p(x_i|\theta) }[/math] according to some probability distribution parameterized by θ, where θ itself is a random variable described by a distribution, i.e. [math]\displaystyle{ \theta \sim p(\theta|\alpha), }[/math] the marginal likelihood in general asks what the probability [math]\displaystyle{ p(\mathbb{X}|\alpha) }[/math] is, where θ has been marginalized out (integrated out): :[math]\displaystyle{ p(\mathbb{X}|\alpha) = \int_\theta p(\mathbb{X}|\theta) \, p(\theta|\alpha)\ \operatorname{d}\!\theta }[/math]
The above definition is phrased in the context of Bayesian statistics. In classical (frequentist) statistics, the concept of marginal likelihood occurs instead in the context of a joint parameter θ=(ψ,λ), where ψ is the actual parameter of interest, and λ is a non-interesting nuisance parameter. If there exists a probability distribution for λ, it is often desirable to consider the likelihood function only in terms of ψ, by marginalizing out λ: :[math]\displaystyle{ \mathcal{L}(\psi;\mathbb{X}) = p(\mathbb{X}|\psi) = \int_\Lambda p(\mathbb{X}|\psi,\lambda) \, p(\lambda|\psi) \ \operatorname{d}\!\lambda }[/math]
Unfortunately, marginal likelihoods are generally difficult to compute. Exact solutions are known for a small class of distributions, particularly when the marginalized-out parameter is the conjugate prior of the distribution of the data. In other cases, some kind of numerical integration method is needed, either a general method such as Gaussian integration or a Monte Carlo method, or a method specialized to statistical problems such as the Laplace approximation, Gibbs sampling or the EM algorithm.
It is also possible to apply the above considerations to a single random variable (data point) x, rather than a set of observations. In a Bayesian context, this is equivalent to the prior predictive distribution of a data point.
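To illustrate the two computational routes mentioned above, the sketch below assumes a Beta-Bernoulli model, where the Beta prior is conjugate to the likelihood so the marginal likelihood has a closed form, and compares that exact value against a simple Monte Carlo estimate that draws θ from the prior; the data and hyperparameters are hypothetical.

```python
# A sketch of the two routes described above, under an assumed Beta-Bernoulli model:
# the exact marginal likelihood (available because the Beta prior is conjugate to the
# Bernoulli likelihood) versus a simple Monte Carlo estimate that samples theta from
# the prior p(theta | alpha). Data and hyperparameters are hypothetical.
import numpy as np
from scipy.special import beta as beta_fn

rng = np.random.default_rng(0)
x = np.array([1, 0, 1, 1, 0, 1])   # hypothetical binary observations
a, b = 2.0, 2.0                    # hyperparameters alpha = (a, b) of the Beta prior
n, k = len(x), int(x.sum())

# Exact (conjugate) result: p(X | alpha) = B(a + k, b + n - k) / B(a, b)
exact = beta_fn(a + k, b + n - k) / beta_fn(a, b)

# Monte Carlo: average the likelihood p(X | theta) over draws theta ~ p(theta | alpha)
thetas = rng.beta(a, b, size=100_000)
monte_carlo = np.mean(thetas ** k * (1.0 - thetas) ** (n - k))

print(exact, monte_carlo)          # the two estimates should agree to a few decimal places
```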
2010
- (Murphy, 2010) ⇒ http://www.cs.ubc.ca/~murphyk/Bayes/bayesrule.html
- … Mathematically, Bayes' rule states
posterior = likelihood * prior / marginal likelihood
or, in symbols, [math]\displaystyle{ P(R=r | e) = \frac{P(e | R=r) P(R=r)}{P(e)} }[/math] where [math]\displaystyle{ P(R=r|e) }[/math] denotes the probability that random variable [math]\displaystyle{ R }[/math] has value [math]\displaystyle{ r }[/math] given evidence [math]\displaystyle{ e }[/math]. The denominator is just a normalizing constant that ensures the posterior adds up to 1; it can be computed by summing up the numerator over all possible values of R, i.e., [math]\displaystyle{ P(e) = P(R=0, e) + P(R=1, e) + … = \sum_r P(e | R=r) P(R=r) }[/math] This is called the marginal likelihood (since we marginalize out over R), and gives the prior probability of the evidence.
2003
- (Davison, 2003) ⇒ Anthony C. Davison. (2003). “Statistical Models.” Cambridge University Press. ISBN:0521773393