Maximum a Posteriori Estimation Algorithm
A Maximum a Posteriori Estimation Algorithm is a point estimation algorithm that can solve a maximum a posteriori estimation task (by producing a maximum a posteriori estimate).
- AKA: MAP-based Estimator.
- Example(s):
  - exact maximum a posteriori estimation for binary images (Greig et al., 1989).
  - maximum a posteriori estimation of multivariate Gaussian mixture observations of Markov chains (Gauvain & Lee, 1994).
- Counter-Example(s):
  - a Maximum Likelihood Estimation Algorithm, which ignores the prior distribution over the parameter.
  - a Bayes Estimator under a general loss function.
- See: A Posteriori Estimate Algorithm, (Dempster et al., 1977), argmax, Bayesian Inference.
References
2015
- (Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/Maximum_a_posteriori_estimation#Description Retrieved:2015-6-15.
- Assume that we want to estimate an unobserved population parameter [math]\displaystyle{ \theta }[/math] on the basis of observations [math]\displaystyle{ x }[/math] . Let [math]\displaystyle{ f }[/math] be the sampling distribution of [math]\displaystyle{ x }[/math] , so that [math]\displaystyle{ f(x|\theta) }[/math] is the probability of [math]\displaystyle{ x }[/math] when the underlying population parameter is [math]\displaystyle{ \theta }[/math] . Then the function: : [math]\displaystyle{ \theta \mapsto f(x | \theta) \! }[/math] is known as the likelihood function and the estimate: : [math]\displaystyle{ \hat{\theta}_{\mathrm{ML}}(x) = \underset{\theta}{\operatorname{arg\,max}} \ f(x | \theta) \! }[/math] is the maximum likelihood estimate of [math]\displaystyle{ \theta }[/math] .
Now assume that a prior distribution [math]\displaystyle{ g }[/math] over [math]\displaystyle{ \theta }[/math] exists. This allows us to treat [math]\displaystyle{ \theta }[/math] as a random variable as in Bayesian statistics. Then the posterior distribution of [math]\displaystyle{ \theta }[/math] is as follows: : [math]\displaystyle{ \theta \mapsto f(\theta | x) = \frac{f(x | \theta) \, g(\theta)}{\displaystyle\int_{\vartheta \in \Theta} f(x | \vartheta) \, g(\vartheta) \, d\vartheta} \! }[/math] where [math]\displaystyle{ g }[/math] is the density function of [math]\displaystyle{ \theta }[/math] and [math]\displaystyle{ \Theta }[/math] is the domain of [math]\displaystyle{ g }[/math] . This is a straightforward application of Bayes' theorem.
The method of maximum a posteriori estimation then estimates [math]\displaystyle{ \theta }[/math] as the mode of the posterior distribution of this random variable: : [math]\displaystyle{ \hat{\theta}_{\mathrm{MAP}}(x) = \underset{\theta}{\operatorname{arg\,max}} \ \frac{f(x | \theta) \, g(\theta)} {\displaystyle\int_{\vartheta} f(x | \vartheta) \, g(\vartheta) \, d\vartheta} = \underset{\theta}{\operatorname{arg\,max}} \ f(x | \theta) \, g(\theta). \! }[/math] The denominator of the posterior distribution (so-called partition function) does not depend on [math]\displaystyle{ \theta }[/math] and therefore plays no role in the optimization. Observe that the MAP estimate of [math]\displaystyle{ \theta }[/math] coincides with the ML estimate when the prior [math]\displaystyle{ g }[/math] is uniform (that is, a constant function). And when the loss function is of the form: :[math]\displaystyle{ L(\theta, a) = \begin{cases} 0 & \mbox{, if } |a-\theta|\lt c \\ 1 & \mbox{, otherwise} \\ \end{cases} \! }[/math] as [math]\displaystyle{ c }[/math] goes to 0, the sequence of Bayes estimators approaches the MAP estimator, provided that the distribution of [math]\displaystyle{ \theta }[/math] is unimodal. But generally a MAP estimator is not a Bayes estimator unless [math]\displaystyle{ \theta }[/math] is discrete.
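The argmax expressions in the passage above can be made concrete with a small numerical sketch. The code below assumes a binomial likelihood [math]\displaystyle{ f(x|\theta) }[/math] (a hypothetical 7 successes in 10 Bernoulli trials) and a Beta density as the prior [math]\displaystyle{ g(\theta) }[/math], and approximates both the ML and MAP estimates by grid search over [math]\displaystyle{ \theta }[/math]; the data, prior parameters, and grid resolution are illustrative choices, not taken from the quoted text.

```python
# Minimal sketch of ML and MAP estimation by grid search, under the
# illustrative assumptions stated above (binomial data, Beta prior).
import numpy as np

def log_likelihood(theta, x, n):
    """log f(x | theta) for a binomial observation, up to an additive constant."""
    return x * np.log(theta) + (n - x) * np.log(1.0 - theta)

def log_beta_prior(theta, a, b):
    """log g(theta) for a Beta(a, b) prior, up to an additive constant."""
    return (a - 1.0) * np.log(theta) + (b - 1.0) * np.log(1.0 - theta)

# Hypothetical observed data: 7 successes in 10 trials.
x, n = 7, 10
theta_grid = np.linspace(1e-6, 1.0 - 1e-6, 100_001)

# ML estimate: arg max_theta f(x | theta).
theta_ml = theta_grid[np.argmax(log_likelihood(theta_grid, x, n))]

# MAP estimate: arg max_theta f(x | theta) g(theta); the normalizing
# integral is constant in theta, so it is dropped from the argmax.
a, b = 2.0, 2.0  # informative Beta(2, 2) prior (illustrative)
log_post = log_likelihood(theta_grid, x, n) + log_beta_prior(theta_grid, a, b)
theta_map = theta_grid[np.argmax(log_post)]

# With a uniform prior (Beta(1, 1)) the MAP estimate coincides with ML.
log_post_flat = log_likelihood(theta_grid, x, n) + log_beta_prior(theta_grid, 1.0, 1.0)
theta_map_flat = theta_grid[np.argmax(log_post_flat)]

print(f"ML estimate:          {theta_ml:.3f}")        # ~0.700 = x/n
print(f"MAP, Beta(2,2) prior: {theta_map:.3f}")       # ~0.667 = (x+1)/(n+2)
print(f"MAP, uniform prior:   {theta_map_flat:.3f}")  # equals the ML estimate
```

Because the normalizing integral in the denominator does not depend on [math]\displaystyle{ \theta }[/math], the sketch maximizes only [math]\displaystyle{ f(x|\theta)\, g(\theta) }[/math]; with the uniform Beta(1, 1) prior the MAP estimate coincides with the ML estimate, as the quoted passage notes.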
2011
- http://quantivity.wordpress.com/2011/05/23/why-minimize-negative-log-likelihood/
- QUOTE: Why is minimizing the negative log likelihood equivalent to maximum a posteriori probability (MAP), given a uniform prior?
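One way to see this, following the notation in the Wikipedia passage above: a uniform prior means [math]\displaystyle{ g(\theta) }[/math] is constant, and the logarithm is strictly increasing, so : [math]\displaystyle{ \hat{\theta}_{\mathrm{MAP}}(x) = \underset{\theta}{\operatorname{arg\,max}} \ f(x | \theta) \, g(\theta) = \underset{\theta}{\operatorname{arg\,max}} \ f(x | \theta) = \underset{\theta}{\operatorname{arg\,min}} \ \left( - \log f(x | \theta) \right). \! }[/math]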
2000
- (Valpola, 2000) ⇒ Harri Valpola. (2000). “Bayesian Ensemble Learning for Nonlinear Factor Analysis.” PhD Dissertation, Helsinki University of Technology.
- QUOTE: … The two point estimates in wide use are the maximum likelihood (ML) and the maximum a posteriori (MAP) estimator. The ML estimator neglects the prior probability of the models and maximises only the probability which the model gives for the observation. The MAP estimator chooses the model which has the highest posterior probability mass or density.
1994
- (Gauvain & Lee, 1994) ⇒ J.-L. Gauvain, and Chin-Hui Lee. (1994). “Maximum a Posteriori Estimation for Multivariate Gaussian Mixture Observations of Markov Chains.” In: IEEE Transactions on Speech and Audio Processing, 2(2). doi:10.1109/89.279278
1989
- (Greig et al., 1989) ⇒ D. M. Greig, B. T. Porteous and A. H. Seheult. (1989). “Exact Maximum A Posteriori Estimation for Binary Images.” In: Journal of the Royal Statistical Society. Series B (Methodological), 51(2). http://www.jstor.org/stable/2345609