Additive Smoothing

From GM-RKB

An Additive Smoothing is a statistical smoothing technique for categorical count data, used to avoid zero-probability estimates for categories with no observed occurrences.



References

2016

[math]\displaystyle{ \hat\theta_i= \frac{x_i + \alpha}{N + \alpha d} \qquad (i=1,\ldots,d), }[/math]
where the pseudocount α > 0 is the smoothing parameter (α = 0 corresponds to no smoothing). Additive smoothing is a type of shrinkage estimator, as the resulting estimate will be between the empirical estimate [math]\displaystyle{ x_i/N }[/math] and the uniform probability [math]\displaystyle{ 1/d }[/math]. Using Laplace's rule of succession, some authors have argued that α should be 1 (in which case the term add-one smoothing is also used), though in practice a smaller value is typically chosen.
From a Bayesian point of view, this corresponds to the expected value of the posterior distribution, using a symmetric Dirichlet distribution with parameter α as a prior. In the special case where the number of categories is 2, this is equivalent to using a beta distribution as the conjugate prior for the parameters of the binomial distribution.
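The estimator above can be sketched in a few lines of Python. This is a minimal illustration, assuming NumPy; the function name and example counts are hypothetical, not part of the source.

```python
import numpy as np

def additive_smoothing(counts, alpha=1.0):
    """Smoothed probability estimates for categorical counts.

    Implements theta_i = (x_i + alpha) / (N + alpha * d),
    where N is the total count and d the number of categories.
    """
    counts = np.asarray(counts, dtype=float)
    N = counts.sum()
    d = counts.size
    return (counts + alpha) / (N + alpha * d)

# Hypothetical counts with a zero-occurrence category (N = 10, d = 3).
probs = additive_smoothing([3, 0, 7], alpha=1.0)
# The zero-count category now receives a nonzero probability,
# and the estimates still sum to 1.
```

Note that with α = 1 (add-one smoothing) the zero-count category gets probability 1/(N + d) rather than 0.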

2024

  • (Wikipedia, 2024) ⇒ https://en.wikipedia.org/wiki/Additive_smoothing Retrieved:2024-7-15.
    • In statistics, additive smoothing, also called Laplace smoothing [1] or Lidstone smoothing, is a technique used to smooth count data, eliminating issues caused by certain values having 0 occurrences. Given a set of observation counts [math]\displaystyle{ \mathbf{x} = \langle x_1, x_2, \ldots, x_d \rangle }[/math] from a [math]\displaystyle{ d }[/math] -dimensional multinomial distribution with [math]\displaystyle{ N }[/math] trials, a "smoothed" version of the counts gives the estimator : [math]\displaystyle{ \hat\theta_i = \frac{x_i + \alpha}{N + \alpha d} \qquad (i = 1, \ldots, d), }[/math] where the smoothed count [math]\displaystyle{ \hat x_i = N \hat\theta_i }[/math] , and the "pseudocount" α > 0 is a smoothing parameter, with α = 0 corresponding to no smoothing (this parameter is explained in below). Additive smoothing is a type of shrinkage estimator, as the resulting estimate will be between the empirical probability (relative frequency) [math]\displaystyle{ x_i/N }[/math] and the uniform probability [math]\displaystyle{ 1/d. }[/math] Invoking Laplace's rule of succession, some authors have arguedthat α should be 1 (in which case the term add-one smoothing is also used), though in practice a smaller value is typically chosen.

      From a Bayesian point of view, this corresponds to the expected value of the posterior distribution, using a symmetric Dirichlet distribution with parameter α as a prior distribution. In the special case where the number of categories is 2, this is equivalent to using a beta distribution as the conjugate prior for the parameters of the binomial distribution.
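The shrinkage and Bayesian interpretations above can be checked numerically. The sketch below, assuming NumPy and hypothetical counts, shows that for d = 2 the estimator equals the posterior mean under a symmetric Beta(α, α) prior, (x + α)/(N + 2α), and that it moves from the empirical frequency toward the uniform probability 1/d as α grows.

```python
import numpy as np

# Hypothetical binomial counts: 9 successes, 1 failure (N = 10, d = 2).
counts = np.array([9.0, 1.0])
N, d = counts.sum(), counts.size
empirical = counts / N          # relative frequency of the first category
uniform = 1.0 / d               # uniform probability

# Smoothed estimate for several values of the pseudocount alpha.
# For d = 2 this is exactly the Beta(alpha, alpha) posterior mean.
estimates = {alpha: (counts + alpha) / (N + alpha * d)
             for alpha in (0.0, 1.0, 1000.0)}
# alpha = 0 recovers the empirical frequency; increasing alpha
# shrinks the estimate toward 1/d.
```

The interpolation is visible directly: each smoothed estimate lies strictly between the empirical frequency and 1/d for any α > 0.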

  1. C. D. Manning, P. Raghavan and H. Schütze (2008). Introduction to Information Retrieval. Cambridge University Press, p. 260.