Additive Smoothing

An Additive Smoothing is a statistical smoothing technique for categorical (count) data that adds a pseudocount α > 0 to every category count before normalizing.

AKA: Laplace Smoothing, Lidstone Smoothing.
See: Smoothing, Shrinkage Estimator, Posterior Distribution, Expected Value, Categorical Data.

References

2016

(Wikipedia, 2016) ⇒ https://www.wikiwand.com/en/Additive_smoothing Retrieved 2016-07-24
- In statistics, additive smoothing, also called Laplace smoothing (not to be confused with Laplacian smoothing), or Lidstone smoothing, is a technique used to smooth categorical data. Given an observation x = (x₁, …, x_d) from a multinomial distribution with N trials and parameter vector θ = (θ₁, …, θ_d), a "smoothed" version of the data gives the estimator:

[math]\displaystyle{ \hat\theta_i= \frac{x_i + \alpha}{N + \alpha d} \qquad (i=1,\ldots,d), }[/math]
where the pseudocount α > 0 is the smoothing parameter (α = 0 corresponds to no smoothing). Additive smoothing is a type of shrinkage estimator, as the resulting estimate will be between the empirical estimate x_i / N and the uniform probability 1/d. Using Laplace's rule of succession, some authors have argued [citation needed] that α should be 1 (in which case the term add-one smoothing is also used), though in practice a smaller value is typically chosen.
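
As a brief worked step (an editorial sketch, not part of the quoted source), the shrinkage claim can be checked by rewriting the estimator as a weighted average. The weight λ is a symbol introduced here for illustration only:

[math]\displaystyle{ \hat\theta_i = \lambda \, \frac{x_i}{N} + (1 - \lambda) \, \frac{1}{d}, \qquad \lambda = \frac{N}{N + \alpha d}. }[/math]
Since 0 < λ < 1 whenever α > 0 and N > 0, each estimate lies between x_i / N and 1/d. A larger α or a smaller N moves it closer to the uniform value.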

From a Bayesian point of view, this corresponds to the expected value of the posterior distribution, using a symmetric Dirichlet distribution with parameter α as a prior. In the special case where the number of categories is 2, this is equivalent to using a Beta distribution as the conjugate prior for the parameters of Binomial distribution.
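
The posterior mean can be written out in one line. This is a sketch under the assumptions stated in the paragraph above (multinomial likelihood, symmetric Dirichlet prior with parameter α); the index j is introduced here only for the sum:

[math]\displaystyle{ \theta \mid x \sim \operatorname{Dirichlet}(x_1 + \alpha, \ldots, x_d + \alpha), \qquad \operatorname{E}[\theta_i \mid x] = \frac{x_i + \alpha}{\sum_{j=1}^{d} (x_j + \alpha)} = \frac{x_i + \alpha}{N + \alpha d}. }[/math]
With d = 2 the posterior is Beta(x₁ + α, x₂ + α), whose mean is (x₁ + α) / (N + 2α).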

2024
(Wikipedia, 2024) ⇒ https://en.wikipedia.org/wiki/Additive_smoothing Retrieved 2024-07-15
In statistics, additive smoothing, also called Laplace smoothing [1] or Lidstone smoothing, is a technique used to smooth count data, eliminating issues caused by certain values having 0 occurrences. Given a set of observation counts [math]\displaystyle{ \mathbf{x} = \langle x_1, x_2, \ldots, x_d \rangle }[/math] from a [math]\displaystyle{ d }[/math]-dimensional multinomial distribution with [math]\displaystyle{ N }[/math] trials, a "smoothed" version of the counts gives the estimator:

[math]\displaystyle{ \hat\theta_i = \frac{x_i + \alpha}{N + \alpha d} \qquad (i = 1, \ldots, d), }[/math]
where the smoothed count is [math]\displaystyle{ \hat x_i = N \hat\theta_i }[/math], and the "pseudocount" α > 0 is a smoothing parameter, with α = 0 corresponding to no smoothing. Additive smoothing is a type of shrinkage estimator, as the resulting estimate will be between the empirical probability (relative frequency) [math]\displaystyle{ x_i/N }[/math] and the uniform probability [math]\displaystyle{ 1/d }[/math]. Invoking Laplace's rule of succession, some authors have argued that α should be 1 (in which case the term add-one smoothing is also used), though in practice a smaller value is typically chosen.

From a Bayesian point of view, this corresponds to the expected value of the posterior distribution, using a symmetric Dirichlet distribution with parameter α as a prior distribution. In the special case where the number of categories is 2, this is equivalent to using a beta distribution as the conjugate prior for the parameters of the binomial distribution.
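
The estimator and the smoothed counts described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `additive_smoothing` and `smoothed_counts` and the example counts are hypothetical, and exact fractions are used only so the arithmetic is easy to verify by hand.

```python
from fractions import Fraction


def additive_smoothing(counts, alpha=1):
    """Return the estimates (x_i + alpha) / (N + alpha * d) as exact fractions."""
    alpha = Fraction(alpha)
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    d = len(counts)          # number of categories
    n = sum(counts)          # number of trials N
    denom = n + alpha * d
    if denom == 0:
        raise ValueError("need at least one observation or alpha > 0")
    return [(Fraction(x) + alpha) / denom for x in counts]


def smoothed_counts(counts, alpha=1):
    """Return the smoothed counts N * theta_hat_i."""
    n = sum(counts)
    return [n * theta for theta in additive_smoothing(counts, alpha)]


# Hypothetical example: N = 4 trials over d = 3 categories, one never observed.
counts = [3, 0, 1]
theta_hat = additive_smoothing(counts, alpha=1)
print([str(t) for t in theta_hat])   # ['4/7', '1/7', '2/7']
print(sum(theta_hat))                # 1
print([str(c) for c in smoothed_counts(counts, alpha=1)])
```

The category with zero observations receives probability 1/7 rather than 0, the estimates still sum to 1, and the smoothed counts still sum to N. Passing `alpha=0` recovers the relative frequencies x_i / N.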

[1] C. D. Manning, P. Raghavan and H. Schütze (2008). Introduction to Information Retrieval. Cambridge University Press, p. 260.
