Additive Smoothing
An Additive Smoothing is a statistical smoothing technique for categorical count data that adds a pseudocount to every category, so that no category is assigned a zero probability estimate.
- AKA: Laplace Smoothing, Lidstone Smoothing.
- See: Smoothing, Shrinkage Estimator, Posterior Distribution, Expected Value, Categorical Data.
References
2016
- (Wikipedia, 2016) ⇒ https://www.wikiwand.com/en/Additive_smoothing Retrieved 2016-07-24
- In statistics, additive smoothing, also called Laplace smoothing (not to be confused with Laplacian smoothing), or Lidstone smoothing, is a technique used to smooth categorical data. Given an observation x = (x1, …, xd) from a multinomial distribution with N trials and parameter vector θ = (θ1, …, θd), a "smoothed" version of the data gives the estimator:
- [math]\displaystyle{ \hat\theta_i= \frac{x_i + \alpha}{N + \alpha d} \qquad (i=1,\ldots,d), }[/math]
- where the pseudocount α > 0 is the smoothing parameter (α = 0 corresponds to no smoothing). Additive smoothing is a type of shrinkage estimator, as the resulting estimate will be between the empirical estimate xi / N and the uniform probability 1/d. Using Laplace's rule of succession, some authors have argued that α should be 1 (in which case the term add-one smoothing is also used), though in practice a smaller value is typically chosen.
- From a Bayesian point of view, this corresponds to the expected value of the posterior distribution, using a symmetric Dirichlet distribution with parameter α as a prior. In the special case where the number of categories is 2, this is equivalent to using a beta distribution as the conjugate prior for the parameters of the binomial distribution.
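The estimator above can be sketched in a few lines of Python (a minimal illustration; the function name and example counts are ours, not from the article):

```python
def additive_smoothing(counts, alpha=1.0):
    """Smoothed estimates theta_i = (x_i + alpha) / (N + alpha * d)."""
    N = sum(counts)   # total number of trials
    d = len(counts)   # number of categories
    return [(x + alpha) / (N + alpha * d) for x in counts]

# Example: counts with a zero-occurrence category.
counts = [3, 0, 7]
print(additive_smoothing(counts, alpha=1.0))  # -> [4/13, 1/13, 8/13]
```

Note that the zero-count category receives the small but nonzero probability α / (N + αd), and that α = 0 recovers the raw empirical frequencies xi / N.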
2024
- (Wikipedia, 2024) ⇒ https://en.wikipedia.org/wiki/Additive_smoothing Retrieved:2024-7-15.
- In statistics, additive smoothing, also called Laplace smoothing [1] or Lidstone smoothing, is a technique used to smooth count data, eliminating issues caused by certain values having 0 occurrences. Given a set of observation counts [math]\displaystyle{ \mathbf{x} = \langle x_1, x_2, \ldots, x_d \rangle }[/math] from a [math]\displaystyle{ d }[/math]-dimensional multinomial distribution with [math]\displaystyle{ N }[/math] trials, a "smoothed" version of the counts gives the estimator: [math]\displaystyle{ \hat\theta_i = \frac{x_i + \alpha}{N + \alpha d} \qquad (i = 1, \ldots, d), }[/math] where the smoothed count [math]\displaystyle{ \hat x_i = N \hat\theta_i }[/math], and the "pseudocount" α > 0 is a smoothing parameter, with α = 0 corresponding to no smoothing (this parameter is explained below). Additive smoothing is a type of shrinkage estimator, as the resulting estimate will be between the empirical probability (relative frequency) [math]\displaystyle{ x_i/N }[/math] and the uniform probability [math]\displaystyle{ 1/d. }[/math] Invoking Laplace's rule of succession, some authors have argued that α should be 1 (in which case the term add-one smoothing is also used), though in practice a smaller value is typically chosen.
From a Bayesian point of view, this corresponds to the expected value of the posterior distribution, using a symmetric Dirichlet distribution with parameter α as a prior distribution. In the special case where the number of categories is 2, this is equivalent to using a beta distribution as the conjugate prior for the parameters of the binomial distribution.
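The Bayesian equivalence stated above can be checked numerically: the additive-smoothing estimate coincides with the posterior mean under a symmetric Dirichlet(α) prior (a sketch under the standard conjugate-update rule; the example numbers are ours):

```python
# With prior Dirichlet(alpha, ..., alpha) and multinomial counts x,
# the posterior is Dirichlet(x_1 + alpha, ..., x_d + alpha), whose
# mean component i is (x_i + alpha) / (N + alpha * d).
counts = [2, 0, 5]
alpha = 0.5
N, d = sum(counts), len(counts)

posterior_params = [x + alpha for x in counts]          # conjugate update
posterior_mean = [a / sum(posterior_params) for a in posterior_params]
smoothed = [(x + alpha) / (N + alpha * d) for x in counts]

assert all(abs(p - s) < 1e-12 for p, s in zip(posterior_mean, smoothed))
```

The two-category case reduces to the familiar beta-binomial update, with posterior mean (x + α) / (N + 2α).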
- ↑ C. D. Manning, P. Raghavan and H. Schütze (2008). Introduction to Information Retrieval. Cambridge University Press, p. 260.