Additive Smoothing
An Additive Smoothing is a statistical smoothing technique for categorical count data that adds a pseudocount to every category, so that no category is assigned a zero probability estimate.
- AKA: Laplace Smoothing, Lidstone Smoothing.
- See: Smoothing, Shrinkage Estimator, Posterior Distribution, Expected Value, Categorical Data.
References
2016
- (Wikipedia, 2016) ⇒ https://www.wikiwand.com/en/Additive_smoothing Retrieved 2016-07-24
- In statistics, additive smoothing, also called Laplace smoothing (not to be confused with Laplacian smoothing), or Lidstone smoothing, is a technique used to smooth categorical data. Given an observation x = (x1, …, xd) from a multinomial distribution with N trials and parameter vector θ = (θ1, …, θd), a "smoothed" version of the data gives the estimator:
- [math]\displaystyle{ \hat\theta_i= \frac{x_i + \alpha}{N + \alpha d} \qquad (i=1,\ldots,d), }[/math]
- where the pseudocount α > 0 is the smoothing parameter (α = 0 corresponds to no smoothing). Additive smoothing is a type of shrinkage estimator, as the resulting estimate will be between the empirical estimate xi / N and the uniform probability 1/d. Using Laplace's rule of succession, some authors have argued that α should be 1 (in which case the term add-one smoothing is also used), though in practice a smaller value is typically chosen.
- From a Bayesian point of view, this corresponds to the expected value of the posterior distribution, using a symmetric Dirichlet distribution with parameter α as a prior. In the special case where the number of categories is 2, this is equivalent to using a beta distribution as the conjugate prior for the parameters of the binomial distribution.
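The estimator above can be sketched in a few lines of Python (a minimal illustration; the function name and example counts are ours, not from the article):

```python
def additive_smoothing(counts, alpha=1.0):
    """Smoothed estimates theta_i = (x_i + alpha) / (N + alpha * d)."""
    N = sum(counts)   # total number of trials
    d = len(counts)   # number of categories
    return [(x + alpha) / (N + alpha * d) for x in counts]

# Example: counts with a zero-occurrence category.
counts = [3, 0, 7]
print(additive_smoothing(counts, alpha=1.0))  # -> [4/13, 1/13, 8/13]
```

Note that the zero-count category receives the small but nonzero probability α / (N + αd), and that α = 0 recovers the raw empirical frequencies xi / N.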
2024
- (Wikipedia, 2024) ⇒ https://en.wikipedia.org/wiki/Additive_smoothing Retrieved:2024-7-15.
- In statistics, additive smoothing, also called Laplace smoothing [1] or Lidstone smoothing, is a technique used to smooth count data, eliminating issues caused by certain values having 0 occurrences. Given a set of observation counts [math]\displaystyle{ \mathbf{x} = \langle x_1, x_2, \ldots, x_d \rangle }[/math] from a [math]\displaystyle{ d }[/math]-dimensional multinomial distribution with [math]\displaystyle{ N }[/math] trials, a "smoothed" version of the counts gives the estimator: [math]\displaystyle{ \hat\theta_i = \frac{x_i + \alpha}{N + \alpha d} \qquad (i = 1, \ldots, d), }[/math] where the smoothed count [math]\displaystyle{ \hat x_i = N \hat\theta_i }[/math], and the "pseudocount" α > 0 is a smoothing parameter, with α = 0 corresponding to no smoothing (this parameter is explained below). Additive smoothing is a type of shrinkage estimator, as the resulting estimate will be between the empirical probability (relative frequency) [math]\displaystyle{ x_i/N }[/math] and the uniform probability [math]\displaystyle{ 1/d. }[/math] Invoking Laplace's rule of succession, some authors have argued that α should be 1 (in which case the term add-one smoothing is also used), though in practice a smaller value is typically chosen.
From a Bayesian point of view, this corresponds to the expected value of the posterior distribution, using a symmetric Dirichlet distribution with parameter α as a prior distribution. In the special case where the number of categories is 2, this is equivalent to using a beta distribution as the conjugate prior for the parameters of the binomial distribution.
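The Bayesian equivalence stated above can be checked numerically: the additive-smoothing estimate coincides with the posterior mean under a symmetric Dirichlet(α) prior (a sketch under the standard conjugate-update rule; the example numbers are ours):

```python
# With prior Dirichlet(alpha, ..., alpha) and multinomial counts x,
# the posterior is Dirichlet(x_1 + alpha, ..., x_d + alpha), whose
# mean component i is (x_i + alpha) / (N + alpha * d).
counts = [2, 0, 5]
alpha = 0.5
N, d = sum(counts), len(counts)

posterior_params = [x + alpha for x in counts]          # conjugate update
posterior_mean = [a / sum(posterior_params) for a in posterior_params]
smoothed = [(x + alpha) / (N + alpha * d) for x in counts]

assert all(abs(p - s) < 1e-12 for p, s in zip(posterior_mean, smoothed))
```

The two-category case reduces to the familiar beta-binomial update, with posterior mean (x + α) / (N + 2α).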
- ↑ C. D. Manning, P. Raghavan and H. Schütze (2008). Introduction to Information Retrieval. Cambridge University Press, p. 260.