Anderson-Darling Test


An Anderson-Darling Test is a hypothesis test/goodness-of-fit test based on the distance between the theoretical distribution function and the empirical cumulative distribution function.

[math]\displaystyle{ \omega_n^2 = n \int_{-\infty}^\infty (F_n(x) - F(x))^2 \,w(x) \, dF(x) \quad \text{with} \quad w(x)=\frac{1}{F(x)\; (1-F(x))} }[/math]
where [math]\displaystyle{ F(x) }[/math] is the theoretical distribution function, [math]\displaystyle{ F_n(x) }[/math] is the empirical cumulative distribution function, and [math]\displaystyle{ w(x) }[/math] is the weight function. When n is large enough, this represents the average of the squared errors between the two distributions, weighted by the implicit uncertainty, [math]\displaystyle{ w(x) }[/math], due to the estimation method. When [math]\displaystyle{ w(x)=1 }[/math], this reduces to the statistic of the Cramér–von Mises Test.
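For a fully specified [math]\displaystyle{ F }[/math], the statistic above is usually evaluated through its order-statistic form, [math]\displaystyle{ A^2 = -n - \frac{1}{n}\sum_{i=1}^{n}(2i-1)\left[\ln F(x_{(i)}) + \ln\left(1 - F(x_{(n+1-i)})\right)\right] }[/math], where [math]\displaystyle{ x_{(1)} \le \dots \le x_{(n)} }[/math] are the sorted observations. The following is a minimal sketch of that computation, assuming NumPy and SciPy are available; the function name, sample, and standard-normal target are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def anderson_darling_statistic(sample, cdf):
    """A^2 = -n - (1/n) * sum_{i=1..n} (2i-1) * [ln F(x_(i)) + ln(1 - F(x_(n+1-i)))]."""
    x = np.sort(np.asarray(sample))
    n = len(x)
    u = cdf(x)                       # probability-integral transform F(x_(i))
    i = np.arange(1, n + 1)
    return -n - np.mean((2 * i - 1) * (np.log(u) + np.log1p(-u[::-1])))

rng = np.random.default_rng(0)
sample = rng.normal(size=200)
# Illustrative check of the sample against a fully specified standard-normal target.
print(anderson_darling_statistic(sample, norm.cdf))
```

Large values of [math]\displaystyle{ A^2 }[/math] indicate a poor fit of the sample to the hypothesized distribution.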


References

2016

  • (Wikipedia, 2016) ⇒ https://www.wikiwand.com/en/Anderson%E2%80%93Darling_test Retrieved 2016-07-31
    • The Anderson–Darling test is a statistical test of whether a given sample of data is drawn from a given probability distribution. In its basic form, the test assumes that there are no parameters to be estimated in the distribution being tested, in which case the test and its set of critical values is distribution-free. However, the test is most often used in contexts where a family of distributions is being tested, in which case the parameters of that family need to be estimated and account must be taken of this in adjusting either the test-statistic or its critical values. When applied to testing whether a normal distribution adequately describes a set of data, it is one of the most powerful statistical tools for detecting most departures from normality. K-sample Anderson–Darling tests are available for testing whether several collections of observations can be modelled as coming from a single population, where the distribution function does not have to be specified. In addition to its use as a test of fit for distributions, it can be used in parameter estimation as the basis for a form of minimum distance estimation procedure. The test is named after Theodore Wilbur Anderson (born 1918) and Donald A. Darling, who invented it in 1952.
(...) The Anderson–Darling and Cramér–von Mises statistics belong to the class of quadratic EDF statistics (tests based on the empirical distribution function). If the hypothesized distribution is [math]\displaystyle{ F }[/math], and the empirical (sample) cumulative distribution function is [math]\displaystyle{ F_n }[/math], then the quadratic EDF statistics measure the distance between [math]\displaystyle{ F }[/math] and [math]\displaystyle{ F_n }[/math] by
[math]\displaystyle{ n \int_{-\infty}^\infty (F_n(x) - F(x))^2\,w(x)\,dF(x), }[/math]
where [math]\displaystyle{ w(x) }[/math] is a weighting function. When the weighting function is [math]\displaystyle{ w(x)=1 }[/math], the statistic is the Cramér–von Mises statistic. The Anderson–Darling (1954) test is based on the distance
[math]\displaystyle{ A = n \int_{-\infty}^\infty \frac{(F_n(x) - F(x))^2}{F(x)\; (1-F(x))} \, dF(x), }[/math]
which is obtained when the weight function is [math]\displaystyle{ w(x)=[F(x)\; (1-F(x))]^{-1} }[/math]. Thus, compared with the Cramér–von Mises distance, the Anderson–Darling distance places more weight on observations in the tails of the distribution.
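In practice these integrals are rarely computed by hand; library routines cover the common cases described in the quoted passage. A brief usage sketch, assuming SciPy is available: scipy.stats.anderson tests a single sample against a named family (with that family's parameters estimated internally, so the critical values are adjusted accordingly), and scipy.stats.anderson_ksamp is the k-sample variant mentioned above. The data and parameter values below are arbitrary illustrations.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(loc=5.0, scale=2.0, size=300)   # arbitrary illustrative sample

# Single-sample test against the normal family; location and scale are estimated
# internally, and the critical values account for that estimation.
res = stats.anderson(x, dist='norm')
print(res.statistic)           # the A^2 statistic
print(res.critical_values)     # compare the statistic against these values ...
print(res.significance_level)  # ... at the corresponding significance levels (%)

# k-sample variant: can these collections be modelled as one (unspecified) population?
y = rng.normal(loc=5.0, scale=2.0, size=300)
print(stats.anderson_ksamp([x, y]).statistic)
```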

2015

[math]\displaystyle{ W^2 = n \int_{-\infty}^\infty \frac{(F_n(x) - F(x))^2}{F(x)\; (1-F(x))} \, dF(x), \quad (1) }[/math]
where [math]\displaystyle{ F(x) }[/math] is the target CDF and [math]\displaystyle{ F_n(x) }[/math] is the empirical distribution derived from the data. The numerator of Eq. (1) represents the distance of the theoretical distribution from the empirical one, while the denominator represents the variance of the empirical estimate of [math]\displaystyle{ F(x) }[/math] when the central limit theorem holds, i.e. when n is large enough. In other words, Eq. (1) represents the average of the squared errors between the two distributions (theoretical and empirical) weighted by the implicit uncertainty due to the estimation method of the empirical CDF (order statistics). As the CDF of a random variable is always distributed uniformly between zero and one (i.e. [math]\displaystyle{ F(x) \sim U(0, 1) }[/math]), [math]\displaystyle{ W^2 }[/math] is a function of uniformly distributed random variables when H0 and the central limit theorem hold. In particular, it does not depend on the distribution [math]\displaystyle{ F(x) }[/math].
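The distribution-free property can be illustrated with a small Monte Carlo sketch, assuming NumPy and SciPy are available (the helper name a2, sample sizes, and replication counts are arbitrary choices): under H0 the statistic depends on the data only through [math]\displaystyle{ F(X) \sim U(0,1) }[/math], so simulating it under two different target distributions should yield essentially the same null distribution.

```python
import numpy as np
from scipy.stats import norm, expon

def a2(sample, cdf):
    """Order-statistic form of the Anderson-Darling statistic for a fully specified CDF."""
    x = np.sort(np.asarray(sample))
    n = len(x)
    u = cdf(x)
    i = np.arange(1, n + 1)
    return -n - np.mean((2 * i - 1) * (np.log(u) + np.log1p(-u[::-1])))

rng = np.random.default_rng(3)
# Simulate the null distribution of the statistic under two different H0 targets.
normal_null = [a2(rng.normal(size=100), norm.cdf) for _ in range(2000)]
expon_null = [a2(rng.exponential(size=100), expon.cdf) for _ in range(2000)]
# The simulated quantiles should agree up to Monte Carlo noise, regardless of F.
print(np.percentile(normal_null, [50, 95, 99]))
print(np.percentile(expon_null, [50, 95, 99]))
```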

1952

  • (Anderson & Darling, 1952) ⇒ Theodore W. Anderson, and Donald A. Darling. (1952). "Asymptotic Theory of Certain 'Goodness of Fit' Criteria Based on Stochastic Processes." In: The Annals of Mathematical Statistics, 23(2).