Anderson-Darling test
Editor-In-Chief: C. Michael Gibson, M.S., M.D.
Overview
The Anderson-Darling test, named after Theodore Wilbur Anderson, Jr. (1918–?) and Donald A. Darling (1915–?), who introduced it in 1952[1], is one of the most powerful statistics for detecting most departures from normality. It can be used even with small sample sizes (n ≤ 25). Very large samples may lead the test to reject the assumption of normality on the basis of only slight imperfections, though industrial data sets with sample sizes of 200 and more have passed the Anderson-Darling test. [citation needed]
The Anderson-Darling test assesses whether a sample comes from a specified distribution. The formula for the test statistic <math>A^2</math>, used to assess whether the ordered data <math>\{Y_1<\cdots <Y_N\}</math> (note that the data must be sorted in ascending order) come from a distribution with cumulative distribution function (CDF) <math>F</math>, is
- <math>A^2 = -N-S</math>
where
- <math>S=\sum_{k=1}^N \frac{2k-1}{N}\left[\ln F(Y_k) + \ln\left(1-F(Y_{N+1-k})\right)\right].</math>
The test statistic can then be compared against the critical values of the theoretical distribution (dependent on which <math>F</math> is used) to determine the P-value.
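As a minimal sketch, the statistic defined above can be computed directly from the data and a hypothesized CDF. The function name and signature here are illustrative, not from the original:

```python
import math

def anderson_darling_statistic(y, cdf):
    """Compute A^2 = -N - S for data y under the hypothesized CDF F."""
    n = len(y)
    y = sorted(y)  # the data must be in ascending order
    # S = sum over k of (2k-1)/N * [ln F(Y_k) + ln(1 - F(Y_{N+1-k}))]
    s = sum(
        (2 * k - 1) / n * (math.log(cdf(y[k - 1])) + math.log(1 - cdf(y[n - k])))
        for k in range(1, n + 1)
    )
    return -n - s
```

The resulting value would then be compared against the critical values of the theoretical distribution for the chosen <math>F</math>, as described above.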
The Anderson-Darling test for normality is a distance or empirical distribution function (EDF) test. It is based on the concept that, given a hypothesized underlying distribution, the data can be transformed to a uniform distribution. The transformed sample data can then be tested for uniformity with a distance test (Shapiro 1980).
In comparisons of power, Stephens (1974) found <math>A^2</math> to be one of the best EDF statistics for detecting most departures from normality.[2] The only statistic that came close was the Cramér-von Mises statistic <math>W^2</math>.
Procedure
(If testing for normal distribution of the variable X)
1) The data of the variable X to be tested are sorted from low to high.
2) The mean, <math>\bar{X}</math>, and standard deviation, <math>s</math>, are calculated from the sample of X.
3) The values of X are standardized as follows:
- <math>Y_i=\frac{X_i-\bar{X}}{s}</math>
4) With the standard normal CDF <math>\Phi</math>, <math>A^2</math> is calculated using:
- <math>A^2 = -n -\frac{1}{n} \sum_{i=1}^n (2i-1)(\ln \Phi(Y_i)+ \ln(1-\Phi(Y_{n+1-i}))).</math>
5) <math>A^{2*}</math>, an approximate adjustment for sample size, is calculated using:
- <math>A^{2*}=A^2\left(1+\frac{0.75}{n}+\frac{2.25}{n^2}\right)</math>
6) If <math>A^{2*}</math> exceeds 0.752 then the hypothesis of normality is rejected for a 5% level test.
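Steps 1–6 above can be sketched as follows. This is an illustrative implementation, not a definitive one; it assumes the sample standard deviation uses the n − 1 divisor (the procedure above does not specify the divisor) and expresses <math>\Phi</math> via the error function:

```python
import math

def ad_normality_test(x):
    """Anderson-Darling test for normality, following steps 1-6 above.

    Returns (A2_star, reject), where reject is True at the 5% level.
    Assumes the n - 1 divisor for the sample standard deviation.
    """
    n = len(x)
    xs = sorted(x)                                # step 1: sort low to high
    mean = sum(xs) / n                            # step 2: sample mean
    sd = math.sqrt(sum((v - mean) ** 2 for v in xs) / (n - 1))  # sample std dev
    y = [(v - mean) / sd for v in xs]             # step 3: standardize

    def phi(t):                                   # standard normal CDF
        return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

    a2 = -n - sum(                                # step 4: A^2
        (2 * i - 1) * (math.log(phi(y[i - 1])) + math.log(1.0 - phi(y[n - i])))
        for i in range(1, n + 1)
    ) / n
    a2_star = a2 * (1 + 0.75 / n + 2.25 / n ** 2)  # step 5: sample-size adjustment
    return a2_star, a2_star > 0.752               # step 6: 5%-level decision
```

For example, a roughly symmetric, bell-shaped sample should not be rejected, while a sample with a single extreme outlier among many identical values should be.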
Note:
1. If s = 0, or if any <math>\Phi(Y_i)</math> equals 0 or 1, then <math>A^2</math> cannot be calculated and is undefined.
2. Above, it was assumed that the variable <math>X_i</math> was being tested for normal distribution. Any other theoretical distribution can be assumed by using its CDF. Each theoretical distribution has its own critical values, and some examples are: lognormal, exponential, Weibull, extreme value type I and logistic distribution.
3. The null hypothesis is that the data follow the hypothesized distribution (here, after standardization, N(0, 1)).
References
- ↑ Anderson, T. W.; Darling, D. A. (1952). "Asymptotic theory of certain 'goodness of fit' criteria based on stochastic processes". Annals of Mathematical Statistics. 23: 193–212.
- ↑ Stephens, M. A. (1974). "EDF Statistics for Goodness of Fit and Some Comparisons". Journal of the American Statistical Association. 69: 730–737.