{{Short description|Statistical test comparing two probability distributions}}
{{CS1 config|mode=cs1}}
[[File:KS Example.png|thumb|300px|Illustration of the Kolmogorov–Smirnov statistic. The red line is a model [[Cumulative distribution function|CDF]], the blue line is an [[Empirical distribution function|empirical CDF]], and the black arrow is the KS statistic.]]
In [[statistics]], the '''Kolmogorov–Smirnov test''' (also '''K–S test''' or '''KS test''') is a [[nonparametric statistics|nonparametric test]] of the equality of continuous (or discontinuous, see [[#Discrete and mixed null distribution|Section 2.2]]), one-dimensional [[probability distribution]]s. It can be used to test whether a [[random sample|sample]] came from a given reference probability distribution (one-sample K–S test), or to test whether two samples came from the same distribution (two-sample K–S test). Intuitively, it provides a method to quantitatively answer the question "How likely is it that we would see a collection of samples like this if they were drawn from that probability distribution?" or, in the second case, "How likely is it that we would see two sets of samples like this if they were drawn from the same (but unknown) probability distribution?". It is named after [[Andrey Kolmogorov]] and [[Nikolai Smirnov (mathematician)|Nikolai Smirnov]].

The Kolmogorov–Smirnov statistic quantifies a [[metric (mathematics)|distance]] between the [[empirical distribution function]] of the sample and the [[cumulative distribution function]] of the reference distribution, or between the empirical distribution functions of two samples. The [[null distribution]] of this statistic is calculated under the [[null hypothesis]] that the sample is drawn from the reference distribution (in the one-sample case) or that the samples are drawn from the same distribution (in the two-sample case). In the one-sample case, the distribution considered under the null hypothesis may be continuous (see [[#Kolmogorov distribution|Section 2]]), purely discrete, or mixed (see [[#Discrete and mixed null distribution|Section 2.2]]). In the two-sample case (see [[#Two-sample Kolmogorov–Smirnov test|Section 3]]), the distribution considered under the null hypothesis is a continuous distribution but is otherwise unrestricted. The two-sample K–S test is one of the most useful and general [[Nonparametric statistics|nonparametric methods]] for comparing two samples, as it is sensitive to differences in both location and shape of the empirical cumulative distribution functions of the two samples.

The Kolmogorov–Smirnov test can be modified to serve as a [[goodness of fit]] test. In the special case of testing for [[Normal distribution|normality]] of the distribution, samples are standardized and compared with a standard normal distribution. This is equivalent to setting the mean and variance of the reference distribution equal to the sample estimates, and it is known that using these to define the specific reference distribution changes the null distribution of the test statistic (see [[#Test with estimated parameters|Test with estimated parameters]]). Various studies have found that, even in this corrected form, the test is less [[Power of a test|powerful]] for testing normality than the [[Shapiro–Wilk test]] or [[Anderson–Darling test]].<ref>{{cite journal | last = Stephens | first = M. A. | year = 1974 | title = EDF Statistics for Goodness of Fit and Some Comparisons | journal = Journal of the American Statistical Association | volume = 69 | issue = 347 | pages = 730–737 | jstor = 2286009 | doi = 10.2307/2286009 }}</ref> However, these other tests have their own disadvantages. For instance, the Shapiro–Wilk test is known not to work well in samples with many identical values.
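
The one-sample statistic can be computed directly from the sample's order statistics: because the empirical CDF is a step function, the supremum distance to the reference CDF is attained just before or just after one of its jumps. The following Python sketch (the standard-normal sample and variable names are illustrative) compares this direct computation with SciPy's <code>scipy.stats.kstest</code>:

<syntaxhighlight lang="python">
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = np.sort(rng.normal(size=100))   # illustrative sample drawn from N(0, 1)
n = len(x)

cdf = stats.norm.cdf(x)             # reference CDF at the order statistics
# The supremum |F_n - F| occurs at a jump of the empirical CDF, so only
# the step heights i/n (after a jump) and (i-1)/n (before) are checked.
d_plus = np.max(np.arange(1, n + 1) / n - cdf)
d_minus = np.max(cdf - np.arange(0, n) / n)
d = max(d_plus, d_minus)

print(d)                                          # hand-computed KS statistic
print(stats.kstest(x, stats.norm.cdf).statistic)  # should agree
</syntaxhighlight>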