===Interpretation===
When the null hypothesis is true and the statistical assumptions are met, the probability that the ''p''-value will be less than or equal to the significance level <math>\alpha</math> is at most <math>\alpha</math>. This ensures that the hypothesis test maintains its specified false positive rate.<ref name="LR" />
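
The bound on the false positive rate can be checked by simulation. The following is a minimal sketch rather than a definitive implementation: it assumes a two-sided ''z''-test on normally distributed data with known standard deviation, and the names <code>two_sided_z_p_value</code> and <code>rejection_rate</code>, the sample size, the number of trials and the seed are all illustrative choices not taken from the cited sources.

```python
import math
import random


def two_sided_z_p_value(sample, mu0=0.0, sigma=1.0):
    """Two-sided p-value of a z-test of H0: mean == mu0, with sigma known."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / math.sqrt(n))
    # P(|Z| >= |z|) for a standard normal Z equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def rejection_rate(alpha=0.05, n=30, trials=20000, seed=0):
    """Fraction of simulated tests with p <= alpha when H0 is actually true."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    rejections = 0
    for _ in range(trials):
        # Data are drawn from the null model itself: mean 0, sigma 1.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if two_sided_z_p_value(sample) <= alpha:
            rejections += 1
    return rejections / trials


rate = rejection_rate()
print(f"empirical false positive rate: {rate:.4f}")
```

Because the test statistic in this sketch is continuous, the empirical rate should land close to <math>\alpha</math> itself; for discrete test statistics the rate can fall strictly below <math>\alpha</math>.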

The ''p''-value is the probability, computed under the assumption that the null hypothesis is true, of obtaining a test statistic at least as extreme as the one actually observed. At a significance level of 0.05, a test of whether a coin is fair would be expected to (incorrectly) reject the null hypothesis (that the coin is fair) in about 1 out of 20 tests on average when the coin is in fact fair. The ''p''-value is not the probability that either the null hypothesis or its opposite is correct (a common source of confusion).<ref>{{Cite journal|last=Nuzzo|first=Regina|author-link= Regina Nuzzo |date=2014|title=Scientific method: Statistical errors|journal=Nature|volume=506|issue=7487|pages=150–152|bibcode=2014Natur.506..150N|doi=10.1038/506150a|pmid=24522584|doi-access=free}}</ref>
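
For the coin example, both the ''p''-value and the chance of wrongly rejecting a fair coin can be computed exactly. The sketch below assumes 20 flips and one common definition of the two-sided exact binomial ''p''-value (the total probability of all outcomes no more likely than the observed one). The function names and the choice of 20 flips are hypothetical and serve only as illustration.

```python
from math import comb


def binomial_two_sided_p_value(k, n):
    """Exact two-sided p-value for k heads in n flips under H0: fair coin.

    Sums the probability of every outcome that is no more likely than k.
    """
    # comb(n, j) is proportional to P(X = j) when the coin is fair.
    weights = [comb(n, j) for j in range(n + 1)]
    extreme = sum(w for w in weights if w <= weights[k])
    return extreme / 2 ** n


def false_rejection_probability(n, alpha):
    """Probability that a fair coin yields a p-value <= alpha in n flips."""
    total = 0.0
    for k in range(n + 1):
        if binomial_two_sided_p_value(k, n) <= alpha:
            total += comb(n, k) / 2 ** n
    return total


p_15 = binomial_two_sided_p_value(15, 20)        # p-value for 15 heads in 20
type_i = false_rejection_probability(20, 0.05)   # chance of a false rejection
print(f"p-value for 15/20 heads: {p_15:.4f}")
print(f"false rejection probability: {type_i:.4f}")
```

With these assumed numbers the false rejection probability comes out at about 0.041, slightly below 0.05, because the number of heads is discrete and the test cannot hit the significance level exactly.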

If the ''p''-value is less than the chosen significance threshold (equivalently, if the observed test statistic is in the critical region), then the null hypothesis is said to be rejected at the chosen level of significance. If the ''p''-value is ''not'' less than the chosen significance threshold (equivalently, if the observed test statistic is outside the critical region), then the null hypothesis is not rejected at the chosen level of significance.

In the "lady tasting tea" example (below), Fisher required the lady to categorize all of the cups of tea correctly to justify the conclusion that the result was unlikely to be due to chance. His test showed that if the lady were effectively guessing at random (the null hypothesis), there was a 1.4% chance that the observed result (every cup classified correctly) would occur.
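
The 1.4% figure can be reproduced with a short calculation. The sketch assumes the classic design of the experiment, which is not spelled out in this section: eight cups, four prepared each way, with the lady told to pick out the four of one kind. The variable names are illustrative.

```python
from math import comb

# Assumed design: 8 cups, 4 with milk poured first and 4 with tea poured first.
cups = 8
milk_first = 4

# Under the null hypothesis every choice of 4 cups out of 8 is equally likely,
# and exactly one of those choices classifies every cup correctly.
arrangements = comb(cups, milk_first)
p_all_correct = 1 / arrangements

print(f"{arrangements} possible selections")
print(f"P(all correct | guessing) = {p_all_correct:.4f}")
```

Under these assumptions there are 70 equally likely selections, so a perfect result has probability 1/70, or about 1.4%.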