=== Testing the fairness of a coin ===
As an example of a statistical test, an experiment is performed to determine whether a [[coin flipping|coin flip]] is [[fair coin|fair]] (equal chance of landing heads or tails) or unfairly biased (one outcome being more likely than the other).

Suppose that the experimental results show the coin turning up heads 14 times out of 20 total flips. The full data <math>X</math> would be a sequence of twenty symbols, each either "H" or "T". The statistic on which one might focus could be the total number <math>T</math> of heads. The null hypothesis is that the coin is fair and that the coin tosses are independent of one another. If a right-tailed test is considered, which would be the case if one is actually interested in the possibility that the coin is biased towards falling heads, then the ''p''-value of this result is the chance of a fair coin landing on heads ''at least'' 14 times out of 20 flips. That probability can be computed from [[binomial coefficient]]s as

: <math>
\begin{align}
& \Pr(14\text{ heads}) + \Pr(15\text{ heads}) + \cdots + \Pr(20\text{ heads}) \\
& = \frac{1}{2^{20}} \left[ \binom{20}{14} + \binom{20}{15} + \cdots + \binom{20}{20} \right] = \frac{60\,460}{1\,048\,576} \approx 0.058.
\end{align}
</math>

This probability is the ''p''-value, considering only extreme results that favor heads. This is called a [[One- and two-tailed tests|one-tailed test]]. However, one might be interested in deviations in either direction, favoring either heads or tails, in which case the two-tailed ''p''-value may be calculated instead. As the [[binomial distribution]] is symmetrical for a fair coin, the two-sided ''p''-value is simply twice the one-sided ''p''-value calculated above: the two-sided ''p''-value is 0.115.
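The tail sum above can be checked with exact rational arithmetic. The following is a minimal sketch in Python; the helper name <code>upper_tail</code> is chosen here for illustration and does not come from any cited source.

```python
from fractions import Fraction
from math import comb


def upper_tail(n: int, k: int) -> Fraction:
    """Exact Pr(heads >= k) for n independent flips of a fair coin."""
    favourable = sum(comb(n, i) for i in range(k, n + 1))
    return Fraction(favourable, 2 ** n)


# One-tailed p-value for 14 or more heads in 20 flips.
one_sided = upper_tail(20, 14)

# Doubling is a valid shortcut here only because the fair-coin
# binomial distribution is symmetric; the result is capped at 1.
two_sided = min(Fraction(1), 2 * one_sided)

print(float(one_sided))  # roughly 0.058
print(float(two_sided))  # roughly 0.115
```

Using <code>Fraction</code> keeps the numerator and denominator exact, so the value 60,460/1,048,576 is reproduced without rounding before the final conversion to a decimal.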

In the above example:
* Null hypothesis (''H''<sub>0</sub>): The coin is fair, with Pr(heads) = 0.5.
* Test statistic: Number of heads.
* Alpha level (designated threshold of significance): 0.05.
* Observation ''O'': 14 heads out of 20 flips.
* Two-tailed ''p''-value of observation ''O'' given ''H''<sub>0</sub> = 2 × min(Pr(no. of heads ≥&nbsp;14&nbsp;heads), Pr(no. of heads ≤&nbsp;14&nbsp;heads)) = 2 × min(0.058, 0.979) = 2 × 0.058 = 0.115.<!-- Note we've summed the exact values, not the rounded values.  Correct value = 0.1153183... -->

The Pr(no. of heads ≤&nbsp;14&nbsp;heads) = 1 − Pr(no. of heads ≥&nbsp;14&nbsp;heads) + Pr(no. of heads = 14) = 1 − 0.058 + 0.037 = 0.979; however, the symmetry of this binomial distribution makes it unnecessary to compute both probabilities in order to find the smaller of the two. Here, the calculated ''p''-value exceeds 0.05, meaning that the data falls within the range of what would happen 95% of the time, if the coin were fair. Hence, the null hypothesis is not rejected at the 0.05 level.

However, had one more head been obtained, the resulting ''p''-value (two-tailed) would have been 0.0414&nbsp;(4.14%), in which case the null hypothesis would be rejected at the 0.05 level.
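The comparison between 14 and 15 heads can be sketched as follows. The function name <code>two_tailed_p</code> and the convention of rejecting when the ''p''-value is at most the alpha level are assumptions made for this illustration.

```python
from math import comb


def two_tailed_p(n: int, k: int) -> float:
    """Two-tailed p-value, 2 * min(Pr(heads >= k), Pr(heads <= k)), fair coin."""
    total = 2 ** n
    upper = sum(comb(n, i) for i in range(k, n + 1)) / total
    lower = sum(comb(n, i) for i in range(0, k + 1)) / total
    return min(1.0, 2 * min(upper, lower))  # a probability cannot exceed 1


ALPHA = 0.05  # designated threshold of significance

for heads in (14, 15):
    p = two_tailed_p(20, heads)
    # Assumed decision rule for this sketch: reject when p <= ALPHA.
    verdict = "reject" if p <= ALPHA else "do not reject"
    print(f"{heads} heads: p = {p:.4f}, {verdict} the null hypothesis")
```

Under these assumptions, the sketch reports a ''p''-value above 0.05 for 14 heads and below 0.05 for 15 heads, matching the conclusions stated in the text.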

==== Optional stopping ====
{{Anchor|Optional stopping}}

The difference between the two meanings of "extreme" appears when we consider sequential hypothesis testing, or optional stopping, for the fairness of the coin. In general, optional stopping changes how the ''p''-value is calculated.<ref>{{Cite journal |last=Goodman |first=Steven |date=2008-07-01 |title=A Dirty Dozen: Twelve P-Value Misconceptions |url=https://www.sciencedirect.com/science/article/pii/S0037196308000620 |journal=Seminars in Hematology |series=Interpretation of Quantitative Research |volume=45 |issue=3 |pages=135–140 |doi=10.1053/j.seminhematol.2008.04.003 |pmid=18582619 |issn=0037-1963}}</ref><ref>{{Cite journal |last=Wagenmakers |first=Eric-Jan |date=October 2007 |title=A practical solution to the pervasive problems of p values |url=http://link.springer.com/10.3758/BF03194105 |journal=Psychonomic Bulletin & Review |language=en |volume=14 |issue=5 |pages=779–804 |doi=10.3758/BF03194105 |pmid=18087943 |issn=1069-9384}}</ref> Suppose we design the experiment as follows:
* Flip the coin twice. If both flips come up heads, or both come up tails, end the experiment.
* Else, flip the coin 4 more times.

This experiment has 7 types of outcomes: 2&nbsp;heads, 2&nbsp;tails, 5&nbsp;heads 1&nbsp;tail,&nbsp;..., 1&nbsp;head 5&nbsp;tails. We now calculate the ''p''-value of the "3&nbsp;heads 3&nbsp;tails" outcome.

If we use the test statistic <math>\text{heads}/\text{tails}</math>, then under the null hypothesis the two-sided ''p''-value is exactly 1, the one-sided left-tail ''p''-value is exactly <math>19/32</math>, and the one-sided right-tail ''p''-value is also exactly <math>19/32</math>.
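These values can be reproduced by enumerating every flip sequence of the two-stage design. The sketch below is illustrative only: the names <code>stopping_design</code> and <code>ratio</code> are hypothetical, and treating the "2&nbsp;heads" outcome (zero tails) as an infinite ratio is an assumption of this sketch.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import inf


def stopping_design() -> dict:
    """Map (heads, tails) -> probability under a fair coin for the two-stage design."""
    dist = Counter()
    for first in product("HT", repeat=2):
        if first[0] == first[1]:
            sequences = [first]  # stop after two matching flips
        else:
            sequences = [first + rest for rest in product("HT", repeat=4)]
        for seq in sequences:
            heads = seq.count("H")
            dist[(heads, len(seq) - heads)] += Fraction(1, 2 ** len(seq))
    return dict(dist)


def ratio(heads: int, tails: int) -> float:
    """Test statistic heads/tails; assumed infinite when there are no tails."""
    return inf if tails == 0 else heads / tails


dist = stopping_design()
observed = ratio(3, 3)

left = sum(p for (h, t), p in dist.items() if ratio(h, t) <= observed)
right = sum(p for (h, t), p in dist.items() if ratio(h, t) >= observed)
two_sided = min(1, 2 * min(left, right))  # capped at 1

print(left, right, two_sided)
```

Each complete sequence of length 2 has probability 1/4 and each sequence of length 6 has probability 1/64, which is why the probabilities are accumulated as <code>Fraction(1, 2 ** len(seq))</code>.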

If we consider every outcome that has equal or lower probability than "3&nbsp;heads 3&nbsp;tails" as "at least as extreme", then the ''p''-value is exactly <math>1/2.</math>

However, if we had instead planned to simply flip the coin 6&nbsp;times no matter what happens, then the second definition of ''p''-value would mean that the ''p''-value of "3&nbsp;heads 3&nbsp;tails" is exactly 1.

Thus, the "at least as extreme" definition of ''p''-value is deeply contextual and depends on what the experimenter ''planned'' to do even in situations that did not occur.
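The dependence on the planned design can be made concrete by applying the "equal or lower probability" definition to both experiments. This is a minimal sketch; the function names are hypothetical, and the outcome probabilities are written out from the binomial distribution under the assumption of a fair coin.

```python
from fractions import Fraction
from math import comb


def stopping_design() -> dict:
    """Two flips; stop if they match, otherwise flip 4 more times."""
    dist = {(2, 0): Fraction(1, 4), (0, 2): Fraction(1, 4)}
    for k in range(5):  # k heads among the 4 additional flips
        # The first two flips contributed exactly 1 head and 1 tail.
        dist[(1 + k, 5 - k)] = Fraction(1, 2) * Fraction(comb(4, k), 16)
    return dist


def fixed_design(n: int = 6) -> dict:
    """Always flip the coin n times."""
    return {(k, n - k): Fraction(comb(n, k), 2 ** n) for k in range(n + 1)}


def p_by_probability(dist: dict, observed: tuple) -> Fraction:
    """Sum the probabilities of all outcomes no more likely than the observed one."""
    threshold = dist[observed]
    return sum(p for p in dist.values() if p <= threshold)


print(p_by_probability(stopping_design(), (3, 3)))  # 1/2
print(p_by_probability(fixed_design(), (3, 3)))     # 1
```

The observed data, "3&nbsp;heads 3&nbsp;tails", are identical in both calls; only the set of outcomes that could have occurred differs, and that alone changes the result.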