== Definition and interpretation ==

=== Definition ===
The ''p''-value is the probability under the null hypothesis of obtaining a real-valued test statistic at least as extreme as the one obtained. Consider an observed test-statistic <math>t</math> from unknown distribution <math>T</math>. Then the ''p''-value <math>p</math> is the prior probability of observing a test-statistic value at least as "extreme" as <math>t</math> if the null hypothesis <math>H_0</math> were true. That is:
* <math>p = \Pr(T \geq t \mid H_0)</math> for a one-sided right-tail test-statistic distribution.
* <math>p = \Pr(T \leq t \mid H_0)</math> for a one-sided left-tail test-statistic distribution.
* <math>p = 2\min\{\Pr(T \geq t \mid H_0),\Pr(T \leq t \mid H_0)\}</math> for a two-sided test-statistic distribution.
If the distribution of <math>T</math> is symmetric about zero, then <math>p = \Pr(|T| \geq |t| \mid H_0).</math>
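As an illustration of these three cases (not part of the definition itself), the sketch below assumes the null distribution of <math>T</math> is standard normal and computes each tail probability with SciPy; the function name <code>p_value</code> and the choice of null distribution are hypothetical.

<syntaxhighlight lang="python">
# Illustrative sketch: one- and two-sided p-values for an observed statistic t,
# assuming (for illustration only) that T is standard normal under H0.
from scipy.stats import norm

def p_value(t, alternative="two-sided"):
    """Return the p-value for observed statistic t under a standard normal null."""
    if alternative == "greater":          # right tail: Pr(T >= t | H0)
        return norm.sf(t)
    if alternative == "less":             # left tail: Pr(T <= t | H0)
        return norm.cdf(t)
    # two-sided: twice the smaller of the two tail probabilities
    return 2 * min(norm.sf(t), norm.cdf(t))

print(p_value(1.96, "greater"))   # ~0.025
print(p_value(1.96))              # ~0.05 (two-sided)
</syntaxhighlight>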
=== Interpretations ===
{{blockquote |text=The error that a practising statistician would consider the more important to avoid (which is a subjective judgment) is called the error of the first kind. The first demand of the mathematical theory is to deduce such test criteria as would ensure that the probability of committing an error of the first kind would equal (or approximately equal, or not exceed) a preassigned number α, such as α = 0.05 or 0.01, etc. This number is called the level of significance. |author=Jerzy Neyman |source="The Emergence of Mathematical Statistics"<ref name="Neyman1976">{{cite book | chapter = The Emergence of Mathematical Statistics: A Historical Sketch with Particular Reference to the United States | title = On the History of Statistics and Probability | page = 161 | year = 1976 | last = Neyman | first = Jerzy | author-link = Jerzy Neyman | place = New York | publisher = Marcel Dekker Inc | editor-last = Owen | editor-first = D.B. | series = Textbooks and Monographs | url = https://openlibrary.org/works/OL18334563W/On_the_history_of_statistics_and_probability?edition=key%3A/books/OL5206547M}}</ref>}}

In a significance test, the null hypothesis <math>H_0</math> is rejected if the ''p''-value is less than or equal to a predefined threshold value [[Alpha|<math>\alpha</math>]], which is referred to as the alpha level or [[statistical significance|significance level]]. <math>\alpha</math> is not derived from the data, but rather is set by the researcher before examining the data. <math>\alpha</math> is commonly set to 0.05, though lower alpha levels are sometimes used. The 0.05 value (equivalent to a 1/20 chance) was originally proposed by [[Ronald Fisher|R. A. Fisher]] in 1925 in his famous book "[[Statistical Methods for Research Workers]]".<ref>{{Citation |last=Fisher |first=R. A. |title=Statistical Methods for Research Workers |date=1992 |work=Breakthroughs in Statistics: Methodology and Distribution |series=Springer Series in Statistics |pages=66–70 |editor-last=Kotz |editor-first=Samuel |url=https://doi.org/10.1007/978-1-4612-4380-9_6 |access-date=2024-07-07 |place=New York, NY |publisher=Springer |language=en |doi=10.1007/978-1-4612-4380-9_6 |isbn=978-1-4612-4380-9 |editor2-last=Johnson |editor2-first=Norman L.}}</ref> Different ''p''-values based on independent sets of data can be combined, for instance using [[Fisher's combined probability test]].

=== Distribution ===
The ''p''-value is a function of the chosen test statistic <math>T</math> and is therefore a [[random variable]]. If the null hypothesis fixes the probability distribution of <math>T</math> precisely (e.g. <math>H_0: \theta = \theta_0,</math> where <math>\theta</math> is the only parameter), and if that distribution is continuous, then when the null hypothesis is true, the ''p''-value is [[Uniform distribution (continuous)|uniformly distributed]] between 0 and 1. Regardless of the truth of <math>H_0</math>, the ''p''-value is not fixed; if the same test is repeated independently with fresh data, one will typically obtain a different ''p''-value in each iteration. Usually only a single ''p''-value relating to a hypothesis is observed, so the ''p''-value is interpreted by a significance test, and no effort is made to estimate the distribution it was drawn from. When a collection of ''p''-values is available (e.g. when considering a group of studies on the same subject), the distribution of ''p''-values is sometimes called a ''p''-curve.<ref name="Head2015">{{cite journal | vauthors = Head ML, Holman L, Lanfear R, Kahn AT, Jennions MD | title = The extent and consequences of p-hacking in science | journal = PLOS Biology | volume = 13 | issue = 3 | pages = e1002106 | date = March 2015 | pmid = 25768323 | pmc = 4359000 | doi = 10.1371/journal.pbio.1002106 | doi-access = free }}</ref> A ''p''-curve can be used to assess the reliability of scientific literature, such as by detecting publication bias or [[p-hacking|''p''-hacking]].<ref name="Head2015"/><ref name="Simonsohn2014">{{cite journal | vauthors = Simonsohn U, Nelson LD, Simmons JP | title = ''p''-Curve and Effect Size: Correcting for Publication Bias Using Only Significant Results | journal = Perspectives on Psychological Science | volume = 9 | issue = 6 | pages = 666–681 | date = November 2014 | pmid = 26186117 | doi = 10.1177/1745691614553988 | s2cid = 39975518 }}</ref>
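The uniformity of the ''p''-value under a simple null hypothesis can be checked by simulation. The sketch below is illustrative only: it assumes data from a normal distribution with known unit variance, a simple null <math>H_0: \mu = 0</math>, and a one-sided one-sample ''Z''-test; the sample size and simulation count are arbitrary.

<syntaxhighlight lang="python">
# Monte Carlo sketch (illustrative assumptions): under a simple null hypothesis
# with a continuous test statistic, the p-value is uniform on [0, 1].
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n_sims, n = 10_000, 30
p_values = np.empty(n_sims)
for i in range(n_sims):
    x = rng.normal(loc=0.0, scale=1.0, size=n)   # data generated with H0 true
    z = x.mean() * np.sqrt(n)                    # Z-statistic, known variance 1
    p_values[i] = norm.sf(z)                     # right-tail p-value

# Each bin of width 0.1 should contain roughly 10% of the simulated p-values.
counts, _ = np.histogram(p_values, bins=10, range=(0.0, 1.0))
print(counts / n_sims)
</syntaxhighlight>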
=== Distribution for composite hypothesis ===
In parametric hypothesis testing problems, a ''simple or point hypothesis'' refers to a hypothesis where the parameter's value is assumed to be a single number. In contrast, in a ''composite hypothesis'' the parameter's value is given by a set of numbers. When the null hypothesis is composite (or the distribution of the statistic is discrete), then, when the null hypothesis is true, the probability of obtaining a ''p''-value less than or equal to any number between 0 and 1 is still less than or equal to that number. In other words, it remains the case that very small values are relatively unlikely if the null hypothesis is true, and that a significance test at level <math>\alpha</math> is obtained by rejecting the null hypothesis if the ''p''-value is less than or equal to <math>\alpha</math>.<ref name="Bhattacharya2002">{{cite journal | vauthors = Bhattacharya B, Habtzghi D |s2cid = 33812107 |year = 2002 |title = Median of the p value under the alternative hypothesis |journal = The American Statistician |volume = 56 |issue = 3 |pages = 202–6 |doi = 10.1198/000313002146 }}</ref><ref name="Hung1997">{{cite journal | vauthors = Hung HM, O'Neill RT, Bauer P, Köhne K | title = The behavior of the P-value when the alternative hypothesis is true | journal = Biometrics | volume = 53 | issue = 1 | pages = 11–22 | date = March 1997 | pmid = 9147587 | doi = 10.2307/2533093 | type = Submitted manuscript | jstor = 2533093 | url = https://zenodo.org/record/1235121 }}</ref>

For example, when testing the null hypothesis that a distribution is normal with a mean less than or equal to zero against the alternative that the mean is greater than zero (<math>H_0: \mu \leq 0</math>, variance known), the null hypothesis does not specify the exact probability distribution of the appropriate test statistic. In this example that would be the [[Standard score|''Z''-statistic]] belonging to the one-sided one-sample ''Z''-test. For each possible value of the theoretical mean, the ''Z''-test statistic has a different probability distribution. In these circumstances the ''p''-value is defined by taking the least favorable null-hypothesis case, which is typically on the border between null and alternative. This definition ensures the complementarity of ''p''-values and alpha levels: <math>\alpha = 0.05</math> means one only rejects the null hypothesis if the ''p''-value is less than or equal to <math>0.05</math>, and the hypothesis test will indeed have a ''maximum'' type-1 error rate of <math>0.05</math>.
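The boundary behaviour described above can also be illustrated by simulation. The sketch below is a hypothetical worked example, not a prescribed procedure: it uses the same one-sided ''Z''-test with known unit variance, computes the ''p''-value at the least favorable null value <math>\mu = 0</math>, and compares the rejection rate at the boundary with the rate at a mean well inside the null.

<syntaxhighlight lang="python">
# Illustrative sketch (assumptions as stated in the lead-in): one-sided
# one-sample Z-test of the composite null H0: mu <= 0 with known variance 1.
# The p-value is computed at the least favorable null value mu = 0, so
# rejecting when p <= alpha gives a type-1 error rate of at most alpha
# for every mu <= 0, with equality approached at the boundary.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
alpha, n, n_sims = 0.05, 50, 20_000

def rejection_rate(true_mu):
    """Fraction of simulated samples from N(true_mu, 1) with p-value <= alpha."""
    rejections = 0
    for _ in range(n_sims):
        x = rng.normal(loc=true_mu, scale=1.0, size=n)
        z = x.mean() * np.sqrt(n)      # Z-statistic evaluated at the boundary mu = 0
        rejections += norm.sf(z) <= alpha
    return rejections / n_sims

print(rejection_rate(0.0))    # ~0.05: maximum type-1 error, at the boundary
print(rejection_rate(-0.5))   # far below 0.05 for means deeper inside the null
</syntaxhighlight>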