Data dredging
=== Optional stopping ===
[[File:P-hacking by early stopping.svg|thumb|315x315px|The figure shows how the p-value computed from a t-test changes as the sample size increases, and how early stopping can allow for p-hacking. Data are drawn from two identical normal distributions, <math>N(0, 10)</math>. For each sample size <math>n</math>, ranging from 5 to <math>10^4</math>, a t-test is performed on the first <math>n</math> samples from each distribution, and the resulting p-value is plotted. The red dashed line indicates the commonly used significance level of 0.05. If data collection or analysis were to stop at a point where the p-value happened to fall below the significance level, a spurious statistically significant difference could be reported.]]
Optional stopping is the practice of collecting data until some stopping criterion is reached. While it is a valid procedure, it is easily misused. The problem is that the true p-value of an optionally stopped statistical test is larger than it seems. Intuitively, this is because the p-value is supposed to be the total probability of all events at least as rare as what is observed. With optional stopping, there are additional such events that are difficult to account for: those in which the stopping rule is not triggered at the current sample size and even more data are collected before stopping. Neglecting these events leads to a p-value that is too low.
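
As a sketch of why the naive calculation understates the p-value, let <math>p_n</math> denote the nominal p-value computed after <math>n</math> observations as if <math>n</math> had been fixed in advance. Suppose, as an assumption made here purely for illustration, that the analyst checks after every observation up to a maximum of <math>n_{\max}</math> and stops at the first <math>p_n \le \alpha</math>, and that the test's p-value is uniformly distributed under the null hypothesis. Each single look then satisfies <math>\Pr(p_n \le \alpha) = \alpha</math>, but the probability of ever reporting significance is that of a union of such events:

<math display="block">\Pr\left(\min_{1 \le n \le n_{\max}} p_n \le \alpha\right) = \Pr\left(\bigcup_{n=1}^{n_{\max}} \{p_n \le \alpha\}\right) \ge \Pr\left(p_{n_{\max}} \le \alpha\right) = \alpha.</math>

The inequality is typically strict, so the nominal level <math>\alpha</math> understates the actual false-positive rate.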

In fact, if the null hypothesis is true, then ''any'' significance level can be reached if one is allowed to keep collecting data and stop when the desired p-value (calculated as if one had always planned to collect exactly this much data) is obtained.<ref name=":9">{{Cite journal |last=Wagenmakers |first=Eric-Jan |date=October 2007 |title=A practical solution to the pervasive problems of p values |url=http://link.springer.com/10.3758/BF03194105 |journal=Psychonomic Bulletin & Review |language=en |volume=14 |issue=5 |pages=779–804 |doi=10.3758/BF03194105 |issn=1069-9384 |pmid=18087943}}</ref> For a concrete example of testing for a fair coin, see {{section link|P-value|Optional stopping|display=''p''-value}}.

Put more succinctly, the proper calculation of a p-value requires accounting for counterfactuals, that is, what the experimenter ''could'' have done in reaction to data that ''might'' have been. Accounting for what might have been is hard, even for honest researchers.<ref name=":9" /> One benefit of preregistration is that it accounts for all counterfactuals, allowing the p-value to be calculated correctly.<ref>{{Cite journal |last1=Wicherts |first1=Jelte M. |last2=Veldkamp |first2=Coosje L. S. |last3=Augusteijn |first3=Hilde E. M. |last4=Bakker |first4=Marjan |last5=van Aert |first5=Robbie C. M. |last6=van Assen |first6=Marcel A. L. M. |date=2016-11-25 |title=Degrees of Freedom in Planning, Running, Analyzing, and Reporting Psychological Studies: A Checklist to Avoid p-Hacking |journal=Frontiers in Psychology |volume=7 |page=1832 |doi=10.3389/fpsyg.2016.01832 |issn=1664-1078 |pmc=5122713 |pmid=27933012 |doi-access=free}}</ref>

The problem of early stopping is not limited to researcher misconduct. There is often pressure to stop early if the cost of collecting data is high. Some animal ethics boards even mandate early stopping if the study obtains a significant result midway.<ref name="mlh">{{Cite journal |last1=Head |first1=Megan L.
|last2=Holman |first2=Luke |last3=Lanfear |first3=Rob |last4=Kahn |first4=Andrew T. |last5=Jennions |first5=Michael D. |date=2015-03-13 |title=The Extent and Consequences of P-Hacking in Science |journal=PLOS Biology |language=en |volume=13 |issue=3 |pages=e1002106 |doi=10.1371/journal.pbio.1002106 |issn=1545-7885 |pmc=4359000 |pmid=25768323 |doi-access=free}}</ref>
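
The inflation of false positives described above can be illustrated with a small simulation. The following is a minimal sketch, not taken from the cited sources: it assumes a one-sample two-sided z-test with known unit variance, and the function names and the parameters <code>n_min</code>, <code>n_max</code> and <code>trials</code> are hypothetical choices made for the illustration. Data are generated with the null hypothesis true, and the same data stream is analysed twice: once with a single test at the planned sample size, and once with a test after every observation, stopping at the first nominal p-value below the significance level.

```python
import math
import random


def two_sided_p(total, n):
    """Nominal two-sided z-test p-value for H0: mean = 0, known sigma = 1.

    `total` is the sum of the first n observations.
    """
    z = total / math.sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rates(trials=2000, n_min=10, n_max=500, alpha=0.05, seed=0):
    """Simulate studies under a true null and compare two analysis plans.

    Returns (fixed_rate, optional_rate):
      fixed_rate    - fraction significant when testing once, at n_max
      optional_rate - fraction significant when testing after every
                      observation from n_min to n_max and stopping at
                      the first nominal p-value below alpha
    """
    rng = random.Random(seed)
    fixed_hits = 0
    optional_hits = 0
    for _ in range(trials):
        total = 0.0
        peeked_significant = False
        for n in range(1, n_max + 1):
            total += rng.gauss(0.0, 1.0)  # null is true: mean 0
            if n >= n_min and not peeked_significant:
                if two_sided_p(total, n) < alpha:
                    # An optional-stopping analyst would stop and report here.
                    peeked_significant = True
        if peeked_significant:
            optional_hits += 1
        if two_sided_p(total, n_max) < alpha:
            fixed_hits += 1
    return fixed_hits / trials, optional_hits / trials


if __name__ == "__main__":
    fixed, optional = false_positive_rates()
    print(f"fixed sample size: {fixed:.3f}")
    print(f"optional stopping: {optional:.3f}")
```

Because both plans see the same simulated data and the optional-stopping plan includes the final look, its false-positive rate can never be lower than that of the fixed plan. In this sketch, raising <code>n_max</code> gives the optional-stopping analyst more chances to cross the threshold, while the fixed-sample rate stays near the nominal level.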