==Discussion==
The statistical power of a hypothesis test has an impact on the interpretation of its results. Not finding a result with a more powerful study is stronger evidence against the effect existing than the same finding with a less powerful study. However, this is not completely conclusive. The effect may exist, but be smaller than what was looked for, meaning the study is in fact underpowered and the sample is thus unable to distinguish it from random chance.<ref>{{cite book |title=The Essential Guide to Effect Sizes: Statistical Power, Meta-Analysis, and the Interpretation of Research Results |last=Ellis |first=Paul |year=2010 |isbn=978-0521142465 |publisher=Cambridge University Press |page=52}}</ref> Many [[clinical trial]]s, for instance, have low statistical power to detect differences in [[adverse effect]]s of treatments, since such effects may affect only a few patients, even if this difference can be [[clinical significance|important]].<ref>{{cite journal |last1=Tsang |first1=R. |last2=Colley |first2=L. |last3=Lynd |first3=L.D. |doi=10.1016/j.jclinepi.2008.08.005 |title=Inadequate statistical power to detect clinically significant differences in adverse event rates in randomized controlled trials |journal=Journal of Clinical Epidemiology |volume=62 |issue=6 |pages=609–616 |year=2009 |pmid=19013761}}</ref> Conclusions about the [[Posterior probability|probability of actual presence]] of an effect should also take into account more than a single test, especially as real-world power is rarely close to 1.

Indeed, although there are no formal standards for power, many researchers and funding bodies assess power using 0.80 (or 80%) as a standard for adequacy. This convention implies a four-to-one trade-off between {{mvar|β}}-risk and {{mvar|α}}-risk, as the probability of a type II error {{mvar|β}} is set at 1 − 0.8 = 0.2, while {{mvar|α}}, the probability of a type I error, is commonly set at 0.05. Some applications require much higher levels of power. [[Medical test]]s may be designed to minimise the number of false negatives (type II errors) produced by loosening the threshold of significance, raising the risk of obtaining a false positive (a type I error). The rationale is that it is better to tell a healthy patient "we may have found something—let's test further," than to tell a diseased patient "all is well."<ref>{{cite book |author=Ellis, Paul D. |title=The Essential Guide to Effect Sizes: An Introduction to Statistical Power, Meta-Analysis and the Interpretation of Research Results |publisher=Cambridge University Press |location=United Kingdom |year=2010 |page=56}}</ref>

Power analysis focuses on the correct rejection of a null hypothesis. Alternative concerns may, however, motivate an experiment, and so lead to different needs for sample size. In many contexts, the issue is less about deciding between hypotheses and more about obtaining an [[estimation theory|estimate]] of the population effect size of sufficient accuracy. For example, a careful power analysis can tell you that 55 pairs of normally distributed samples with a [[Pearson product-moment correlation coefficient|correlation]] of 0.5 will be sufficient to provide 80% power in rejecting a null hypothesis that the correlation is no more than 0.2 (using a one-sided test, {{mvar|α}} = 0.05). But the typical 95% [[confidence interval]] with this sample would be around [0.27, 0.67]. An alternative, albeit related, analysis would be required if we wished to measure the correlation to an accuracy of ±0.1, implying a different (in this case, larger) sample size. Alternatively, multiple under-powered studies can still be useful, if appropriately combined through a [[meta-analysis]].
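The figures in this example can be checked approximately with the Fisher z-transformation, a standard large-sample approximation for inference on a correlation. The sketch below is an illustration of that calculation rather than part of the cited analysis, and the exact method behind the quoted numbers may give slightly different endpoints.

<syntaxhighlight lang="python">
# Approximate check of the correlation example via the Fisher z-transformation.
import numpy as np
from scipy import stats

n = 55          # pairs of observations
r_true = 0.5    # assumed population correlation
r_null = 0.2    # null hypothesis: correlation is no more than 0.2
alpha = 0.05    # one-sided significance level

se = 1 / np.sqrt(n - 3)                                   # SE of Fisher z
ncp = (np.arctanh(r_true) - np.arctanh(r_null)) / se      # noncentrality
power = 1 - stats.norm.cdf(stats.norm.ppf(1 - alpha) - ncp)
print(f"power ~ {power:.2f}")                             # about 0.80

# 95% confidence interval for the correlation when the observed r is 0.5
half_width = stats.norm.ppf(0.975) * se
lo = np.tanh(np.arctanh(0.5) - half_width)
hi = np.tanh(np.arctanh(0.5) + half_width)
print(f"95% CI ~ [{lo:.2f}, {hi:.2f}]")                   # roughly [0.27, 0.68]
</syntaxhighlight>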
Many statistical analyses involve the estimation of several unknown quantities. In simple cases, all but one of these quantities are [[nuisance parameter]]s. In this setting, the only relevant power pertains to the single quantity that will undergo formal statistical inference. In some settings, particularly if the goals are more "exploratory", there may be a number of quantities of interest in the analysis. For example, in a multiple [[regression analysis]] we may include several covariates of potential interest. In situations such as this, where several hypotheses are under consideration, it is common that the powers associated with the different hypotheses differ. For instance, in multiple regression analysis, the power for detecting an effect of a given size is related to the variance of the covariate. Since different covariates will have different variances, their powers will differ as well.

Additional complications arise when we consider these [[multiple comparisons|multiple hypotheses]] together. For example, if we consider a false positive to be making an erroneous null rejection on any one of these hypotheses, the probability of this [[Family-wise error rate|"family-wise error"]] will be inflated if appropriate measures are not taken. Such measures typically involve applying a higher threshold of stringency to reject a hypothesis (such as with the [[Bonferroni method]]), and so would reduce power. Alternatively, there may be different notions of power connected with how the different hypotheses are considered. "Complete power" demands that all true effects are detected across all of the hypotheses, which is a much stronger requirement than the "minimal power" of being able to find at least one true effect, a type of power that might increase with an increasing number of hypotheses.<ref>{{cite web |title=Estimating Statistical Power When Using Multiple Testing Procedures |url=https://www.mdrc.org/work/publications/estimating-statistical-power-when-using-multiple-testing-procedures |website=mdrc.org |date=November 2017}}</ref>
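To make these notions concrete, the sketch below shows, under the simplifying assumptions of independent one-sided z-tests sharing a common standardized effect with all nulls false (an illustration, not drawn from the cited source), how a Bonferroni correction lowers per-test power as the number of hypotheses grows, while "minimal" power rises and "complete" power falls.

<syntaxhighlight lang="python">
# Per-test, "minimal", and "complete" power for m independent one-sided
# z-tests with a common standardized effect (noncentrality) of 2.5,
# using a Bonferroni-adjusted significance level.
from scipy import stats

alpha, ncp = 0.05, 2.5

for m in (1, 5, 10):
    per_test_alpha = alpha / m                        # Bonferroni threshold
    crit = stats.norm.ppf(1 - per_test_alpha)         # critical value
    per_test = 1 - stats.norm.cdf(crit - ncp)         # power of each single test
    minimal = 1 - (1 - per_test) ** m                 # detect at least one true effect
    complete = per_test ** m                          # detect every true effect
    print(f"m={m:2d}  per-test={per_test:.2f}  minimal={minimal:.2f}  complete={complete:.2f}")
</syntaxhighlight>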