Power (statistics)
==Example==
The following example shows how to compute power for a randomized experiment. Suppose the goal of an experiment is to study the effect of a treatment on some quantity. We compare research subjects by measuring the quantity before and after the treatment, analyzing the data with a one-sided [[Paired difference test|paired]] [[t-test]] at a significance level threshold of 0.05. We are interested in being able to detect a positive change of size <math>\theta > 0</math>.

We first set up the problem according to our test. Let <math>A_i</math> and <math>B_i</math> denote the pre-treatment and post-treatment measures on subject <math>i</math>, respectively. The possible effect of the treatment should be visible in the differences <math>D_i = B_i-A_i,</math> which are assumed to be independent and identically [[normal distribution|Normally]] distributed, with unknown mean <math>\mu_D</math> and variance <math>\sigma_D^2</math>.

Here, it is natural to choose the null hypothesis to be that the expected mean difference is zero, i.e. <math>H_0: \mu_D =\mu_0= 0.</math> For our one-sided test, the alternative hypothesis is that there is a positive effect, corresponding to <math>H_1: \mu_D = \theta > 0.</math> The [[test statistic]] in this case is defined as
<math display="block">T_n=\frac{\bar{D}_n-\mu_0 }{\hat{\sigma}_D/\sqrt{n}} =\frac{\bar{D}_n-0}{\hat{\sigma}_D/\sqrt{n}},</math>
where <math>\mu_0</math> is the mean under the null hypothesis (so we substitute 0), {{mvar|n}} is the sample size (number of subjects), <math>\bar{D}_n</math> is the [[sample mean]] of the differences,
<math display="block">\bar{D}_n=\frac{1}{n}\sum_{i=1}^n D_i,</math>
and <math>\hat{\sigma}_D</math> is the sample [[standard deviation]] of the differences.

=== Analytic solution ===
We can proceed according to our knowledge of statistical theory, though in practice, for a standard case like this, software exists to compute more accurate answers.
From t-test theory, we know that under the null hypothesis this test statistic follows a [[Student's t-distribution]] with <math>n-1</math> degrees of freedom. If we wish to reject the null at significance level <math>\alpha = 0.05\,</math>, we must find the [[Critical value (statistics)|critical value]] <math>t_{\alpha}</math> such that the probability of <math>T_n > t_{\alpha}</math> under the null is equal to <math>\alpha</math>. If {{mvar|n}} is large, the t-distribution converges to the standard normal distribution (thus no longer involving {{mvar|n}}), and so, through use of the [[Normal_distribution#Quantile_function|corresponding]] [[quantile function]] <math>\Phi^{-1}</math>, we obtain that the null should be rejected if
<math display="block">T_n > t_{\alpha} \approx \Phi^{-1}(0.95) \approx 1.64\,.</math>

Now suppose that the alternative hypothesis <math>H_1</math> is true, so <math>\mu_D = \theta</math>. Then, writing the power as a function of the effect size, <math>B(\theta)</math>, we find the probability of <math>T_n</math> being above <math>t_{\alpha}</math> under <math>H_1</math>:
<math display="block">\begin{align} B(\theta) &\approx \Pr \left( T_n > 1.64 ~\big|~ \mu_D = \theta \right) \\ &= \Pr \left( \frac{\bar{D}_n-0}{\hat{\sigma}_D/\sqrt{n} } > 1.64 ~\Big|~ \mu_D = \theta \right) \\ &= 1- \Pr \left( \frac{\bar{D}_n-0}{\hat{\sigma}_D/\sqrt{n} } < 1.64 ~\Big|~ \mu_D = \theta \right) \\ &= 1 - \Pr\left( \frac{\bar{D}_n-\theta}{\hat{\sigma}_D/\sqrt{n}} < 1.64 - \frac \theta {\hat{\sigma}_D/\sqrt{n}} ~\Big|~ \mu_D=\theta \right)\\ \end{align}</math>
Under <math>H_1</math>, <math>\frac{\bar{D}_n - \theta}{\hat{\sigma}_D/\sqrt{n}}</math> again follows a Student's t-distribution, converging to a standard [[normal distribution]] for large {{mvar|n}}.
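Under the large-sample normal approximation, this rejection probability can be evaluated numerically with only the standard library, since <math>\Phi</math> can be written in terms of the error function. A sketch (the values <math>\theta = 1</math>, <math>\sigma_D = 2</math>, <math>n = 25</math> match the worked example in this section):

```python
import math

def phi(x):
    """Standard normal CDF, via the error function: Phi(x) = (1 + erf(x/sqrt(2)))/2."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def approx_power(theta, sigma_d, n, t_alpha=1.64):
    """Large-sample approximation B(theta) ~ 1 - Phi(t_alpha - theta*sqrt(n)/sigma_D)."""
    return 1.0 - phi(t_alpha - theta * math.sqrt(n) / sigma_d)

# Effect size 1, standard deviation about 2, 25 subjects:
power = approx_power(theta=1.0, sigma_d=2.0, n=25)   # about 0.80
```

With these values the approximation gives a power of roughly 0.8, consistent with the sample-size calculation below.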
The estimated <math>{\hat{\sigma}}_D</math> will also converge to its population value <math>\sigma_D</math>. Thus power can be approximated as
<math display="block">B(\theta) \approx 1 - \Phi \left( 1.64 - \frac{\theta}{\sigma_D/\sqrt{n}} \right). </math>
According to this formula, the power increases with the effect size <math>\theta</math> and the sample size {{mvar|n}}, and decreases with increasing variability <math>\sigma_D</math>. In the trivial case of zero effect size, power is at a minimum ([[infimum]]) and equal to the significance level of the test <math>\alpha\,,</math> in this example 0.05. For finite sample sizes and non-zero variability, it is the case here, as is typical, that power cannot be made equal to 1 except in the trivial case where <math>\alpha = 1</math>, so that the null is ''always'' rejected.

We can invert <math>B</math> to obtain required sample sizes:
<math display="block">\sqrt{n} > \frac{\sigma_D}{\theta}\left( 1.64- \Phi^{-1} \left( 1- B(\theta)\right) \right).</math>
Suppose <math>\theta = 1</math> and we believe <math>\sigma_D</math> is around 2. Then, for a power of <math> B(\theta) = 0.8</math>, we require a sample size
<math display="block">n > 4 \left( 1.64- \Phi^{-1} \left( 1- 0.8\right) \right)^2 \approx 4 \left( 1.64+0.84\right)^2 \approx 24.6 .</math>

=== Simulation solution ===
Alternatively, we can use a [[Monte Carlo simulation]] method that works more generally.<ref>{{cite conference|url=https://support.sas.com/resources/papers/proceedings/proceedings/sugi24/Posters/p236-24.pdf|last=Graebner|first=Robert W.|title=Study design with SAS: Estimating power with Monte Carlo methods|year=1999|conference=SUGI 24}}</ref> Once again, we return to the assumption of the distribution of <math>D_n</math> and the definition of <math>T_n</math>. Suppose we have fixed values of the sample size, variability and effect size, and wish to compute power. We can adopt this process:
# Generate a large number of sets of <math>D_n</math> according to the null hypothesis, <math>N(0, \sigma_D)</math>.
# Compute the resulting test statistic <math>T_n</math> for each set.
# Compute the <math>(1-\alpha)</math>th quantile of the simulated <math>T_n</math> and use that as an estimate of <math>t_\alpha</math>.
# Generate a large number of sets of <math>D_n</math> according to the alternative hypothesis, <math>N(\theta, \sigma_D)</math>, and compute the corresponding test statistics.
# The proportion of these simulated alternative <math>T_n</math> that lie above the <math>t_\alpha</math> calculated in step 3, and so are rejected, is the power.

This can be done with a variety of software packages. Using this methodology with the values above, setting the sample size to 25 leads to an estimated power of around 0.78. The small discrepancy with the previous section is due mainly to inaccuracies in the normal approximation.
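The five-step procedure above can be sketched in Python with only the standard library (sample size 25, <math>\theta = 1</math>, <math>\sigma_D = 2</math> as in the worked example; the estimate varies slightly with the random seed and the number of replications):

```python
import math
import random
import statistics

def t_stat(diffs):
    """Test statistic T_n = mean(D) / (stdev(D) / sqrt(n))."""
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

def simulate_power(n=25, theta=1.0, sigma_d=2.0, alpha=0.05, reps=10000, seed=1):
    rng = random.Random(seed)
    # Steps 1-2: simulate test statistics under the null hypothesis N(0, sigma_D).
    null_stats = sorted(
        t_stat([rng.gauss(0.0, sigma_d) for _ in range(n)]) for _ in range(reps)
    )
    # Step 3: the (1 - alpha) quantile of the null statistics estimates t_alpha.
    t_alpha = null_stats[int((1 - alpha) * reps)]
    # Step 4: simulate test statistics under the alternative N(theta, sigma_D).
    alt_stats = (
        t_stat([rng.gauss(theta, sigma_d) for _ in range(n)]) for _ in range(reps)
    )
    # Step 5: the proportion of rejections estimates the power.
    return sum(t > t_alpha for t in alt_stats) / reps

power = simulate_power()   # roughly 0.78
```

The simulated <math>t_\alpha</math> here will be close to the exact t-distribution critical value for 24 degrees of freedom (about 1.71), which is why the simulated power falls slightly below the normal-approximation value of about 0.8.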