== Statistical considerations ==

=== Power and statistical error ===

When testing a hypothesis, two types of statistical error are possible: [[Type I error]] and [[Type II error]].

* The type I error, or [[False positives and false negatives|false positive]], is the incorrect rejection of a true [[null hypothesis]].
* The type II error, or [[False positives and false negatives|false negative]], is the failure to reject a false null hypothesis.

The [[significance level]], denoted by α, is the type I error rate and should be chosen before performing the test. The type II error rate is denoted by β, and the [[Statistical power|statistical power of the test]] is 1 − β.
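
The relationship between α, β and power can be illustrated with a minimal sketch. It assumes a one-sided, one-sample z-test with known variance; the function name <code>z_test_power</code>, the effect size and the sample size are hypothetical choices made for illustration, not values from the cited sources.

```python
from statistics import NormalDist

def z_test_power(effect_size: float, n: int, alpha: float = 0.05) -> float:
    """Power (1 - beta) of a one-sided, one-sample z-test.

    effect_size is the true mean shift in standard-deviation units
    (a hypothetical input chosen for illustration).
    """
    std_normal = NormalDist()
    # Critical value: H0 is rejected when the z statistic exceeds it.
    z_crit = std_normal.inv_cdf(1 - alpha)
    # Under the alternative, the z statistic is centred at effect_size * sqrt(n).
    shift = effect_size * n ** 0.5
    return 1 - std_normal.cdf(z_crit - shift)

alpha = 0.05                                   # type I error rate, fixed in advance
power = z_test_power(effect_size=0.5, n=30, alpha=alpha)
beta = 1 - power                               # type II error rate
print(f"power = {power:.3f}, beta = {beta:.3f}")
```

With an effect size of zero the null hypothesis is true, and the function returns α, the type I error rate. Under these assumptions, increasing the sample size raises the power for a fixed α.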

=== p-value ===

The [[p-value]] is the probability of obtaining results as extreme as or more extreme than those observed, assuming the [[null hypothesis]] (H<sub>0</sub>) is true. It is also called the calculated probability. The p-value is commonly confused with the [[Statistical significance|significance level (α)]], but α is a predefined threshold for declaring results significant, whereas the p-value is calculated from the observed data. If p is less than α, the null hypothesis (H<sub>0</sub>) is rejected.<ref>{{cite journal|doi=10.1038/nature.2016.19503|pmid=26961635|title=Statisticians issue warning over misuse of P values|journal=Nature|volume=531|issue=7593|pages=151|year=2016|last1=Baker|first1=Monya|bibcode=2016Natur.531..151B|doi-access=free}}</ref>
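
As an illustration of the distinction between p and α, the following sketch computes a two-sided p-value for a test statistic that is assumed to follow a standard normal distribution under H<sub>0</sub>. The function name and the observed value 2.1 are hypothetical.

```python
from statistics import NormalDist

def two_sided_p_value(z: float) -> float:
    """P(|Z| >= |z|) for a standard normal Z, i.e. assuming H0 is true."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

alpha = 0.05        # threshold chosen before looking at the data
z_observed = 2.1    # hypothetical observed test statistic
p = two_sided_p_value(z_observed)
reject_h0 = p < alpha
print(f"p = {p:.4f}, reject H0: {reject_h0}")
```

Here α is fixed in advance, while p changes with the observed statistic.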

=== Multiple testing ===

When multiple hypothesis tests are performed, the probability of at least one [[False positives and false negatives|false positive]] (the [[Family-wise error rate|familywise error rate]]) increases, and a strategy is needed to account for this. This is commonly achieved by using a more stringent threshold to reject null hypotheses. The [[Bonferroni correction]] defines an acceptable global significance level, denoted by α*, and each test is individually compared with a value of α = α*/m. This ensures that the familywise error rate across all m tests is less than or equal to α*. When m is large, the Bonferroni correction may be overly conservative. An alternative to the Bonferroni correction is to control the [[False discovery rate|false discovery rate (FDR)]]. The FDR is the expected proportion of the rejected [[Null hypothesis|null hypotheses]] (the so-called discoveries) that are false (incorrect rejections). The Benjamini–Hochberg procedure ensures that, for independent tests, the false discovery rate is at most a chosen level q*. Thus, controlling the FDR is less conservative than the Bonferroni correction and has more power, at the cost of more false positives.<ref>Benjamini, Y. & Hochberg, Y. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society. Series B (Methodological) 57, 289–300 (1995).</ref>
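
The two approaches can be compared with a short sketch. It implements the Bonferroni threshold α*/m and the step-up procedure of Benjamini and Hochberg; the function names and the list of p-values are hypothetical and serve only as an illustration.

```python
def bonferroni_reject(p_values, alpha_star=0.05):
    """Reject the i-th null hypothesis when p_i <= alpha_star / m."""
    m = len(p_values)
    return [p <= alpha_star / m for p in p_values]

def benjamini_hochberg_reject(p_values, q_star=0.05):
    """Step-up procedure: find the largest rank k with p_(k) <= (k / m) * q_star
    and reject the k hypotheses with the smallest p-values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q_star:
            k_max = rank
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject

# Hypothetical p-values from m = 8 tests.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
n_bonferroni = sum(bonferroni_reject(p_values))
n_bh = sum(benjamini_hochberg_reject(p_values))
print(n_bonferroni, n_bh)
```

For these illustrative values the Bonferroni correction rejects one hypothesis and the Benjamini–Hochberg procedure rejects two, consistent with the FDR approach being the less conservative of the two.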

=== Mis-specification and robustness checks ===

The main hypothesis being tested (e.g., no association between treatments and outcomes) is often accompanied by other technical assumptions (e.g., about the form of the probability distribution of the outcomes) that are also part of the null hypothesis. When the technical assumptions are violated in practice, the null may be frequently rejected even if the main hypothesis is true. Such rejections are said to be due to model mis-specification.<ref>{{Cite web|url=https://www.statlect.com/glossary/null-hypothesis|title=Null hypothesis|website=www.statlect.com|access-date=2018-05-08}}</ref> Verifying that the outcome of a statistical test does not change when the technical assumptions are slightly altered (a so-called robustness check) is the main way of combating mis-specification.
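
One possible form of robustness check is sketched below: the same main hypothesis (no difference in means between two groups) is tested once under a normal approximation and once with an exact permutation test that makes no assumption about the shape of the outcome distribution. The choice of these two tests, the function names and the data are assumptions made for illustration; they are not taken from the cited source.

```python
from itertools import combinations
from statistics import NormalDist, mean, variance

def z_approx_p_value(a, b):
    """Two-sided p-value assuming the difference in means is approximately normal."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def permutation_p_value(a, b):
    """Exact two-sided permutation p-value for the difference in means.
    Enumerates every relabelling, so it is only practical for small samples."""
    pooled = list(a) + list(b)
    observed = abs(mean(a) - mean(b))
    extreme = total = 0
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point ties.
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total

# Hypothetical outcomes for two treatment groups.
treated = [5.1, 4.9, 6.2, 5.8, 6.0, 5.5]
control = [3.9, 4.2, 4.0, 4.8, 3.7, 4.4]

alpha = 0.05
p_normal = z_approx_p_value(treated, control)
p_perm = permutation_p_value(treated, control)
# The conclusion is considered robust here if both tests reach the same decision.
robust = (p_normal < alpha) == (p_perm < alpha)
print(p_normal, p_perm, robust)
```

If the two decisions disagree, the result depends on the distributional assumption and should be interpreted with caution.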

=== Model selection criteria ===

[[Model selection|Model selection criteria]] are used to select the candidate model that best approximates the true model. [[Model selection|Akaike's Information Criterion (AIC)]] and the [[Model selection|Bayesian Information Criterion (BIC)]] are examples of asymptotically efficient criteria.
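
The following sketch shows how the two criteria can be computed from a maximized log-likelihood, using the standard definitions AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, where k is the number of estimated parameters and n the sample size. The data and the two candidate Gaussian models are hypothetical; for both criteria, lower values indicate the preferred model.

```python
import math

def gaussian_log_likelihood(data, mu, sigma2):
    """Log-likelihood of independent N(mu, sigma2) observations."""
    n = len(data)
    squared_error = sum((x - mu) ** 2 for x in data)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - squared_error / (2 * sigma2)

def aic(log_lik, k):
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    return k * math.log(n) - 2 * log_lik

data = [1.2, 0.8, 1.5, 1.1, 0.9, 1.4, 1.0, 1.3]  # hypothetical measurements
n = len(data)

# Model A: mean fixed at 0, variance estimated by maximum likelihood (k = 1).
sigma2_a = sum(x ** 2 for x in data) / n
ll_a = gaussian_log_likelihood(data, 0.0, sigma2_a)

# Model B: mean and variance both estimated by maximum likelihood (k = 2).
mu_b = sum(data) / n
sigma2_b = sum((x - mu_b) ** 2 for x in data) / n
ll_b = gaussian_log_likelihood(data, mu_b, sigma2_b)

print("AIC:", aic(ll_a, 1), aic(ll_b, 2))
print("BIC:", bic(ll_a, 1, n), bic(ll_b, 2, n))
```

For these illustrative data both criteria favour model B: the gain in likelihood from estimating the mean outweighs the penalty for the additional parameter.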