==Overview==

===Population and sample effect sizes===
As in [[statistical estimation]], the true effect size is distinguished from the observed effect size. For example, to measure the risk of disease in a population (the population effect size) one can measure the risk within a sample of that population (the sample effect size). Conventions for describing true and observed effect sizes follow standard statistical practice: one common approach is to use Greek letters such as ρ [rho] to denote population parameters and Latin letters such as ''r'' to denote the corresponding statistics. Alternatively, a "hat" can be placed over the population parameter to denote the statistic, e.g. <math>\hat\rho</math> is the estimate of the parameter <math>\rho</math>.

As in any statistical setting, effect sizes are estimated with [[sampling error]], and may be biased unless the effect size estimator that is used is appropriate for the manner in which the data were [[sampling (statistics)|sampled]] and the manner in which the measurements were made. An example of this is [[publication bias]], which occurs when scientists report results only when the estimated effect sizes are large or statistically significant. As a result, if many researchers carry out studies with low statistical power, the reported effect sizes will tend to be larger than the true (population) effects, if any.<ref name="Brand2008">{{Cite journal | vauthors = Brand A, Bradley MT, Best LA, Stoica G | year = 2008 | title = Accuracy of effect size estimates from published psychological research | journal = [[Perceptual and Motor Skills]] | volume = 106 | issue = 2 | pages = 645–649 | doi = 10.2466/PMS.106.2.645-649 | url = http://mtbradley.com/brandbradelybeststoicapdf.pdf | pmid = 18556917 | s2cid = 14340449 | access-date = 2008-10-31 | archive-url = https://web.archive.org/web/20081217175012/http://mtbradley.com/brandbradelybeststoicapdf.pdf | archive-date = 2008-12-17 | url-status=dead }}</ref> Another example where effect sizes may be distorted is a multiple-trial experiment in which the effect size calculation is based on the averaged or aggregated response across the trials.<ref name="Brand2011">{{Cite journal |vauthors=Brand A, Bradley MT, Best LA, Stoica G | year = 2011 | title = Multiple trials may yield exaggerated effect size estimates | journal = [[The Journal of General Psychology]] | volume = 138 | issue = 1 | pages = 1–11 | doi=10.1080/00221309.2010.520360 | pmid = 21404946 | s2cid = 932324 | url = http://www.ipsychexpts.com/brand_et_al_(2011).pdf| archive-url = https://web.archive.org/web/20110713053244/http://www.ipsychexpts.com/brand_et_al_(2011).pdf| url-status = usurped| archive-date = July 13, 2011}}</ref>

Smaller studies sometimes show different, often larger, effect sizes than larger studies. This phenomenon, known as the small-study effect, may signal publication bias.<ref>{{Cite journal |last1=Sterne |first1=Jonathan A. C. |last2=Gavaghan |first2=David |last3=Egger |first3=Matthias |date=2000-11-01 |title=Publication and related bias in meta-analysis: Power of statistical tests and prevalence in the literature |url=https://www.jclinepi.com/article/S0895-4356(00)00242-0/abstract |journal=Journal of Clinical Epidemiology |language=English |volume=53 |issue=11 |pages=1119–1129 |doi=10.1016/S0895-4356(00)00242-0 |issn=0895-4356 |pmid=11106885 |url-access=subscription}}</ref>
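The distinction can be made concrete with a small simulation. The following is a minimal sketch in Python; the population correlation value (0.30) and the sample sizes are arbitrary choices for illustration, not values from any study:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population effect size: a true Pearson correlation rho,
# chosen arbitrarily for this illustration.
rho = 0.30
cov = [[1.0, rho], [rho, 1.0]]

# Draw samples of increasing size and compute the sample effect size r
# in each; r is the "rho-hat" estimate of rho and scatters around it
# with sampling error that shrinks as n grows.
for n in (20, 200, 2000):
    x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    r = np.corrcoef(x, y)[0, 1]
    print(f"n = {n:4d}: sample r = {r:+.3f} (population rho = {rho})")
</syntaxhighlight>

Each run produces a different sample ''r''; the spread of these estimates around ρ is the sampling error described above.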
===Relationship to test statistics===
Sample-based effect sizes are distinguished from [[test statistic]]s used in hypothesis testing in that they estimate the strength (magnitude) of, for example, an apparent relationship, rather than assigning a [[statistical significance|significance]] level reflecting whether the observed magnitude could be due to chance. The effect size does not directly determine the significance level, or vice versa. Given a sufficiently large sample size, a non-null statistical comparison will always show a statistically significant result unless the population effect size is exactly zero (and even then it will show statistical significance at the rate of the Type I error used). For example, a sample [[Pearson correlation]] coefficient of 0.1 is statistically significant at the conventional 0.05 level once the sample size reaches 1000, as illustrated in the sketch at the end of this section. Reporting only the significant [[p-value|''p''-value]] from such an analysis could be misleading if a correlation of 0.1 is too small to be of interest in a particular application.

===Standardized and unstandardized effect sizes===
The term ''effect size'' can refer to a standardized measure of effect (such as ''r'', [[Cohen's d|Cohen's ''d'']], or the [[odds ratio]]), or to an unstandardized measure (e.g., the difference between group means or the unstandardized regression coefficients). Standardized effect size measures are typically used when:
* the metrics of the variables being studied have no intrinsic meaning (e.g., a score on a personality test on an arbitrary scale),
* results from multiple studies are being combined,
* some or all of the studies use different scales, or
* it is desired to convey the size of an effect relative to the variability in the population.

In meta-analyses, standardized effect sizes are used as a common measure that can be calculated for different studies and then combined into an overall summary.
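The dependence of significance on sample size noted above can be checked directly. The following is a minimal sketch in Python (assuming SciPy is available; the helper name <code>pearson_p_value</code> is ours, not a library function), applying the standard ''t'' test for a Pearson correlation with ''n''&nbsp;−&nbsp;2 degrees of freedom:

<syntaxhighlight lang="python">
import math
from scipy import stats

def pearson_p_value(r: float, n: int) -> float:
    """Two-sided p-value for a sample Pearson correlation r with n
    observations, via the usual t statistic t = r * sqrt((n-2)/(1-r^2))
    with n - 2 degrees of freedom."""
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return 2.0 * stats.t.sf(abs(t), df=n - 2)

# The effect size (r = 0.1) is held fixed; only the sample size changes,
# yet the p-value alone crosses the conventional 0.05 threshold.
for n in (30, 100, 1000, 10000):
    print(f"r = 0.1, n = {n:5d}: p = {pearson_p_value(0.1, n):.4f}")
</syntaxhighlight>

The same ''r'' = 0.1 is clearly non-significant at ''n'' = 30 but highly significant at ''n'' = 1000, which is why reporting the effect size alongside the ''p''-value is informative.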