===Categorical family: Effect sizes for associations among categorical variables===
{| class="wikitable" align="right"
|-
| align="center" | <math>\varphi = \sqrt{ \frac{\chi^2}{N}}</math>
| align="center" | <math>\varphi_c = \sqrt{ \frac{\chi^2}{N(k - 1)}}</math>
|-
! Phi (''φ'')
! Cramér's ''V'' (''φ''<sub>''c''</sub>)
|}
Commonly used measures of association for the [[chi-squared test]] are the [[Phi coefficient]] and [[Harald Cramér|Cramér]]'s [[Cramér's V (statistics)|''V'']] (sometimes referred to as Cramér's phi and denoted as ''φ''<sub>''c''</sub>). Phi is related to the [[point-biserial correlation coefficient]] and Cohen's ''d'' and estimates the extent of the relationship between two variables in a 2 × 2 table.<ref name="Ref_">Aaron, B., Kromrey, J. D., & Ferron, J. M. (1998, November). [http://www.eric.ed.gov/ERICWebPortal/custom/portlets/recordDetails/detailmini.jsp?_nfpb=true&_&ERICExtSearch_SearchValue_0=ED433353&ERICExtSearch_SearchType_0=no&accno=ED433353 Equating r-based and d-based effect-size indices: Problems with a commonly recommended formula.] Paper presented at the annual meeting of the Florida Educational Research Association, Orlando, FL. (ERIC Document Reproduction Service No. ED433353)</ref> Cramér's ''V'' may be used with variables having more than two levels.

Phi is computed as the square root of the chi-squared statistic divided by the sample size. Similarly, Cramér's ''V'' is the square root of the chi-squared statistic divided by the product of the sample size and ''k'' − 1, where ''k'' is the smaller of the number of rows ''r'' or columns ''c''. ''φ''<sub>''c''</sub> is the intercorrelation of the two discrete variables<ref name="Ref_a">{{cite book | last=Sheskin | first=David J. | title=Handbook of Parametric and Nonparametric Statistical Procedures | url=https://books.google.com/books?id=bmwhcJqq01cC&pg=PP1 | edition=Third | year=2003 | publisher=CRC Press | isbn=978-1-4200-3626-8}}</ref> and may be computed for any value of ''r'' or ''c''. However, because chi-squared values tend to increase with the number of cells, the greater the difference between ''r'' and ''c'', the more likely ''V'' will tend toward 1 without strong evidence of a meaningful correlation.

==== Cohen's omega (''ω'') ====
Another measure of effect size used for chi-squared tests is Cohen's omega (<math>\omega</math>). This is defined as
<math display="block"> \omega = \sqrt{ \sum_{i=1}^m \frac{ (p_{1i} - p_{0i})^2 }{p_{0i}} } </math>
where ''p''<sub>0''i''</sub> is the proportion of the ''i''<sup>th</sup> cell under ''H''<sub>0</sub>, ''p''<sub>1''i''</sub> is the proportion of the ''i''<sup>th</sup> cell under ''H''<sub>1</sub>, and ''m'' is the number of cells.

==== Odds ratio ====
The [[odds ratio]] (OR) is another useful effect size. It is appropriate when the research question focuses on the degree of association between two [[binary data|binary variables]]. For example, consider a study of spelling ability. In a control group, two students pass the class for every one who fails, so the odds of passing are two to one (or 2/1 = 2). In the treatment group, six students pass for every one who fails, so the odds of passing are six to one (or 6/1 = 6). The effect size can be computed by noting that the odds of passing in the treatment group are three times higher than in the control group (because 6 divided by 2 is 3). Therefore, the odds ratio is 3. Odds ratio statistics are on a different scale than Cohen's ''d'', so this '3' is not comparable to a Cohen's ''d'' of 3.

==== Relative risk ====
The [[relative risk]] (RR), also called '''risk ratio''', is the ratio of the probability of an event in one group (for example, the treatment group) to the probability of that event in another (for example, the control group).
This measure of effect size differs from the odds ratio in that it compares ''probabilities'' instead of ''odds'', but asymptotically approaches the latter for small probabilities. Using the example above, the ''probabilities'' of passing for those in the control group and the treatment group are 2/3 (or 0.67) and 6/7 (or 0.86), respectively. The effect size can be computed in the same way as above, but using the probabilities instead. Therefore, the relative risk is about 1.29. Since rather large probabilities of passing were used, there is a large difference between the relative risk and the odds ratio. Had ''failure'' (a smaller probability) been used as the event (rather than ''passing''), the difference between the two measures of effect size would not be so great.

While both measures are useful, they have different statistical uses. In medical research, the [[odds ratio]] is commonly used for [[case-control study|case-control studies]], as odds, but not probabilities, are usually estimated.<ref>{{cite journal |author = Deeks J |year = 1998 |title = When can odds ratios mislead? : Odds ratios should be used only in case-control studies and logistic regression analyses |journal = BMJ |volume = 317 |issue = 7166 |pages = 1155–6 |pmid = 9784470 |pmc = 1114127|doi=10.1136/bmj.317.7166.1155a }}</ref> Relative risk is commonly used in [[randomized controlled trial]]s and [[cohort study|cohort studies]], but relative risk contributes to overestimations of the effectiveness of interventions.<ref name="Stegenga2015">{{Cite journal | last1 = Stegenga | first1 = J. | title = Measuring Effectiveness | journal = Studies in History and Philosophy of Biological and Biomedical Sciences | volume = 54 | pages = 62–71 | year = 2015 | url = https://www.academia.edu/16420844 | doi=10.1016/j.shpsc.2015.06.003| pmid = 26199055 }}</ref>

==== Risk difference ====
The [[risk difference]] (RD), sometimes called absolute risk reduction, is simply the difference in risk (probability) of an event between two groups.
It is a useful measure in experimental research, since RD indicates the extent to which an experimental intervention changes the probability of an event or outcome. Using the example above, the probabilities of passing for those in the control group and the treatment group are 2/3 (or 0.67) and 6/7 (or 0.86), respectively, and so the RD effect size is 0.86 − 0.67 = 0.19 (or 19%). RD is the superior measure for assessing effectiveness of interventions.<ref name="Stegenga2015"/>

==== Cohen's ''h'' ====
{{main|Cohen's h}}
One measure used in power analysis when comparing two independent proportions is Cohen's ''h''. This is defined as
<math display="block"> h = 2 ( \arcsin \sqrt{p_1} - \arcsin \sqrt{p_2}) </math>
where ''p''<sub>1</sub> and ''p''<sub>2</sub> are the proportions of the two samples being compared and arcsin is the arcsine transformation.

==== Probability of superiority ====
{{Main|Probability of superiority}}
To more easily describe the meaning of an effect size to people outside statistics, the common language effect size, as the name implies, was designed to communicate it in plain English. It is used to describe a difference between two groups and was proposed, as well as named, by Kenneth McGraw and S. P. Wong in 1992.<ref name="McGraw KO, Wong SP 1992 361–365">{{Cite journal |vauthors=McGraw KO, Wong SP | year = 1992 | title = A common language effect size statistic | journal = [[Psychological Bulletin]] | volume = 111 | issue = 2 | pages = 361–365 | doi= 10.1037/0033-2909.111.2.361}}</ref> They used the following example (about heights of men and women): "in any random pairing of young adult males and females, the probability of the male being taller than the female is .92, or in simpler terms yet, in 92 out of 100 blind dates among young adults, the male will be taller than the female",<ref name="McGraw KO, Wong SP 1992 361–365"/> when describing the population value of the common language effect size.
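The odds ratio, relative risk, risk difference, and Cohen's ''h'' can all be checked numerically against the worked spelling-class example above. A minimal Python sketch (the pass/fail counts are the hypothetical ones from the example):

```python
import math

# Hypothetical counts from the spelling example in the text:
# control group: 2 students pass for every 1 who fails;
# treatment group: 6 students pass for every 1 who fails.
p_control = 2 / 3    # probability of passing in the control group
p_treatment = 6 / 7  # probability of passing in the treatment group

# Odds ratio: ratio of the odds of passing in the two groups.
odds_control = p_control / (1 - p_control)        # odds of 2 to 1
odds_treatment = p_treatment / (1 - p_treatment)  # odds of 6 to 1
odds_ratio = odds_treatment / odds_control        # 6 / 2 = 3

# Relative risk: ratio of the probabilities themselves.
relative_risk = p_treatment / p_control           # (6/7) / (2/3), about 1.29

# Risk difference: difference between the probabilities.
risk_difference = p_treatment - p_control         # 6/7 - 2/3, about 0.19

# Cohen's h: difference between arcsine-transformed proportions.
cohens_h = 2 * (math.asin(math.sqrt(p_treatment)) - math.asin(math.sqrt(p_control)))
```

Note that the same pair of proportions yields three different-looking values (3, about 1.29, and about 0.19), which is why the scale of each measure matters when interpreting it.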
==== Effect size for ordinal data ====
'''Cliff's delta''' (<math>d</math>), originally developed by [[Norman Cliff]] for use with ordinal data,<ref name="Cliff1993">{{cite journal | last=Cliff | first=Norman | title=Dominance statistics: Ordinal analyses to answer ordinal questions | year=1993 | journal=Psychological Bulletin | volume=114 | pages=494–509 | issue=3 | doi=10.1037/0033-2909.114.3.494}}</ref> is a measure of how often the values in one distribution are larger than the values in a second distribution. Crucially, it does not require any assumptions about the shape or spread of the two distributions.

The sample estimate <math>d</math> is given by
<math display="block">d = \frac{\sum_{i,j} [x_i > x_j] - [x_i < x_j]}{mn}</math>
where the two samples are of size <math>n</math> and <math>m</math> with items <math>x_i</math> and <math>x_j</math>, respectively, and <math>[\cdot]</math> is the [[Iverson bracket]], which is 1 when its contents are true and 0 when false.

<math>d</math> is linearly related to the [[Mann–Whitney U test|Mann–Whitney ''U'' statistic]], but it also captures the direction of the difference in its sign. Given the Mann–Whitney <math>U</math>, <math>d</math> is
<math display="block">d = \frac{2U}{mn} - 1</math>

==== Cohen's g ====
One of the simplest effect sizes for measuring how much a proportion differs from 50% is Cohen's ''g''.<ref name="CohenJ1988Statistical" />{{Rp|page=147}} For example, if 85.2% of arrests for car theft are of males, then the effect size of sex on arrest, measured with Cohen's ''g'', is <math>g = 0.852 - 0.5 = 0.352</math>. In general:
<math display="block">g = P - 0.50 \text{ or } 0.50 - P \quad (\text{directional}),</math>
<math display="block">g = |P - 0.50| \quad (\text{nondirectional}).</math>
Because it is expressed on the proportion scale, Cohen's ''g'' is more readily interpretable than many other effect sizes.
Cohen's ''g'' is sometimes used in combination with the [[binomial test]].
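Both Cliff's delta and Cohen's ''g'' follow directly from their formulas above. A minimal Python sketch (the sample lists are illustrative; the 85.2% arrest figure is the one from the text):

```python
def cliffs_delta(xs, ys):
    """Cliff's delta: mean of [x > y] - [x < y] over all pairs from the two samples."""
    # In Python, (x > y) - (x < y) is +1, 0, or -1, matching the Iverson brackets.
    total = sum((x > y) - (x < y) for x in xs for y in ys)
    return total / (len(xs) * len(ys))

def cohens_g(p, directional=False):
    """Cohen's g: how far the proportion p is from 0.5."""
    return p - 0.5 if directional else abs(p - 0.5)

# Values in the second sample tend to exceed the first, so d is negative here.
d = cliffs_delta([1, 2, 3], [2, 3, 4])  # -5/9, about -0.56

# 85.2% of arrests for car theft are of males (figure from the text).
g = cohens_g(0.852)  # about 0.352
```

When every value of the first sample exceeds every value of the second, ''d'' reaches its maximum of 1; complete overlap gives 0.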