
==Effect sizes==
It is a widely recommended practice for scientists to report an [[effect size]] for an inferential test.<ref name="Wilkinson1999">{{cite journal | last=Wilkinson | first=Leland | title=Statistical methods in psychology journals: Guidelines and explanations | year=1999 | journal=American Psychologist | volume=54 | pages=594–604 | doi=10.1037/0003-066X.54.8.594 | issue=8}}</ref><ref name="Nakagawa2007">{{cite journal | last=Nakagawa | first=Shinichi |author2=Cuthill, Innes C | year=2007 | title=Effect size, confidence interval and statistical significance: a practical guide for biologists | journal = Biological Reviews of the Cambridge Philosophical Society | volume=82 | pages=591–605 | doi=10.1111/j.1469-185X.2007.00027.x | pmid=17944619 | issue=4| s2cid=615371 }}</ref>

===Proportion of concordance out of all pairs===
The following measures are equivalent.

====Common language effect size====
One method of reporting the effect size for the Mann–Whitney ''U'' test is with ''f'', the common language effect size.<ref name="Kerby2014">{{cite journal | last1 = Kerby | first1 = D.S. | year = 2014 | title = The simple difference formula: An approach to teaching nonparametric correlation | journal = Comprehensive Psychology | volume =  3| page =  11.IT.3.1| doi = 10.2466/11.IT.3.1 | s2cid = 120622013 | doi-access = free }}</ref><ref name="McGraw1992">{{cite journal | last1 = McGraw | first1 = K.O. | last2 = Wong | first2 = J.J. | year = 1992 | title = A common language effect size statistic | journal = Psychological Bulletin | volume = 111 | issue = 2| pages = 361–365 | doi = 10.1037/0033-2909.111.2.361 }}</ref> As a sample statistic, the common language effect size is computed by forming all possible pairs between the two groups, then finding the proportion of pairs that support a direction (say, that items from group 1 are larger than items from group 2).<ref name="McGraw1992"/> To illustrate, in a study with a sample of ten hares and ten tortoises, the total number of ordered pairs is ten times ten or 100 pairs of hares and tortoises. Suppose the results show that the hare ran faster than the tortoise in 90 of the 100 sample pairs; in that case, the sample common language effect size is 90%.<ref>{{Cite journal | author = Grissom RJ | year = 1994| title = Statistical analysis of ordinal categorical status after therapies | journal = [[Journal of Consulting and Clinical Psychology]] | volume = 62| issue = 2| pages = 281–284| doi= 10.1037/0022-006X.62.2.281 | pmid = 8201065}}</ref>
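The pair-counting definition can be sketched in Python. This is a minimal illustration, not a reference implementation. The function name `common_language_effect_size` is hypothetical, and the running speeds are made up so that the hares win exactly 90 of the 100 pairs.

```python
def common_language_effect_size(group1, group2):
    """Proportion of all (x, y) pairs, x from group1 and y from group2, in which x > y."""
    n_pairs = len(group1) * len(group2)
    favourable = sum(1 for x in group1 for y in group2 if x > y)
    return favourable / n_pairs


# Made-up running speeds (higher is faster), for illustration only:
# nine hares beat every tortoise, and one very slow hare beats none.
hares = [11, 12, 13, 14, 15, 16, 17, 18, 19, 0]
tortoises = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

f = common_language_effect_size(hares, tortoises)
print(f)  # 0.9
```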

The relationship between ''f'' and the Mann–Whitney ''U'' (specifically <math>U_1</math>) is as follows:

:<math> f  = {U_1 \over n_1 n_2} \,</math>
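The identity can be checked numerically by computing ''U''<sub>1</sub> from ranks rather than by counting pairs. The sketch below assumes the usual rank-sum form <math>U_1 = R_1 - n_1(n_1+1)/2</math>, where <math>R_1</math> is the sum of the ranks of group 1 in the pooled sample and tied values receive the average of their ranks. The helper names and the speed data are hypothetical.

```python
def midranks(values):
    """Return 1-based ranks of values, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run of sorted positions that hold the same value.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1  # mean of 1-based ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    return ranks


def u1_statistic(group1, group2):
    """Mann-Whitney U for group 1 via the rank-sum formula U1 = R1 - n1(n1+1)/2."""
    n1 = len(group1)
    ranks = midranks(list(group1) + list(group2))
    r1 = sum(ranks[:n1])  # ranks of the group-1 observations in the pooled sample
    return r1 - n1 * (n1 + 1) / 2


# Hypothetical, made-up speeds: nine hares beat every tortoise, one hare beats none.
hares = [11, 12, 13, 14, 15, 16, 17, 18, 19, 0]
tortoises = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

u1 = u1_statistic(hares, tortoises)
f = u1 / (len(hares) * len(tortoises))
print(u1, f)  # 90.0 0.9
```

Dividing ''U''<sub>1</sub> by <math>n_1 n_2</math> gives the same 0.9 as counting the 90 favourable pairs directly.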

This is the same as the [[#Area-under-curve (AUC) statistic for ROC curves|area under the curve (AUC) for the ROC curve]].

====''ρ'' statistic====
A statistic called ''ρ'', which is linearly related to ''U'' and is widely used in studies of categorization ([[discrimination learning]] involving [[concept]]s) and elsewhere,<ref name="H1976" /> is calculated by dividing ''U'' by its maximum value for the given sample sizes, which is simply {{math|1=''n''<sub>1</sub>×''n''<sub>2</sub>}}. ''ρ'' is thus a non-parametric measure of the overlap between two distributions; it takes values between 0 and 1, and it estimates {{math|1=P(''Y'' > ''X'') + 0.5 P(''Y'' = ''X'')}}, where ''X'' and ''Y'' are randomly chosen observations from the two distributions. Both extreme values represent complete separation of the distributions, while a ''ρ'' of 0.5 represents complete overlap. The usefulness of the ''ρ'' statistic can be seen in the case of the unusual example used above, where two distributions that were significantly different on a Mann–Whitney ''U'' test nonetheless had nearly identical medians: the ''ρ'' value in this case is approximately 0.723 in favour of the hares, correctly reflecting the fact that even though the median tortoise beat the median hare, the hares collectively did better than the tortoises collectively.{{citation needed|date=February 2012}}
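The definition of ''ρ'' as a pair count, with tied pairs scored as one half, can be sketched as follows. The function name and the small samples are hypothetical. The sketch only illustrates that the result lies between 0 and 1, equals 0.5 when the two samples are identical, and reaches 0 or 1 under complete separation.

```python
from itertools import product


def rho_statistic(x_sample, y_sample):
    """Estimate P(Y > X) + 0.5 * P(Y = X) from all (x, y) pairs.

    This is U divided by its maximum value n_x * n_y, where U counts the
    pairs with y > x and counts each tied pair as one half.
    """
    score = 0.0
    for x, y in product(x_sample, y_sample):
        if y > x:
            score += 1.0
        elif y == x:
            score += 0.5
    return score / (len(x_sample) * len(y_sample))


# Small made-up samples, including tied values.
print(rho_statistic([1, 2, 3], [2, 3, 4]))  # 7/9, about 0.778
print(rho_statistic([1, 2, 3], [1, 2, 3]))  # 0.5: the samples overlap completely
```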

===Rank-biserial correlation===
Another method of reporting the effect size for the Mann–Whitney ''U'' test is with a measure of [[rank correlation]] known as the rank-biserial correlation. Edward Cureton introduced and named the measure.<ref>{{cite journal | last1 = Cureton | first1 = E.E. | year = 1956 | title = Rank-biserial correlation | journal = Psychometrika | volume = 21 | issue = 3| pages = 287–290 | doi = 10.1007/BF02289138 | s2cid = 122500836 }}</ref> Like other correlational measures, the rank-biserial correlation can range from minus one to plus one, with a value of zero indicating no relationship.

There is a simple difference formula to compute the rank-biserial correlation from the common language effect size: the correlation is the proportion of pairs favorable to the hypothesis (''f'') minus its complement, that is, the proportion of pairs that are unfavorable (''u''). This simple difference formula is just the difference of the common language effect size of each group, and is as follows:<ref name="Kerby2014"/>

:<math>r = f - u </math>

For example, consider the case where hares run faster than tortoises in 90 of 100 pairs. The common language effect size is 90%, so the rank-biserial correlation is 90% minus 10%, and the rank-biserial&nbsp;{{math|1=''r'' = 0.80}}.
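The simple difference formula can be sketched by counting favourable and unfavourable pairs separately. In this sketch a tied pair counts as neither, so ''u'' equals 1&nbsp;&minus;&nbsp;''f'' only when there are no ties. The function name is hypothetical, and the speeds are made up so that the hares win 90 of the 100 pairs.

```python
def rank_biserial_simple_difference(group1, group2):
    """Rank-biserial r = f - u: favourable minus unfavourable proportion of pairs."""
    n_pairs = len(group1) * len(group2)
    favourable = sum(1 for x in group1 for y in group2 if x > y)
    unfavourable = sum(1 for x in group1 for y in group2 if x < y)
    f = favourable / n_pairs    # common language effect size for group 1
    u = unfavourable / n_pairs  # common language effect size for group 2
    return f - u


# Made-up speeds in which the hares win 90 of the 100 pairs and lose the other 10.
hares = [11, 12, 13, 14, 15, 16, 17, 18, 19, 0]
tortoises = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

print(rank_biserial_simple_difference(hares, tortoises))  # approximately 0.8
```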

An alternative formula for the rank-biserial can be used to calculate it from the Mann–Whitney ''U'' (either <math>U_1</math> or <math>U_2</math>) and the sample sizes of each group:<ref>{{cite journal | last1 = Wendt | first1 = H.W. | year = 1972 | title = Dealing with a common problem in social science: A simplified rank-biserial coefficient of correlation based on the ''U'' statistic | journal = European Journal of Social Psychology | volume = 2 | issue = 4| pages = 463–465 | doi = 10.1002/ejsp.2420020412 }}</ref>

: <math> r = f - (1 - f) = 2 f - 1 = {2U_1 \over n_1 n_2} - 1 = 1 - {2U_2 \over n_1 n_2} </math>

This formula is useful when the raw data are not available but a published report is, because ''U'' and the sample sizes are routinely reported. Using the example above, with 90 pairs that favor the hares and 10 pairs that favor the tortoises, ''U''<sub>2</sub> is the smaller of the two, so {{math|1=''U<sub>2</sub>'' = 10}}. This formula then gives {{math|1=''r'' = 1 – (2×10) / (10×10) = 0.80}}, which is the same result as with the simple difference formula above.
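When only a published report is available, the calculation needs nothing beyond ''U'' and the two sample sizes. The function names below are hypothetical. The sketch also uses the standard identity <math>U_1 + U_2 = n_1 n_2</math> to show that the ''U''<sub>1</sub> and ''U''<sub>2</sub> forms of the formula agree.

```python
def rank_biserial_from_u1(u1, n1, n2):
    """Rank-biserial correlation from U1: r = 2*U1 / (n1*n2) - 1."""
    return 2 * u1 / (n1 * n2) - 1


def rank_biserial_from_u2(u2, n1, n2):
    """Rank-biserial correlation from U2: r = 1 - 2*U2 / (n1*n2)."""
    return 1 - 2 * u2 / (n1 * n2)


# Values as they might appear in a report of the hares-and-tortoises example.
n1, n2 = 10, 10
u2 = 10
u1 = n1 * n2 - u2  # U1 + U2 = n1 * n2, so U1 = 90

print(rank_biserial_from_u2(u2, n1, n2))  # approximately 0.8
print(rank_biserial_from_u1(u1, n1, n2))  # approximately 0.8
```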