==Calculations==
The test involves the calculation of a [[statistic]], usually called ''U'', whose distribution under the [[null hypothesis]] is known:
* For small samples, the distribution is tabulated.
* For sample sizes above about&nbsp;20, an approximation using the [[normal distribution]] is fairly good.

Alternatively, the null distribution can be approximated using [[permutation test]]s and Monte Carlo simulations.
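
As a minimal sketch of the Monte Carlo approach, the following Python code repeatedly shuffles the pooled observations, recomputes ''U'' for each random split, and reports the fraction of splits at least as far from <code>n1 * n2 / 2</code> as the observed value. The function names, the default of 10,000 resamples, the fixed seed, the two-sided criterion, and the "add one" correction to the estimate are illustrative assumptions, not part of the test's definition.

```python
import random


def u_statistic(x, y):
    """U for sample x: each pair (a, b) scores 1 if a > b and 0.5 if tied."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def permutation_p_value(x, y, n_resamples=10000, seed=0):
    """Two-sided Monte Carlo estimate of the p-value (illustrative sketch)."""
    rng = random.Random(seed)          # fixed seed: an assumption, for repeatability
    n1, n2 = len(x), len(y)
    centre = n1 * n2 / 2               # value of U midway between 0 and n1 * n2
    observed = abs(u_statistic(x, y) - centre)
    pooled = list(x) + list(y)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)            # random relabelling of the observations
        u = u_statistic(pooled[:n1], pooled[n1:])
        if abs(u - centre) >= observed:
            extreme += 1
    # Adding one to numerator and denominator avoids an estimate of exactly 0.
    return (extreme + 1) / (n_resamples + 1)


if __name__ == "__main__":
    # Hypothetical data, chosen only for illustration.
    print(permutation_p_value([1, 2, 3, 4, 5], [6, 7, 8, 9, 10]))
```

With more resamples the estimate becomes more stable, at the cost of running time; for very small samples, enumerating every split exactly is usually feasible instead.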

Some books tabulate statistics equivalent to ''U'', such as the sum of ranks in one of the samples, rather than ''U'' itself.

The Mann–Whitney ''U'' test is included in most [[List of statistical packages|statistical packages]]. 

It is also easily calculated by hand, especially for small samples. There are two ways of doing this.

'''Method one:'''

For comparing two small sets of observations, a direct method is quick and gives insight into the meaning of the ''U'' statistic, which corresponds to the number of wins out of all pairwise contests (see the tortoise and hare example under Examples below). For each observation in the first set, count the number of observations in the second set that it beats (an observation wins a contest if it is the larger of the pair), counting&nbsp;0.5 for each tie. The total of wins and half-counted ties is ''U'' for the first set (that is, <math>U_1</math>). ''U'' for the second set (<math>U_2</math>) is obtained in the same way with the roles of the two sets swapped.
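
The counting procedure of method one can be sketched in Python as follows. The function name <code>pairwise_u</code> and the sample values are hypothetical and chosen only for illustration.

```python
def pairwise_u(own, other):
    """Count pairwise wins of `own` over `other`; each tie counts 0.5."""
    u = 0.0
    for a in own:
        for b in other:
            if a > b:
                u += 1.0      # a wins this contest
            elif a == b:
                u += 0.5      # tie: half a win
    return u


if __name__ == "__main__":
    first = [7, 3, 9]          # hypothetical observations
    second = [1, 3, 8, 10]     # hypothetical observations
    u1 = pairwise_u(first, second)   # 2 + 1.5 + 3 = 6.5
    u2 = pairwise_u(second, first)   # 0 + 0.5 + 2 + 3 = 5.5
    print(u1, u2)
```

Every one of the 3&nbsp;×&nbsp;4 contests contributes a total of one point across the two counts, so the two values in this example add up to 12.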

'''Method two:'''

For larger samples:
# Assign numeric ranks to all the observations (pooling the observations from both groups into one set), beginning with 1 for the smallest value. Where there are groups of tied values, assign a rank equal to the midpoint of the unadjusted rankings (e.g., the ranks of {{math|(3, 5, 5, 5, 5, 8)}} are {{math|(1, 3.5, 3.5, 3.5, 3.5, 6)}}, where the unadjusted ranks would be {{math|(1, 2, 3, 4, 5, 6)}}).
# Now, add up the ranks for the observations which came from sample&nbsp;1. The sum of ranks in sample 2 is now determined, since the sum of all the ranks equals {{math|''N''(''N'' + 1)/2}} where ''N'' is the total number of observations.
# ''U'' is then given by:<ref>{{cite book|last=Zar|first=Jerrold&nbsp;H.|title=Biostatistical Analysis|year=1998|publisher=Prentice Hall International, INC.|location=New Jersey|isbn=978-0-13-082390-8|page=147}}</ref>

:::<math>U_1=R_1 - {n_1(n_1+1) \over 2} \,\!</math>

::where ''n''<sub>1</sub> is the sample size for sample 1, and ''R''<sub>1</sub> is the sum of the ranks in sample&nbsp;1.

::It does not matter which of the two samples is considered sample&nbsp;1. An equally valid formula for ''U'' is

:::<math>U_2= R_2 - {n_2(n_2+1) \over 2} \,\!</math>

::The smaller value of ''U''<sub>1</sub> and ''U''<sub>2</sub> is the one used when consulting significance tables. The sum of the two values is given by
:::<math>U_1 + U_2 = R_1 - {n_1(n_1+1) \over 2} + R_2 - {n_2(n_2+1) \over 2}. \,\!</math>

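:: Knowing that {{math|1=''R''<sub>1</sub> + ''R''<sub>2</sub> = ''N''(''N'' + 1)/2}} and {{math|1=''N'' = ''n''<sub>1</sub> + ''n''<sub>2</sub>}}, the rank sums can be eliminated. Expanding {{math|(''n''<sub>1</sub> + ''n''<sub>2</sub>)(''n''<sub>1</sub> + ''n''<sub>2</sub> + 1)/2}} and cancelling the {{math|''n''<sub>1</sub>(''n''<sub>1</sub> + 1)/2}} and {{math|''n''<sub>2</sub>(''n''<sub>2</sub> + 1)/2}} terms leaves only the cross term, so the sum is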
:::{{math|1=''U''<sub>1</sub> + ''U''<sub>2</sub> = ''n''<sub>1</sub>''n''<sub>2</sub>}}.
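
The rank-sum procedure of method two can be sketched in Python as follows. The helper names <code>midranks</code> and <code>u_by_rank_sums</code> and the two samples are hypothetical; the tie handling follows the midpoint rule from step&nbsp;1.

```python
def midranks(values):
    """Rank from 1 upward; tied values share the midpoint of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Sorted positions start..end carry unadjusted ranks start+1..end+1.
        mid = (start + 1 + end + 1) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = mid
        start = end + 1
    return ranks


def u_by_rank_sums(sample1, sample2):
    """Return (U1, U2) computed from the rank sums R1 and R2."""
    n1, n2 = len(sample1), len(sample2)
    ranks = midranks(list(sample1) + list(sample2))   # rank the pooled data
    r1 = sum(ranks[:n1])
    r2 = sum(ranks[n1:])
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = r2 - n2 * (n2 + 1) / 2
    return u1, u2


if __name__ == "__main__":
    print(midranks([3, 5, 5, 5, 5, 8]))      # [1.0, 3.5, 3.5, 3.5, 3.5, 6.0]
    u1, u2 = u_by_rank_sums([7, 3, 9], [1, 3, 8, 10])   # hypothetical data
    print(u1, u2, u1 + u2)                   # 6.5 5.5 12.0
```

In this example the rank sums are 12.5 and 15.5, which total {{math|1=7 × 8/2 = 28}}, and the two ''U'' values add up to {{math|1=3 × 4 = 12}}, in agreement with the identity above.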