Mann–Whitney U test
== U statistic ==
Let <math>X_1,\ldots, X_{n_1}</math> be group 1, an [[Independent and identically distributed random variables|i.i.d. sample]] from <math>X</math>, and <math>Y_1,\ldots, Y_{n_2}</math> be group 2, an i.i.d. sample from <math>Y</math>, and let both samples be independent of each other. The corresponding ''Mann–Whitney [[U statistic]]'' is defined as the smaller of:

:<math>U_1 = n_1 n_2 + \tfrac{n_1(n_1 + 1)}{2} - R_1, \quad U_2 = n_1 n_2 + \tfrac{n_2(n_2 + 1)}{2} - R_2</math>

with <math>R_1, R_2</math> being the sums of the ranks in groups 1 and 2, after ranking all samples from both groups such that the smallest value obtains rank 1 and the largest rank <math>n_1+n_2</math>.<ref>[https://sphweb.bumc.bu.edu/otlt/mph-modules/bs/bs704_nonparametric/bs704_nonparametric4.html Boston University (SPH), 2017]</ref>

=== Area-under-curve (AUC) statistic for ROC curves ===
The ''U'' statistic is related to the '''area under the [[receiver operating characteristic]] curve''' ([[Receiver operating characteristic#Area under the curve|AUC]]):<ref>{{cite journal | vauthors=((Mason, S. J.)), ((Graham, N. E.)) | journal=Quarterly Journal of the Royal Meteorological Society | title=Areas beneath the relative operating characteristics (ROC) and relative operating levels (ROL) curves: Statistical significance and interpretation | volume=128 | issue=584 | pages=2145–2166 | date= 2002 | issn=1477-870X | doi=10.1256/003590002320603584}}</ref>

:<math>\mathrm{AUC}_1 = {U_1 \over n_1n_2}</math>

Note that this is the same definition as the [[common language effect size]], i.e. the probability that a classifier will rank a randomly chosen instance from the first group higher than a randomly chosen instance from the second group.<ref name="fawcett">Fawcett, Tom (2006); ''[https://www.math.ucdavis.edu/~saito/data/roc/fawcett-roc.pdf An introduction to ROC analysis]'', Pattern Recognition Letters, 27, 861–874.</ref>

Because of its probabilistic form, the ''U'' statistic can be generalized to a measure of a classifier's separation power for more than two classes:<ref>{{cite journal |last1=Hand |first1=David J. |last2=Till |first2=Robert J. |year=2001 |title=A Simple Generalisation of the Area Under the ROC Curve for Multiple Class Classification Problems |journal=Machine Learning |volume=45 |pages=171–186 |doi=10.1023/A:1010920819831 |doi-access=free |number=2}}</ref>

:<math>M = {1 \over c(c-1)} \sum \mathrm{AUC}_{k,\ell}</math>

where ''c'' is the number of classes, and the ''R''<sub>''k'',''ℓ''</sub> term of AUC<sub>''k'',''ℓ''</sub> considers only the ranking of the items belonging to classes ''k'' and ''ℓ'' (i.e., items belonging to all other classes are ignored) according to the classifier's estimates of the probability of those items belonging to class ''k''. AUC<sub>''k'',''k''</sub> will always be zero but, unlike in the two-class case, generally {{math|1=AUC<sub>''k'',''ℓ''</sub> ≠ AUC<sub>''ℓ'',''k''</sub>}}, which is why the ''M'' measure sums over all (''k'',''ℓ'') pairs, in effect using the average of AUC<sub>''k'',''ℓ''</sub> and AUC<sub>''ℓ'',''k''</sub>.
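As a minimal illustrative sketch, the two-group quantities <math>U_1</math>, <math>U_2</math> and <math>\mathrm{AUC}_1</math> defined in this section can be computed directly from the rank sums. The function names <code>midranks</code> and <code>u_statistics</code> and the sample data are hypothetical and chosen only for illustration. Giving tied values the average of their ranks is an assumption of this sketch; the section itself does not specify how ties are handled.

```python
def midranks(values):
    """Return 1-based ranks; tied values share the average of their positions.

    Tie handling by average ranks is an assumption of this sketch.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values equal to values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def u_statistics(group1, group2):
    """Compute (U_1, U_2) from the rank sums R_1 and R_2 of the pooled sample."""
    n1, n2 = len(group1), len(group2)
    ranks = midranks(list(group1) + list(group2))
    r1 = sum(ranks[:n1])  # rank sum of group 1
    r2 = sum(ranks[n1:])  # rank sum of group 2
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return u1, u2


# Hypothetical sample data without ties.
group1 = [1.1, 2.5, 3.0, 4.2]
group2 = [2.0, 3.7, 5.1]

u1, u2 = u_statistics(group1, group2)
u = min(u1, u2)                             # the Mann-Whitney U statistic
auc1 = u1 / (len(group1) * len(group2))     # AUC_1 = U_1 / (n_1 n_2)

print(u1, u2, u, auc1)
```

For these data the pooled ranks give <math>R_1 = R_2 = 14</math>, so <math>U_1 = 8</math>, <math>U_2 = 4</math> and <math>\mathrm{AUC}_1 = 8/12 \approx 0.667</math>. Under the definitions above, <math>U_1 + U_2 = n_1 n_2</math> always holds, which gives a quick sanity check on any implementation.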