
===Area under the curve===
It can be shown that the AUC is closely related to the [[Mann–Whitney U]],<ref name="Hanley">{{cite journal |last1=Hanley |first1=James A. |last2=McNeil |first2=Barbara J. |s2cid=10511727 |author-link2=Barbara Joyce McNeil |journal=Radiology |number=1 |pages=29–36 |title=The Meaning and Use of the Area under a Receiver Operating Characteristic (ROC) Curve |volume=143 |year=1982 |pmid=7063747 |doi=10.1148/radiology.143.1.7063747}}</ref><ref name="Mason">{{cite journal |last1=Mason |first1=Simon J. |last2=Graham |first2=Nicholas E. |year=2002 |title=Areas beneath the relative operating characteristics (ROC) and relative operating levels (ROL) curves: Statistical significance and interpretation |journal=Quarterly Journal of the Royal Meteorological Society |volume=128 |issue=584 |pages=2145–2166 |url=http://www.inmet.gov.br/documentos/cursoI_INMET_IRI/Climate_Information_Course/References/Mason+Graham_2002.pdf |archive-url=https://web.archive.org/web/20081120134338/http://www.inmet.gov.br/documentos/cursoI_INMET_IRI/Climate_Information_Course/References/Mason%2BGraham_2002.pdf |url-status=dead |archive-date=2008-11-20 |doi=10.1256/003590002320603584 |citeseerx=10.1.1.458.8392 |bibcode=2002QJRMS.128.2145M |s2cid=121841664 }}</ref> which tests whether positives are ranked higher than negatives. 
For a predictor <math display="inline">f</math>, an unbiased estimator of its AUC can be expressed by the following ''Wilcoxon-Mann-Whitney'' statistic:<ref>{{Cite book|last1=Calders|first1=Toon|last2=Jaroszewicz|first2=Szymon|date=2007|editor-last=Kok|editor-first=Joost N.|editor2-last=Koronacki|editor2-first=Jacek|editor3-last=Lopez de Mantaras|editor3-first=Ramon|editor4-last=Matwin|editor4-first=Stan|editor5-last=Mladenič|editor5-first=Dunja|editor6-last=Skowron|editor6-first=Andrzej|chapter=Efficient AUC Optimization for Classification|title=Knowledge Discovery in Databases: PKDD 2007|series=Lecture Notes in Computer Science|volume=4702|language=en|location=Berlin, Heidelberg|publisher=Springer|pages=42–53|doi=10.1007/978-3-540-74976-9_8|isbn=978-3-540-74976-9|doi-access=free}}</ref>

: <math>\text{AUC}(f) = 
  \frac{\sum _{t_0 \in \mathcal{D}^0} \sum _{t_1 \in \mathcal{D}^1} 
  \textbf{1}[f(t_0) < f(t_1)]}{|\mathcal{D}^0| \cdot |\mathcal{D}^1|},
</math>

where <math display="inline">\textbf{1}[f(t_0) < f(t_1)]</math> denotes an ''indicator function'' that returns 1 if <math>f(t_0) < f(t_1)</math> and 0 otherwise; <math>\mathcal{D}^0</math> is the set of negative examples, and <math>\mathcal{D}^1</math> is the set of positive examples.
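
As an illustration, the statistic above can be evaluated directly by counting correctly ordered pairs. The following is a minimal Python sketch, not a reference implementation; the function name <code>auc_wmw</code> and the sample scores are hypothetical. It follows the formula as written, so tied scores count as 0 (a common variant instead counts each tie as one half).

```python
def auc_wmw(negatives, positives):
    """Fraction of (negative, positive) score pairs with f(t0) < f(t1).

    `negatives` holds the predictor's scores on D^0 and `positives` its
    scores on D^1. Ties contribute 0, exactly as in the formula above.
    """
    correct = sum(1 for s0 in negatives for s1 in positives if s0 < s1)
    return correct / (len(negatives) * len(positives))


# Hypothetical scores, chosen only for illustration.
neg_scores = [0.1, 0.4, 0.35]
pos_scores = [0.8, 0.3, 0.9]
auc = auc_wmw(neg_scores, pos_scores)
print(auc)  # 7 of the 9 pairs are ordered correctly, so about 0.778
```

The double loop takes time proportional to <math>|\mathcal{D}^0| \cdot |\mathcal{D}^1|</math>; for large samples the same quantity is usually obtained from the ranks of the scores instead.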

In the context of [[credit scoring]], a rescaled version of AUC is often used:

<math>G_1 = 2 \operatorname{AUC} - 1</math>.

<math>G_1</math> is referred to as the Gini index or Gini coefficient,<ref>Hand, David J.; and Till, Robert J. (2001); ''A simple generalization of the area under the ROC curve for multiple class classification problems'', Machine Learning, 45, 171–186.</ref> but it should not be confused with the [[Gini coefficient|measure of statistical dispersion that is also called the Gini coefficient]]. <math>G_1</math> is a special case of [[Somers' D]].
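
The rescaling can be checked numerically. In this short Python sketch the function name <code>gini_from_auc</code> is hypothetical; it shows that an uninformative classifier (AUC 0.5) maps to 0, a perfect ranking to 1, and a perfectly inverted ranking to −1.

```python
def gini_from_auc(auc):
    """Rescale an AUC in [0, 1] to G_1 = 2*AUC - 1 in [-1, 1]."""
    return 2.0 * auc - 1.0


for auc_value in (0.0, 0.5, 0.75, 1.0):
    print(auc_value, gini_from_auc(auc_value))
```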

It is also common to calculate the Area Under the ROC Convex Hull (ROC AUCH = ROCH AUC), because any point on the line segment between two prediction results can be achieved by randomly using one or the other system with probabilities proportional to the relative length of the opposite component of the segment.<ref>{{cite journal |first1=F. |last1=Provost |first2=T. |last2=Fawcett |title=Robust classification for imprecise environments. |journal=Machine Learning |volume=42 |issue=3 |pages=203–231 |year=2001 |doi=10.1023/a:1007601015854 |arxiv=cs/0009007 |s2cid=5415722 }}</ref> It is also possible to invert concavities: just as in the figure the worse solution can be reflected to become a better solution, concavities can be reflected in any line segment, but this more extreme form of fusion is much more likely to overfit the data.<ref name="FlachWu2005">{{cite conference |first1=P.A. |last1=Flach |first2=S. |last2=Wu |year=2005 |title= Repairing concavities in ROC curves. |book-title= 19th International Joint Conference on Artificial Intelligence (IJCAI'05) |pages= 702–707 |url=http://www.icml-2011.org/papers/385_icmlpaper.pdf }}</ref>
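
The convex hull area can be illustrated with a small Python sketch. All function names, scores, and labels below are hypothetical, and the hull is built with a standard monotone-chain scan, which is one possible choice and is not taken from the cited sources. The sketch computes the empirical ROC points, keeps only those on the upper convex hull, and compares the trapezoidal area under both curves.

```python
def roc_points(scores, labels):
    """Empirical ROC points (FPR, TPR), sorted by FPR then TPR.

    An example is predicted positive when its score is >= the threshold.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = {(0.0, 0.0), (1.0, 1.0)}
    for thr in set(scores):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        points.add((fp / neg, tp / pos))
    return sorted(points)


def upper_hull(points):
    """Upper convex hull of points already sorted by x (monotone chain)."""
    hull = []
    for p in points:
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # A non-negative cross product means hull[-1] lies on or below
            # the segment from hull[-2] to p, so it is not on the hull.
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull


def area(points):
    """Trapezoidal area under a piecewise-linear curve."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Hypothetical data whose ROC curve has a concavity.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 0, 0, 1, 1, 0]

curve = roc_points(scores, labels)
hull = upper_hull(curve)
print(area(curve))  # 5/9, about 0.556
print(area(hull))   # 7/9, about 0.778
```

The hull area is never smaller than the plain AUC, since the hull lies on or above every empirical ROC point.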

The [[machine learning]] community most often uses the ROC AUC statistic for model comparison.<ref>{{cite journal |issue=3 |pages=839–843 |last1=Hanley |first1=James A.| last2=McNeil |first2=Barbara J. |title=A method of comparing the areas under receiver operating characteristic curves derived from the same cases |journal=Radiology |date=1983-09-01 |volume=148 |pmid=6878708 |doi=10.1148/radiology.148.3.6878708|doi-access=free }}</ref> This practice has been questioned because AUC estimates are quite noisy and suffer from other problems.<ref name="Hanczar2010">{{cite journal | last1 = Hanczar | first1 = Blaise | last2 = Hua | first2 = Jianping | last3 = Sima | first3 = Chao | last4 = Weinstein | first4 = John | last5 = Bittner | first5 = Michael | last6 = Dougherty | first6 = Edward R | year = 2010 | title = Small-sample precision of ROC-related estimates | journal = Bioinformatics | volume = 26 | issue = 6| pages = 822–830 | doi=10.1093/bioinformatics/btq037| pmid = 20130029 | doi-access = free }}</ref><ref name="Lobo2008">{{cite journal | last1 = Lobo | first1 = Jorge M. 
| last2 = Jiménez-Valverde | first2 = Alberto | last3 = Real | first3 = Raimundo | s2cid = 15206363 | year = 2008 | title = AUC: a misleading measure of the performance of predictive distribution models | journal = Global Ecology and Biogeography | volume = 17 | issue = 2| pages = 145–151 | doi=10.1111/j.1466-8238.2007.00358.x| bibcode = 2008GloEB..17..145L }}</ref><ref name="Hand2009">{{cite journal | last1 = Hand | first1 = David J | year = 2009 | title = Measuring classifier performance: A coherent alternative to the area under the ROC curve | journal = Machine Learning | volume = 77 | pages = 103–123 | doi=10.1007/s10994-009-5119-5| doi-access = free | hdl = 10044/1/18420 | hdl-access = free }}</ref> Nonetheless, the coherence of AUC as a measure of aggregated classification performance has been vindicated, in terms of a uniform rate distribution,<ref name="Flachetal2011">{{cite conference |first1=P.A. |last1=Flach |first2=J. |last2=Hernandez-Orallo | first3=C. | last3=Ferri |year=2011 |title=A coherent interpretation of AUC as a measure of aggregated classification performance. |book-title=Proceedings of the 28th International Conference on Machine Learning (ICML-11) |pages=657–664|url=http://www.icml-2011.org/papers/385_icmlpaper.pdf}}</ref> and AUC has been linked to a number of other performance metrics such as the [[Brier score]].<ref name="hernandez2012unified ">{{cite journal |first1=J. |last1= Hernandez-Orallo| first2=P.A.| last2=Flach | first3=C. | last3=Ferri |year=2012 |title=A unified view of performance metrics: translating threshold choice into expected classification loss|journal=Journal of Machine Learning Research|volume=13 |pages=2813–2869 |url=http://jmlr.org/papers/volume13/hernandez-orallo12a/hernandez-orallo12a.pdf}}</ref>

Another problem with ROC AUC is that reducing the ROC Curve to a single number ignores the fact that the curve describes the tradeoffs between the different systems or performance points plotted, not the performance of an individual system. It also ignores the possibility of concavity repair. For these reasons, related alternative measures such as Informedness{{citation needed|date=November 2019}} or DeltaP are recommended.<ref name="Powers2012a"/><ref name="Powers2012b">{{cite conference |first=David M.W. |last=Powers |title=The Problem of Area Under the Curve |book-title=International Conference on Information Science and Technology |year=2012}}</ref> These measures are essentially equivalent to the Gini for a single prediction point, with DeltaP' = Informedness = 2AUC − 1, whilst DeltaP = Markedness represents the dual (viz. predicting the prediction from the real class); the geometric mean of Informedness and Markedness is the [[Matthews correlation coefficient]].{{citation needed|date=November 2019}}
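
These single-point relationships can be verified on a confusion matrix. The Python sketch below assumes the usual definitions Informedness = TPR − FPR and Markedness = PPV + NPV − 1; the function name and the counts are hypothetical. The single-point AUC is the area under the two-segment curve through (0, 0), (FPR, TPR) and (1, 1).

```python
import math


def single_point_measures(tp, fp, fn, tn):
    """Informedness, markedness, single-point AUC and MCC from counts."""
    tpr = tp / (tp + fn)          # true positive rate
    fpr = fp / (fp + tn)          # false positive rate
    ppv = tp / (tp + fp)          # positive predictive value
    npv = tn / (tn + fn)          # negative predictive value
    informedness = tpr - fpr
    markedness = ppv + npv - 1
    # Area under the curve (0,0) -> (FPR,TPR) -> (1,1).
    auc_single = (1 + tpr - fpr) / 2
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return informedness, markedness, auc_single, mcc


# Hypothetical confusion-matrix counts.
informedness, markedness, auc_single, mcc = single_point_measures(
    tp=40, fp=10, fn=20, tn=30)

print(informedness, 2 * auc_single - 1)          # both about 0.417
print(mcc, math.sqrt(informedness * markedness)) # both about 0.408
```

When Informedness and Markedness are both negative, the geometric mean gives the magnitude of the Matthews correlation coefficient and the sign has to be restored separately.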

Whereas ROC AUC varies between 0 and 1, with an uninformative classifier yielding 0.5, the alternative measures known as [[Informedness]],{{citation needed|date=November 2019}} Certainty<ref name="Powers2012a"/> and Gini Coefficient (in the single parameterization or single system case){{citation needed|date=November 2019}} all have the advantage that 0 represents chance performance whilst 1 represents perfect performance, and −1 represents the "perverse" case of full informedness always giving the wrong response.<ref>{{cite conference|first=David M. W. |last=Powers |year=2003 |title= Recall and Precision versus the Bookmaker |book-title=Proceedings of the International Conference on Cognitive Science (ICSC-2003), Sydney Australia, 2003, pp.&nbsp;529–534. | url=http://dl.dropbox.com/u/27743223/200302-ICCS-Bookmaker.pdf }}</ref> Bringing chance performance to 0 allows these alternative scales to be interpreted as Kappa statistics. Informedness has been shown to have desirable characteristics for Machine Learning versus other common definitions of Kappa such as [[Cohen's kappa|Cohen Kappa]] and [[Fleiss' kappa|Fleiss Kappa]].{{citation needed|date=November 2019}}<ref>{{cite conference |first=David M. W. |last=Powers |year=2012 |url=http://dl.dropbox.com/u/27743223/201209-eacl2012-Kappa.pdf |title=The Problem with Kappa |book-title=Conference of the European Chapter of the Association for Computational Linguistics (EACL2012) Joint ROBUS-UNSUP Workshop |access-date=2012-07-20 |archive-url=http://arquivo.pt/wayback/20160518183306/http://dl.dropbox.com/u/27743223/201209-eacl2012-Kappa.pdf |archive-date=2016-05-18 |url-status=dead }}</ref>

Sometimes it can be more useful to look at a specific region of the ROC Curve rather than at the whole curve. It is possible to compute [[Partial Area Under the ROC Curve (pAUC)|partial AUC]].<ref>{{cite journal |doi=10.1177/0272989X8900900307 |volume=9 |issue=3 |pages=190–195 |last=McClish |first=Donna Katzman |s2cid=24442201 |title=Analyzing a Portion of the ROC Curve |journal=Medical Decision Making |date=1989-08-01 |pmid=2668680 }}</ref> For example, one could focus on the region of the curve with low false positive rate, which is often of prime interest for population screening tests.<ref>{{cite journal |doi=10.1111/1541-0420.00071 |volume=59 |issue=3 |pages=614–623 |last1=Dodd |first1=Lori E. | first2=Margaret S. |last2=Pepe |title=Partial AUC Estimation and Regression |journal=Biometrics |year=2003 |pmid=14601762 |s2cid=23054670 |url=http://biostats.bepress.com/cgi/viewcontent.cgi?article=1005&context=uwbiostat |doi-access=free }}</ref> Another common approach for classification problems in which P ≪ N (common in bioinformatics applications) is to use a logarithmic scale for the ''x''-axis.<ref>Karplus, Kevin (2011); [http://www.soe.ucsc.edu/~karplus/papers/better-than-chance-sep-07.pdf ''Better than Chance: the importance of null models''], University of California, Santa Cruz, in Proceedings of the First International Workshop on Pattern Recognition in Proteomics, Structural Biology and Bioinformatics (PR PS BB 2011)</ref>
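
A partial AUC restricted to low false positive rates can be sketched as follows. This Python example is illustrative only: the function name <code>partial_auc</code>, the cut-off <code>max_fpr</code> and the ROC points are hypothetical, and the curve is assumed to be piecewise linear and sorted by false positive rate. No standardisation of the partial area is applied.

```python
def partial_auc(roc, max_fpr):
    """Area under a piecewise-linear ROC curve for FPR in [0, max_fpr].

    `roc` is a list of (FPR, TPR) points sorted by FPR, starting at (0, 0).
    """
    total = 0.0
    for (x1, y1), (x2, y2) in zip(roc, roc[1:]):
        if x1 >= max_fpr:
            break
        if x2 > max_fpr:
            # Clip the segment at max_fpr by linear interpolation.
            y2 = y1 + (y2 - y1) * (max_fpr - x1) / (x2 - x1)
            x2 = max_fpr
        total += (x2 - x1) * (y1 + y2) / 2
    return total


# Hypothetical ROC curve.
roc = [(0.0, 0.0), (0.1, 0.6), (0.4, 0.9), (1.0, 1.0)]
print(partial_auc(roc, 0.2))  # about 0.095
print(partial_auc(roc, 1.0))  # full AUC, about 0.825
```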

The ROC area under the curve is also called the '''c-statistic''' or '''c statistic'''.<ref>{{Cite web|url=https://www.statisticshowto.datasciencecentral.com/c-statistic/|title=C-Statistic: Definition, Examples, Weighting and Significance|date=August 28, 2016|website=Statistics How To}}</ref>