==Further interpretations==
Sometimes, the ROC is used to generate a summary statistic. Common versions are:
* the intercept of the ROC curve with the line at 45 degrees orthogonal to the no-discrimination line – the balance point where [[Sensitivity and specificity|Sensitivity]] = [[Specificity (statistics)|Specificity]]
* the intercept of the ROC curve with the tangent at 45 degrees parallel to the no-discrimination line that is closest to the error-free point (0,1) – also called [[Youden's J statistic]] and generalized as Informedness{{citation needed|date=November 2019}}
* the area between the ROC curve and the no-discrimination line multiplied by two, called the ''Gini coefficient'', especially in the context of [[credit scoring]].<ref>{{cite journal | vauthors=((Řezáč, M.)), ((Řezáč, F.)) | journal=Czech Journal of Economics and Finance (Finance a úvěr) | title=How to Measure the Quality of Credit Scoring Models | volume=61 | issue=5 | pages=486–507 | publisher=Charles University Prague, Faculty of Social Sciences | date=2011}}</ref> It should not be confused with the [[Gini coefficient|measure of statistical dispersion also called Gini coefficient]].
* the area between the full ROC curve and the triangular ROC curve including only (0,0), (1,1) and one selected operating point <math>(tpr,fpr)</math> – Consistency<ref name="Powers2012a">{{cite conference |title=ROC-ConCert: ROC-Based Measurement of Consistency and Certainty |first=David MW |last=Powers |year=2012 |book-title=Spring Congress on Engineering and Technology (SCET) |volume=2 |pages=238–241 |publisher=IEEE |url=http://www.academia.edu/download/31939951/201203-SCET30795-ROC-ConCert-PID1124774.pdf}}{{dead link|date=July 2022|bot=medic}}{{cbignore|bot=medic}}</ref>
* the area under the ROC curve, or "AUC" ("area under curve"), or A' (pronounced "a-prime"),<ref>{{cite conference |first1=James |last1=Fogarty |first2=Ryan S. |last2=Baker |first3=Scott E. |last3=Hudson |year=2005 |title=Case studies in the use of ROC curve analysis for sensor-based estimates in human computer interaction |book-title=ACM International Conference Proceeding Series, Proceedings of Graphics Interface 2005 |publisher=Canadian Human-Computer Communications Society |location=Waterloo, ON |url=http://portal.acm.org/citation.cfm?id=1089530}}</ref> or "c-statistic" ("concordance statistic").<ref>{{cite book |first1=Trevor |last1=Hastie |author-link1=Trevor Hastie |first2=Robert |last2=Tibshirani |author-link2=Robert Tibshirani |first3=Jerome H. |last3=Friedman |year=2009 |title=The elements of statistical learning: data mining, inference, and prediction |edition=2nd}}</ref>
* the [[sensitivity index|sensitivity index ''d′'']] (pronounced "d-prime"), the distance between the mean of the distribution of activity in the system under noise-alone conditions and its distribution under signal-alone conditions, divided by their [[standard deviation]], under the assumption that both these distributions are [[normal distribution|normal]] with the same standard deviation. Under these assumptions, the shape of the ROC is entirely determined by ''d′''.

However, any attempt to summarize the ROC curve into a single number loses information about the pattern of tradeoffs of the particular discriminator algorithm. The sketch below illustrates two of these summary statistics.
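As a minimal sketch in Python of how two of these summaries, Youden's ''J'' and the Gini coefficient, can be computed from a discretized ROC curve; the <code>fpr</code> and <code>tpr</code> arrays here are hypothetical example data, not values from any dataset discussed in this article:

<syntaxhighlight lang="python">
import numpy as np

# Hypothetical (FPR, TPR) pairs tracing a ROC curve, sorted by FPR.
fpr = np.array([0.0, 0.1, 0.2, 0.4, 0.7, 1.0])
tpr = np.array([0.0, 0.4, 0.6, 0.8, 0.95, 1.0])

# Youden's J: the maximum vertical distance between the ROC curve
# and the no-discrimination line TPR = FPR.
youden_j = np.max(tpr - fpr)

# AUC by the trapezoidal rule.
auc = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2)

# Gini coefficient: twice the area between the curve and the
# diagonal, which equals 2 * AUC - 1.
gini = 2 * auc - 1

print(f"Youden's J = {youden_j:.3f}, AUC = {auc:.3f}, Gini = {gini:.3f}")
</syntaxhighlight>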
===Probabilistic interpretation===
The area under the curve (often referred to as simply the AUC) is equal to the probability that a classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one (assuming 'positive' ranks higher than 'negative').<ref name="fawcett">Fawcett, Tom (2006); ''[https://www.math.ucdavis.edu/~saito/data/roc/fawcett-roc.pdf An introduction to ROC analysis]'', Pattern Recognition Letters, 27, 861–874.</ref> In other words, when given one randomly selected positive instance and one randomly selected negative instance, AUC is the probability that the classifier will be able to tell which one is which. This can be seen as follows: the area under the curve is given by (the integral boundaries are reversed as a large threshold <math>T</math> has a lower value on the ''x''-axis)
:<math>\operatorname{TPR}(T): T \to y(x)</math>
:<math>\operatorname{FPR}(T): T \to x</math>
:<math>\begin{align} A & = \int_{x=0}^1 \operatorname{TPR}(\operatorname{FPR}^{-1}(x)) \, dx \\[5pt] & = \int_{\infty}^{-\infty} \operatorname{TPR}(T) \operatorname{FPR}'(T) \, dT \\[5pt] & = \int_{-\infty}^\infty \int_{-\infty}^\infty I(T' \ge T) f_1(T') f_0(T) \, dT' \, dT = P(X_1 \ge X_0) \end{align}</math>
where <math>X_1</math> is the score for a positive instance and <math>X_0</math> is the score for a negative instance, and <math>f_0</math> and <math>f_1</math> are probability densities as defined in the previous section. If <math>X_0</math> and <math>X_1</math> follow two Gaussian distributions, then <math>A = \Phi\left((\mu_1-\mu_0)/\sqrt{\sigma_1^2 + \sigma_0^2}\right)</math>.
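This identity is easy to check numerically. The following sketch, assuming hypothetical Gaussian score distributions with the stated means and standard deviations, estimates <math>P(X_1 \ge X_0)</math> by Monte Carlo sampling and compares it with the closed-form value:

<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Hypothetical Gaussian score distributions for negatives (X0)
# and positives (X1).
mu0, sigma0 = 0.0, 1.0
mu1, sigma1 = 1.5, 1.2

x0 = rng.normal(mu0, sigma0, size=1_000_000)
x1 = rng.normal(mu1, sigma1, size=1_000_000)

# Monte Carlo estimate of P(X1 >= X0), i.e. the AUC.
auc_mc = np.mean(x1 >= x0)

# Closed-form value under the Gaussian assumption:
# Phi((mu1 - mu0) / sqrt(sigma1^2 + sigma0^2)).
auc_exact = norm.cdf((mu1 - mu0) / np.hypot(sigma1, sigma0))

print(f"Monte Carlo AUC = {auc_mc:.4f}, closed form = {auc_exact:.4f}")
</syntaxhighlight>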
===Area under the curve===
It can be shown that the AUC is closely related to the [[Mann–Whitney U]],<ref name="Hanley">{{cite journal |last1=Hanley |first1=James A. |last2=McNeil |first2=Barbara J. |s2cid=10511727 |author-link2=Barbara Joyce McNeil |journal=Radiology |number=1 |pages=29–36 |title=The Meaning and Use of the Area under a Receiver Operating Characteristic (ROC) Curve |volume=143 |year=1982 |pmid=7063747 |doi=10.1148/radiology.143.1.7063747}}</ref><ref name="Mason">{{cite journal |last1=Mason |first1=Simon J. |last2=Graham |first2=Nicholas E. |year=2002 |title=Areas beneath the relative operating characteristics (ROC) and relative operating levels (ROL) curves: Statistical significance and interpretation |journal=Quarterly Journal of the Royal Meteorological Society |volume=128 |issue=584 |pages=2145–2166 |url=http://www.inmet.gov.br/documentos/cursoI_INMET_IRI/Climate_Information_Course/References/Mason+Graham_2002.pdf |archive-url=https://web.archive.org/web/20081120134338/http://www.inmet.gov.br/documentos/cursoI_INMET_IRI/Climate_Information_Course/References/Mason%2BGraham_2002.pdf |url-status=dead |archive-date=2008-11-20 |doi=10.1256/003590002320603584 |citeseerx=10.1.1.458.8392 |bibcode=2002QJRMS.128.2145M |s2cid=121841664}}</ref> which tests whether positives are ranked higher than negatives.

For a predictor <math display="inline">f</math>, an unbiased estimator of its AUC can be expressed by the following ''Wilcoxon–Mann–Whitney'' statistic:<ref>{{Cite book|last1=Calders|first1=Toon|last2=Jaroszewicz|first2=Szymon|date=2007|editor-last=Kok|editor-first=Joost N.|editor2-last=Koronacki|editor2-first=Jacek|editor3-last=Lopez de Mantaras|editor3-first=Ramon|editor4-last=Matwin|editor4-first=Stan|editor5-last=Mladenič|editor5-first=Dunja|editor6-last=Skowron|editor6-first=Andrzej|chapter=Efficient AUC Optimization for Classification|title=Knowledge Discovery in Databases: PKDD 2007|series=Lecture Notes in Computer Science|volume=4702|language=en|location=Berlin, Heidelberg|publisher=Springer|pages=42–53|doi=10.1007/978-3-540-74976-9_8|isbn=978-3-540-74976-9|doi-access=free}}</ref>
:<math>\text{AUC}(f) = \frac{\sum_{t_0 \in \mathcal{D}^0} \sum_{t_1 \in \mathcal{D}^1} \textbf{1}[f(t_0) < f(t_1)]}{|\mathcal{D}^0| \cdot |\mathcal{D}^1|},</math>
where <math display="inline">\textbf{1}[f(t_0) < f(t_1)]</math> denotes an ''indicator function'', which returns 1 if <math>f(t_0) < f(t_1)</math> and 0 otherwise; <math>\mathcal{D}^0</math> is the set of negative examples, and <math>\mathcal{D}^1</math> is the set of positive examples.

In the context of [[credit scoring]], a rescaled version of AUC is often used: <math>G_1 = 2 \operatorname{AUC} - 1</math>. <math>G_1</math> is referred to as the Gini index or Gini coefficient,<ref>Hand, David J.; and Till, Robert J. (2001); ''A simple generalization of the area under the ROC curve for multiple class classification problems'', Machine Learning, 45, 171–186.</ref> but it should not be confused with the [[Gini coefficient|measure of statistical dispersion that is also called Gini coefficient]]. <math>G_1</math> is a special case of [[Somers' D]]. A direct implementation of this estimator is sketched below.
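A minimal sketch of the Wilcoxon–Mann–Whitney estimator, assuming hypothetical score arrays for the negative and positive examples:

<syntaxhighlight lang="python">
import numpy as np

def auc_wmw(scores_neg, scores_pos):
    """Wilcoxon-Mann-Whitney estimate of AUC: the fraction of
    (negative, positive) pairs ranked in the correct order."""
    s0 = np.asarray(scores_neg)[:, None]   # shape (|D^0|, 1)
    s1 = np.asarray(scores_pos)[None, :]   # shape (1, |D^1|)
    return np.mean(s0 < s1)                # mean indicator over all pairs

# Hypothetical classifier scores.
neg = [0.1, 0.4, 0.35, 0.8]
pos = [0.9, 0.65, 0.85, 0.5]

auc = auc_wmw(neg, pos)
gini = 2 * auc - 1   # the rescaled version G1 used in credit scoring
print(f"AUC = {auc:.3f}, G1 = {gini:.3f}")
</syntaxhighlight>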
It is also common to calculate the Area Under the ROC Convex Hull (ROC AUCH = ROCH AUC), as any point on the line segment between two prediction results can be achieved by randomly using one or the other system with probabilities proportional to the relative length of the opposite component of the segment.<ref>{{cite journal |first1=F. |last1=Provost |first2=T. |last2=Fawcett |title=Robust classification for imprecise environments |journal=Machine Learning |volume=42 |issue=3 |pages=203–231 |year=2001 |doi=10.1023/a:1007601015854 |arxiv=cs/0009007 |s2cid=5415722}}</ref> It is also possible to invert concavities – just as in the figure the worse solution can be reflected to become a better solution; concavities can be reflected in any line segment, but this more extreme form of fusion is much more likely to overfit the data.<ref name="FlachWu2005">{{cite conference |first1=P.A. |last1=Flach |first2=S. |last2=Wu |year=2005 |title=Repairing concavities in ROC curves |book-title=19th International Joint Conference on Artificial Intelligence (IJCAI'05) |pages=702–707 |url=http://www.icml-2011.org/papers/385_icmlpaper.pdf}}</ref>

The [[machine learning]] community most often uses the ROC AUC statistic for model comparison.<ref>{{cite journal |issue=3 |pages=839–843 |last1=Hanley |first1=James A. |last2=McNeil |first2=Barbara J. |title=A method of comparing the areas under receiver operating characteristic curves derived from the same cases |journal=Radiology |date=1983-09-01 |volume=148 |pmid=6878708 |doi=10.1148/radiology.148.3.6878708 |doi-access=free}}</ref> This practice has been questioned because AUC estimates are quite noisy and suffer from other problems.<ref name="Hanczar2010">{{cite journal |last1=Hanczar |first1=Blaise |last2=Hua |first2=Jianping |last3=Sima |first3=Chao |last4=Weinstein |first4=John |last5=Bittner |first5=Michael |last6=Dougherty |first6=Edward R |year=2010 |title=Small-sample precision of ROC-related estimates |journal=Bioinformatics |volume=26 |issue=6 |pages=822–830 |doi=10.1093/bioinformatics/btq037 |pmid=20130029 |doi-access=free}}</ref><ref name="Lobo2008">{{cite journal |last1=Lobo |first1=Jorge M. |last2=Jiménez-Valverde |first2=Alberto |last3=Real |first3=Raimundo |s2cid=15206363 |year=2008 |title=AUC: a misleading measure of the performance of predictive distribution models |journal=Global Ecology and Biogeography |volume=17 |issue=2 |pages=145–151 |doi=10.1111/j.1466-8238.2007.00358.x |bibcode=2008GloEB..17..145L}}</ref><ref name="Hand2009">{{cite journal |last1=Hand |first1=David J |year=2009 |title=Measuring classifier performance: A coherent alternative to the area under the ROC curve |journal=Machine Learning |volume=77 |pages=103–123 |doi=10.1007/s10994-009-5119-5 |doi-access=free |hdl=10044/1/18420 |hdl-access=free}}</ref> Nonetheless, the coherence of AUC as a measure of aggregated classification performance has been vindicated, in terms of a uniform rate distribution,<ref name="Flachetal2011">{{cite conference |first1=P.A. |last1=Flach |first2=J. |last2=Hernandez-Orallo |first3=C. |last3=Ferri |year=2011 |title=A coherent interpretation of AUC as a measure of aggregated classification performance |book-title=Proceedings of the 28th International Conference on Machine Learning (ICML-11) |pages=657–664 |url=http://www.icml-2011.org/papers/385_icmlpaper.pdf}}</ref> and AUC has been linked to a number of other performance metrics such as the [[Brier score]].<ref name="hernandez2012unified">{{cite journal |first1=J. |last1=Hernandez-Orallo |first2=P.A. |last2=Flach |first3=C. |last3=Ferri |year=2012 |title=A unified view of performance metrics: translating threshold choice into expected classification loss |journal=Journal of Machine Learning Research |volume=13 |pages=2813–2869 |url=http://jmlr.org/papers/volume13/hernandez-orallo12a/hernandez-orallo12a.pdf}}</ref>

Another problem with ROC AUC is that reducing the ROC curve to a single number ignores the fact that it is about the tradeoffs between the different systems or performance points plotted, not the performance of an individual system, and that it ignores the possibility of concavity repair; related alternative measures such as Informedness{{citation needed|date=November 2019}} or DeltaP are therefore recommended.<ref name="Powers2012a"/><ref name="Powers2012b">{{cite conference |first=David M.W. |last=Powers |title=The Problem of Area Under the Curve |book-title=International Conference on Information Science and Technology |year=2012}}</ref> These measures are essentially equivalent to the Gini for a single prediction point, with DeltaP' = Informedness = 2AUC − 1, whilst DeltaP = Markedness represents the dual (viz. predicting the prediction from the real class), and their geometric mean is the [[Matthews correlation coefficient]].{{citation needed|date=November 2019}}
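As a minimal numeric sketch, using a hypothetical 2×2 confusion matrix rather than data from this article, informedness, markedness and the Matthews correlation coefficient can be computed and the geometric-mean relation checked directly:

<syntaxhighlight lang="python">
import math

# Hypothetical confusion-matrix counts.
tp, fn = 45, 5    # positives: correctly / incorrectly classified
fp, tn = 10, 40   # negatives: incorrectly / correctly classified

tpr = tp / (tp + fn)          # sensitivity (recall)
tnr = tn / (tn + fp)          # specificity
ppv = tp / (tp + fp)          # precision
npv = tn / (tn + fn)

informedness = tpr + tnr - 1  # DeltaP' (Youden's J at one operating point)
markedness = ppv + npv - 1    # DeltaP, the dual measure

mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

# For a 2x2 table, |MCC| is the geometric mean of informedness
# and markedness.
assert math.isclose(abs(mcc), math.sqrt(informedness * markedness))
print(informedness, markedness, mcc)
</syntaxhighlight>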
Whereas ROC AUC varies between 0 and 1 – with an uninformative classifier yielding 0.5 – the alternative measures known as [[Informedness]],{{citation needed|date=November 2019}} Certainty<ref name="Powers2012a"/> and Gini coefficient (in the single-parameterization or single-system case){{citation needed|date=November 2019}} all have the advantage that 0 represents chance performance whilst 1 represents perfect performance, and −1 represents the "perverse" case of full informedness always giving the wrong response.<ref>{{cite conference |first=David M. W. |last=Powers |year=2003 |title=Recall and Precision versus the Bookmaker |book-title=Proceedings of the International Conference on Cognitive Science (ICSC-2003), Sydney Australia, 2003, pp. 529–534 |url=http://dl.dropbox.com/u/27743223/200302-ICCS-Bookmaker.pdf}}</ref> Bringing chance performance to 0 allows these alternative scales to be interpreted as Kappa statistics. Informedness has been shown to have desirable characteristics for machine learning versus other common definitions of Kappa such as [[Cohen's kappa|Cohen Kappa]] and [[Fleiss' kappa|Fleiss Kappa]].{{citation needed|date=November 2019}}<ref>{{cite conference |first=David M. W. |last=Powers |year=2012 |url=http://dl.dropbox.com/u/27743223/201209-eacl2012-Kappa.pdf |title=The Problem with Kappa |book-title=Conference of the European Chapter of the Association for Computational Linguistics (EACL2012) Joint ROBUS-UNSUP Workshop |access-date=2012-07-20 |archive-url=http://arquivo.pt/wayback/20160518183306/http://dl.dropbox.com/u/27743223/201209-eacl2012-Kappa.pdf |archive-date=2016-05-18 |url-status=dead}}</ref>

Sometimes it can be more useful to look at a specific region of the ROC curve rather than at the whole curve. It is possible to compute the [[Partial Area Under the ROC Curve (pAUC)|partial AUC]].<ref>{{cite journal |doi=10.1177/0272989X8900900307 |volume=9 |issue=3 |pages=190–195 |last=McClish |first=Donna Katzman |s2cid=24442201 |title=Analyzing a Portion of the ROC Curve |journal=Medical Decision Making |date=1989-08-01 |pmid=2668680}}</ref> For example, one could focus on the region of the curve with low false positive rate, which is often of prime interest for population screening tests.<ref>{{cite journal |doi=10.1111/1541-0420.00071 |volume=59 |issue=3 |pages=614–623 |last1=Dodd |first1=Lori E. |first2=Margaret S. |last2=Pepe |title=Partial AUC Estimation and Regression |journal=Biometrics |year=2003 |pmid=14601762 |s2cid=23054670 |url=http://biostats.bepress.com/cgi/viewcontent.cgi?article=1005&context=uwbiostat |doi-access=free}}</ref>
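A minimal sketch of a partial-AUC computation over the low-false-positive-rate region, here assuming a hypothetical discretized ROC curve and an illustrative FPR cap of 0.2 (neither the cap nor the arrays are standard values):

<syntaxhighlight lang="python">
import numpy as np

def partial_auc(fpr, tpr, max_fpr):
    """Area under the ROC curve restricted to fpr <= max_fpr,
    using linear interpolation at the cut-off."""
    fpr = np.asarray(fpr)
    tpr = np.asarray(tpr)
    # Interpolate the TPR at the cut-off, keep only the region below it.
    tpr_cut = np.interp(max_fpr, fpr, tpr)
    keep = fpr < max_fpr
    x = np.append(fpr[keep], max_fpr)
    y = np.append(tpr[keep], tpr_cut)
    return np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2)  # trapezoidal rule

# Hypothetical ROC curve points, sorted by FPR.
fpr = [0.0, 0.1, 0.2, 0.4, 0.7, 1.0]
tpr = [0.0, 0.4, 0.6, 0.8, 0.95, 1.0]

print(f"pAUC(FPR <= 0.2) = {partial_auc(fpr, tpr, 0.2):.4f}")
</syntaxhighlight>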
Another common approach for classification problems in which P ≪ N (common in bioinformatics applications) is to use a logarithmic scale for the ''x''-axis.<ref>Karplus, Kevin (2011); [http://www.soe.ucsc.edu/~karplus/papers/better-than-chance-sep-07.pdf ''Better than Chance: the importance of null models''], University of California, Santa Cruz, in Proceedings of the First International Workshop on Pattern Recognition in Proteomics, Structural Biology and Bioinformatics (PR PS BB 2011)</ref>

The ROC area under the curve is also called the '''c-statistic''' or '''c statistic'''.<ref>{{Cite web |url=https://www.statisticshowto.datasciencecentral.com/c-statistic/ |title=C-Statistic: Definition, Examples, Weighting and Significance |date=August 28, 2016 |website=Statistics How To}}</ref>

===Other measures===
[[File:TOC.png|thumb|right|TOC curve]]
The [[Total Operating Characteristic]] (TOC) also characterizes diagnostic ability while revealing more information than the ROC. For each threshold, ROC reveals two ratios, TP/(TP + FN) and FP/(FP + TN). In other words, ROC reveals <math>\frac{\text{hits}}{\text{hits}+\text{misses}}</math> and <math>\frac{\text{false alarms}}{\text{false alarms} + \text{correct rejections}}</math>. On the other hand, TOC shows the total information in the contingency table for each threshold.<ref>{{cite journal |last1=Pontius |first1=Robert Gilmore |last2=Parmentier |first2=Benoit |s2cid=15924380 |title=Recommendations for using the Relative Operating Characteristic (ROC) |journal=Landscape Ecology |date=2014 |volume=29 |issue=3 |pages=367–382 |doi=10.1007/s10980-013-9984-8 |bibcode=2014LaEco..29..367P}}</ref> The TOC method reveals all of the information that the ROC method provides, plus additional important information that ROC does not reveal, i.e. the size of every entry in the contingency table for each threshold. TOC also provides the popular AUC of the ROC.<ref>{{cite journal |last1=Pontius |first1=Robert Gilmore |last2=Si |first2=Kangping |s2cid=29204880 |title=The total operating characteristic to measure diagnostic ability for multiple thresholds |journal=International Journal of Geographical Information Science |date=2014 |volume=28 |issue=3 |pages=570–583 |doi=10.1080/13658816.2013.862623 |bibcode=2014IJGIS..28..570P}}</ref>

[[File:ROC.png|thumb|right|ROC curve]]
These figures are the TOC and ROC curves using the same data and thresholds. Consider the point that corresponds to a threshold of 74. The TOC curve shows the number of hits, which is 3, and hence the number of misses, which is 7. Additionally, the TOC curve shows that the number of false alarms is 4 and the number of correct rejections is 16. At any given point on the ROC curve, it is possible to glean values for the ratios <math>\frac{\text{false alarms}}{\text{false alarms} + \text{correct rejections}}</math> and <math>\frac{\text{hits}}{\text{hits}+\text{misses}}</math>. For example, at threshold 74, it is evident that the ''x'' coordinate is 0.2 and the ''y'' coordinate is 0.3. However, these two values are insufficient to construct all entries of the underlying two-by-two contingency table.
{{Clear}}
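To make the worked example concrete, the following minimal sketch reconstructs the ROC ratios from the full contingency table that the TOC curve exposes at threshold 74 (the counts are those quoted above):

<syntaxhighlight lang="python">
# Contingency-table entries read off the TOC curve at threshold 74.
hits, misses = 3, 7
false_alarms, correct_rejections = 4, 16

# The two ratios that the ROC curve shows for the same threshold.
tpr = hits / (hits + misses)                              # y coordinate: 0.3
fpr = false_alarms / (false_alarms + correct_rejections)  # x coordinate: 0.2

print(f"ROC point at threshold 74: (FPR, TPR) = ({fpr}, {tpr})")
# The reverse direction is impossible: (0.2, 0.3) alone does not
# determine the four counts without the class totals (10 and 20).
</syntaxhighlight>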