Binary classification
==Evaluation==
{{main|Evaluation of binary classifiers}}
From tallies of the four basic outcomes, there are many approaches that can be used to measure the accuracy of a classifier or predictor. Different fields have different preferences.

===The eight basic ratios===
A common approach to evaluation is to begin by computing two ratios of a standard pattern. There are eight basic ratios of this form that one can compute from the contingency table, which come in four complementary pairs (each pair summing to 1). These are obtained by dividing each of the four numbers by the sum of its row or column, yielding eight numbers, which can be referred to generically in the form "true positive row ratio" or "false negative column ratio". There are thus two pairs of column ratios and two pairs of row ratios, and one can summarize these with four numbers by choosing one ratio from each pair – the other four numbers are their complements.

The row ratios are:
*[[true positive rate]] (TPR) = TP/(TP+FN), aka '''[[Sensitivity (tests)|sensitivity]]''' or [[Recall (information retrieval)|recall]]. These are the proportion of the ''population with the condition'' for which the test is correct.
**with complement the [[false negative rate]] (FNR) = FN/(TP+FN)
*[[true negative rate]] (TNR) = TN/(TN+FP), aka '''[[Specificity (tests)|specificity]]''' (SPC)
**with complement the [[false positive rate]] (FPR) = FP/(TN+FP)
The row ratios are independent of [[prevalence]].

The column ratios are:
*[[Positive Predictive Value|positive predictive value]] (PPV, aka [[Precision (information retrieval)|precision]]) = TP/(TP+FP). These are the proportion of the ''population with a given test result'' for which the test is correct.
**with complement the [[false discovery rate]] (FDR) = FP/(TP+FP)
*[[negative predictive value]] (NPV) = TN/(TN+FN)
**with complement the [[false omission rate]] (FOR) = FN/(TN+FN)
The column ratios depend on prevalence.
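The eight basic ratios can be computed directly from the four counts. The following is an illustrative sketch, not part of the article; the variable names and the example counts (TP = 90, FN = 10, FP = 30, TN = 870) are arbitrary choices for demonstration:

```python
def basic_ratios(tp, fn, fp, tn):
    """Return the eight basic ratios of a 2x2 contingency table as a dict."""
    return {
        "TPR": tp / (tp + fn),  # true positive rate (sensitivity, recall)
        "FNR": fn / (tp + fn),  # false negative rate, complement of TPR
        "TNR": tn / (tn + fp),  # true negative rate (specificity)
        "FPR": fp / (tn + fp),  # false positive rate, complement of TNR
        "PPV": tp / (tp + fp),  # positive predictive value (precision)
        "FDR": fp / (tp + fp),  # false discovery rate, complement of PPV
        "NPV": tn / (tn + fn),  # negative predictive value
        "FOR": fn / (tn + fn),  # false omission rate, complement of NPV
    }

r = basic_ratios(tp=90, fn=10, fp=30, tn=870)
# Each complementary pair (TPR/FNR, TNR/FPR, PPV/FDR, NPV/FOR) sums to 1,
# up to floating-point rounding.
```

Note that choosing one ratio from each pair (e.g. TPR, TNR, PPV, NPV) fully summarizes the table's normalized behaviour, since the other four values are just the complements.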
In diagnostic testing, the main ratios used are the true row ratios – the true positive rate and true negative rate – where they are known as [[sensitivity and specificity]]. In information retrieval, the main ratios are the two true positive ratios (row and column) – the positive predictive value and true positive rate – where they are known as [[precision and recall]]. Cullerne Bown has suggested a flow chart for determining which pair of indicators should be used when.<ref name="CullerneBown2024">{{Cite journal | author = William Cullerne Bown | title = Sensitivity and Specificity versus Precision and Recall, and Related Dilemmas | journal = [[Journal of Classification]] | year = 2024 | volume = 41 | issue = 2 | pages = 402–426 | doi = 10.1007/s00357-024-09478-y | url = https://rdcu.be/dL1wK | url-access = subscription }}</ref> Otherwise, there is no general rule for deciding. There is also no general agreement on how the pair of indicators should be used to decide on concrete questions, such as when to prefer one classifier over another.

One can take ratios of a complementary pair of ratios, yielding four [[Likelihood ratios in diagnostic testing|likelihood ratios]] (two from the row ratios, two from the column ratios). This is primarily done for the condition ratios, yielding the [[likelihood ratios in diagnostic testing]]. Taking the ratio of a pair of likelihood ratios yields a final ratio, the [[diagnostic odds ratio]] (DOR). This can also be defined directly as (TP×TN)/(FP×FN) = (TP/FN)/(FP/TN); this has a useful interpretation – as an [[odds ratio]] – and is prevalence-independent.

===Other metrics===
There are a number of other metrics, most simply the [[Accuracy and precision#In binary classification|accuracy]] or Fraction Correct (FC), which measures the fraction of all instances that are correctly categorized; the complement is the Fraction Incorrect (FiC).
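The likelihood ratios and diagnostic odds ratio described above can also be sketched numerically. This is an illustrative example with arbitrary counts, not part of the article; it checks the identity DOR = (TP×TN)/(FP×FN):

```python
def likelihood_ratios(tp, fn, fp, tn):
    """Return (LR+, LR-, DOR) for a 2x2 contingency table."""
    tpr, fnr = tp / (tp + fn), fn / (tp + fn)
    tnr, fpr = tn / (tn + fp), fp / (tn + fp)
    lr_pos = tpr / fpr      # positive likelihood ratio, TPR/FPR
    lr_neg = fnr / tnr      # negative likelihood ratio, FNR/TNR
    dor = lr_pos / lr_neg   # diagnostic odds ratio, equals (tp*tn)/(fp*fn)
    return lr_pos, lr_neg, dor

lr_pos, lr_neg, dor = likelihood_ratios(tp=90, fn=10, fp=30, tn=870)
# dor agrees with the direct definition (90 * 870) / (30 * 10) = 261
```

Because both likelihood ratios are built from the prevalence-independent row ratios, the DOR is itself prevalence-independent, as the text notes.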
The [[F-score]] combines precision and recall into one number via a choice of weighting, most simply equal weighting, as the balanced F-score ([[F1 score]]). Some metrics come from [[regression coefficient]]s: the [[markedness]] and the [[informedness]], and their [[geometric mean]], the [[Matthews correlation coefficient]]. Other metrics include [[Youden's J statistic]], the [[uncertainty coefficient]], the [[phi coefficient]], and [[Cohen's kappa]].
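A few of these summary metrics can be computed directly from the four counts. The sketch below is illustrative, with arbitrary example counts; it uses the standard formulas for accuracy, the balanced F-score, and the Matthews correlation coefficient:

```python
import math

def summary_metrics(tp, fn, fp, tn):
    """Return (accuracy, F1, MCC) for a 2x2 contingency table."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total                         # fraction correct
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Balanced F-score: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    # Matthews correlation coefficient, computed from the counts directly.
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return accuracy, f1, mcc

accuracy, f1, mcc = summary_metrics(tp=90, fn=10, fp=30, tn=870)
```

Unlike accuracy and F1, the MCC uses all four cells of the table, which is why it is often preferred when the classes are imbalanced.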