=== Estimate of Positive Correctness ===
A simple and effective metric can be used to identify the degree to which true positives outweigh false positives (see [[Confusion matrix]]). This metric, "Estimate of Positive Correctness", is defined below:

<math>E_P = TP - FP</math>

In this equation, the total false positives (FP) are subtracted from the total true positives (TP). The result estimates how many positive examples the feature can correctly identify within the data, with higher numbers meaning that the feature correctly classifies more positive samples. Below is an example of how to use the metric when the full confusion matrix of a certain feature is given:

'''Feature A Confusion Matrix'''
{| class="wikitable"
! {{diagonal split header|Actual Class|Predicted<br />Class}}
!Cancer
!Non-cancer
|-
!Cancer
|<u>8</u>
|3
|-
!Non-cancer
|<u>2</u>
|5
|}

Here the TP value is 8 and the FP value is 2 (the underlined numbers in the table). Plugging these numbers into the equation gives the estimate <math>E_P = TP - FP = 8 - 2 = 6</math>, so this feature receives a score of 6.

However, it is worth noting that this number is only an estimate. For example, if two features both had an FP value of 2 but one had a higher TP value, that feature would be ranked higher simply because the equation yields a larger value. This can be misleading when some features have more positive samples than others. To account for this, one can use a more informative metric, [[Sensitivity and specificity|Sensitivity]], which takes the proportions of the values in the confusion matrix into account and gives the actual [[Sensitivity and specificity|true positive rate]] (TPR). The difference between these metrics is shown in the example below:

{|
|+
|'''Feature A Confusion Matrix'''
{| class="wikitable"
! {{diagonal split header|Actual Class|Predicted<br />Class}}
!Cancer
!Non-cancer
|-
!Cancer
|8
|3
|-
!Non-cancer
|2
|5
|}
|style="padding-left: 4em;" |'''Feature B Confusion Matrix'''
{| class="wikitable"
! {{diagonal split header|Actual Class|Predicted<br />Class}}
!Cancer
!Non-cancer
|-
!Cancer
|6
|2
|-
!Non-cancer
|2
|8
|}
|-
|<math>E_P = TP - FP = 8 - 2 = 6</math>

<math>TPR = TP / (TP + FN) = 8 / (8 + 3) \approx 0.73</math>
|style="padding-left: 4em;" |<math>E_P = TP - FP = 6 - 2 = 4</math>

<math>TPR = TP / (TP + FN) = 6 / (6 + 2) = 0.75</math>
|}

In this example, Feature A has an estimate of 6 and a TPR of approximately 0.73, while Feature B has an estimate of 4 and a TPR of 0.75. This shows that although the positive estimate for one feature may be higher, its more accurate TPR value may be lower than that of a feature with a lower positive estimate. Depending on the situation and knowledge of the data and decision trees, one may opt to use the positive estimate as a quick and easy solution. On the other hand, a more experienced user would most likely prefer the TPR value to rank the features, because it takes into account the proportions of the data and all the samples that should have been classified as positive.
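The short Python sketch below (not part of the article text) reproduces the arithmetic of the worked example above, computing <math>E_P</math> and TPR for Feature A and Feature B from their confusion-matrix counts. The function names and the dictionary layout are illustrative assumptions, not a standard library API.

<syntaxhighlight lang="python">
# Illustrative sketch: compute the Estimate of Positive Correctness (E_P)
# and the true positive rate (TPR) from 2x2 confusion-matrix counts.
# Function names and data layout are assumptions made for this example.

def positive_correctness(tp: int, fp: int) -> int:
    """Estimate of Positive Correctness: E_P = TP - FP."""
    return tp - fp

def true_positive_rate(tp: int, fn: int) -> float:
    """Sensitivity / true positive rate: TPR = TP / (TP + FN)."""
    return tp / (tp + fn)

# Confusion-matrix counts taken from the Feature A and Feature B tables above.
features = {
    "Feature A": {"TP": 8, "FN": 3, "FP": 2, "TN": 5},
    "Feature B": {"TP": 6, "FN": 2, "FP": 2, "TN": 8},
}

for name, m in features.items():
    ep = positive_correctness(m["TP"], m["FP"])
    tpr = true_positive_rate(m["TP"], m["FN"])
    print(f"{name}: E_P = {ep}, TPR = {tpr:.2f}")

# Expected output:
# Feature A: E_P = 6, TPR = 0.73
# Feature B: E_P = 4, TPR = 0.75
</syntaxhighlight>

As in the prose above, the sketch shows Feature A winning on the raw estimate while Feature B wins on TPR, since TPR normalizes by the number of actual positive samples for each feature.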