=== Estimate of Positive Correctness ===
A simple and effective metric can be used to identify the degree to which true positives outweigh false positives (see [[Confusion matrix]]). This metric, "Estimate of Positive Correctness", is defined below:

<math>E_P = TP - FP</math>

In this equation, the total false positives (FP) are subtracted from the total true positives (TP). The result estimates how many positive examples the feature can correctly identify within the data, with higher numbers meaning that the feature correctly classifies more positive samples. Below is an example of how to use the metric when the full confusion matrix of a certain feature is given:

'''Feature A Confusion Matrix'''
{| class="wikitable"
! {{diagonal split header|Actual Class|Predicted<br />Class}}
!Cancer
!Non-cancer
|-
!Cancer
|<u>8</u>
|3
|-
!Non-cancer
|<u>2</u>
|5
|}

Here the TP value is 8 and the FP value is 2 (the underlined numbers in the table). Plugging these numbers into the equation gives the estimate <math>E_P = TP - FP = 8 - 2 = 6</math>, so this feature receives a score of 6.

However, it is worth noting that this number is only an estimate. For example, if two features both had an FP value of 2 but one had a higher TP value, that feature would be ranked higher simply because the equation yields a larger value. This can be misleading when some features have more positive samples than others. To account for this, one can use a more informative metric, [[Sensitivity and specificity|Sensitivity]], which takes the proportions of the values in the confusion matrix into account and gives the actual [[Sensitivity and specificity|true positive rate]] (TPR). The difference between these metrics is shown in the example below:

{|
|+
|'''Feature A Confusion Matrix'''
{| class="wikitable"
! {{diagonal split header|Actual Class|Predicted<br />Class}}
!Cancer
!Non-cancer
|-
!Cancer
|8
|3
|-
!Non-cancer
|2
|5
|}
|style="padding-left: 4em;" |'''Feature B Confusion Matrix'''
{| class="wikitable"
! {{diagonal split header|Actual Class|Predicted<br />Class}}
!Cancer
!Non-cancer
|-
!Cancer
|6
|2
|-
!Non-cancer
|2
|8
|}
|-
|<math>E_P = TP - FP = 8 - 2 = 6</math>

<math>TPR = TP / (TP + FN) = 8 / (8 + 3) \approx 0.73</math>
|style="padding-left: 4em;" |<math>E_P = TP - FP = 6 - 2 = 4</math>

<math>TPR = TP / (TP + FN) = 6 / (6 + 2) = 0.75</math>
|}

In this example, Feature A has an estimate of 6 and a TPR of approximately 0.73, while Feature B has an estimate of 4 and a TPR of 0.75. This shows that although the positive estimate for one feature may be higher, its more accurate TPR value may be lower than that of a feature with a lower positive estimate. Depending on the situation and knowledge of the data and decision trees, one may opt to use the positive estimate as a quick and easy solution. On the other hand, a more experienced user would most likely prefer the TPR value to rank the features, because it takes into account the proportions of the data and all the samples that should have been classified as positive.
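The short Python sketch below (not part of the article text) reproduces the arithmetic of the worked example above, computing <math>E_P</math> and TPR for Feature A and Feature B from their confusion-matrix counts. The function names and the dictionary layout are illustrative assumptions, not a standard library API.

<syntaxhighlight lang="python">
# Illustrative sketch: compute the Estimate of Positive Correctness (E_P)
# and the true positive rate (TPR) from 2x2 confusion-matrix counts.
# Function names and data layout are assumptions made for this example.

def positive_correctness(tp: int, fp: int) -> int:
    """Estimate of Positive Correctness: E_P = TP - FP."""
    return tp - fp

def true_positive_rate(tp: int, fn: int) -> float:
    """Sensitivity / true positive rate: TPR = TP / (TP + FN)."""
    return tp / (tp + fn)

# Confusion-matrix counts taken from the Feature A and Feature B tables above.
features = {
    "Feature A": {"TP": 8, "FN": 3, "FP": 2, "TN": 5},
    "Feature B": {"TP": 6, "FN": 2, "FP": 2, "TN": 8},
}

for name, m in features.items():
    ep = positive_correctness(m["TP"], m["FP"])
    tpr = true_positive_rate(m["TP"], m["FN"])
    print(f"{name}: E_P = {ep}, TPR = {tpr:.2f}")

# Expected output:
# Feature A: E_P = 6, TPR = 0.73
# Feature B: E_P = 4, TPR = 0.75
</syntaxhighlight>

As in the prose above, the sketch shows Feature A winning on the raw estimate while Feature B wins on TPR, since TPR normalizes by the number of actual positive samples for each feature.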