== Evaluating a decision tree ==
It is important to know the measurements used to evaluate decision trees. The main metrics used are [[Accuracy and precision|accuracy]], [[Sensitivity and specificity|sensitivity]], [[Sensitivity and specificity|specificity]], [[Accuracy and precision|precision]], [[Sensitivity and specificity|miss rate]], [[false discovery rate]], and [[false omission rate]]. All of these measurements are derived from the number of [[true positive]]s, [[False positives and false negatives|false positives]], [[True negative]]s, and [[False positives and false negatives|false negatives]] obtained when running a set of samples through the decision tree classification model. A confusion matrix can also be made to display these results.

Each of these metrics says something different about the strengths and weaknesses of the classification model built from the decision tree. For example, low sensitivity with high specificity could indicate that the model is poor at identifying cancer samples relative to non-cancer samples.

Let us take the confusion matrix below.

{| class="wikitable" style="text-align: right;"
|+
! {{diagonal split header|Actual|Predicted}}
! C
! NC
|-
! C
| 11<br/>(true positives)
| 45<br/>(false negatives)
|-
! NC
| 1<br/>(false positive)
| 105<br/>(true negatives)
|}

We will now calculate the accuracy, sensitivity, specificity, precision, miss rate, false discovery rate, and false omission rate.
Accuracy:
<math display="block">\text{Accuracy} = (TP + TN)/(TP + TN + FP + FN)</math>
<math display="block">= (11+105)/162 = 71.60\%</math>

Sensitivity (TPR – true positive rate):<ref>{{Cite web|title=False Positive Rate {{!}} Split Glossary|url=https://www.split.io/glossary/false-positive-rate/|access-date=2021-12-10|website=Split|language=en-US}}</ref>
<math display="block">\text{TPR} = TP/(TP + FN)</math>
<math display="block">= 11/(11+45) = 19.64\%</math>

Specificity (TNR – true negative rate):
<math display="block">\text{TNR} = TN/(TN + FP)</math>
<math display="block">= 105/(105+1) = 99.06\%</math>

Precision (PPV – positive predictive value):
<math display="block">\text{PPV} = TP/(TP + FP)</math>
<math display="block">= 11/(11+1) = 91.67\%</math>

Miss rate (FNR – false negative rate):
<math display="block">\text{FNR} = FN/(FN + TP)</math>
<math display="block">= 45/(45+11) = 80.36\%</math>

False discovery rate (FDR):
<math display="block">\text{FDR} = FP/(FP + TP)</math>
<math display="block">= 1/(1+11) = 8.33\%</math>

False omission rate (FOR):
<math display="block">\text{FOR} = FN/(FN + TN)</math>
<math display="block">= 45/(45 + 105) = 30.00\%</math>

Once we have calculated these key metrics, we can draw some initial conclusions about the performance of the decision tree model. The accuracy of 71.60% is a reasonable starting point, but we would like the model to be as accurate as possible while maintaining overall performance. The sensitivity value of 19.64% means that only 19.64% of the samples that were actually positive for cancer were predicted positive. The specificity value of 99.06% means that 99.06% of the samples that were actually negative for cancer were correctly predicted negative.
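As a quick check, the calculations above can be reproduced in a few lines of Python. This is only an illustrative sketch; the variable names (<code>TP</code>, <code>FN</code>, <code>FP</code>, <code>TN</code>) are taken from the confusion matrix above and are not part of any particular library.

```python
# Counts from the cancer (C) vs. non-cancer (NC) confusion matrix above.
TP, FN = 11, 45   # actual C:  predicted C, predicted NC
FP, TN = 1, 105   # actual NC: predicted C, predicted NC

accuracy    = (TP + TN) / (TP + TN + FP + FN)  # 116/162  ≈ 71.60%
sensitivity = TP / (TP + FN)                   # TPR: 11/56   ≈ 19.64%
specificity = TN / (TN + FP)                   # TNR: 105/106 ≈ 99.06%
precision   = TP / (TP + FP)                   # PPV: 11/12   ≈ 91.67%
miss_rate   = FN / (FN + TP)                   # FNR: 45/56   ≈ 80.36%
fdr         = FP / (FP + TP)                   # 1/12   ≈ 8.33%
false_omission_rate = FN / (FN + TN)           # 45/150 = 30.00%
```

Note that complementary pairs sum to one: sensitivity + miss rate = 1, and precision + false discovery rate = 1, which is a useful sanity check on the arithmetic.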
When it comes to sensitivity and specificity, it is important to strike a balance between the two values; here, sacrificing some specificity to increase the sensitivity would be beneficial.<ref>{{Cite web|title=Sensitivity vs Specificity|url=https://www.technologynetworks.com/analysis/articles/sensitivity-vs-specificity-318222|access-date=2021-12-10|website=Analysis & Separations from Technology Networks|language=en}}</ref> These are just a few examples of how these values and their meanings can be used to evaluate a decision tree model and to improve upon the next iteration.