===The choice of node-splitting functions===
The node-splitting function used can have an impact on the accuracy of the decision tree. For example, using the [[Information gain in decision trees|information-gain]] function may yield better results than using the phi function. The phi function is a measure of the "goodness" of a candidate split at a node in the decision tree. The information gain function is a measure of the "reduction in [[Entropy (information theory)|entropy]]". In the following, we will build two decision trees: one using the phi function to split the nodes and one using the information gain function to split the nodes.

The main advantages and disadvantages of [[Information gain in decision trees|information gain]] and the phi function:
* One major drawback of information gain is that the feature chosen as the next node in the tree tends to be one with many unique values.<ref>{{Cite web|last=Tao|first=Christopher|date=Sep 6, 2020|title=Do Not Use Decision Tree Like This|url=https://towardsdatascience.com/do-not-use-decision-tree-like-this-369769d6104d|url-status=live|access-date=December 10, 2021|website=Towards Data Science|archive-url=https://web.archive.org/web/20211210231951/https://towardsdatascience.com/do-not-use-decision-tree-like-this-369769d6104d|archive-date=10 December 2021}}</ref>
* An advantage of information gain is that it tends to choose the most impactful features close to the root of the tree. It is a very good measure for deciding the relevance of features.
* The phi function is also a good measure for deciding the relevance of features, based on "goodness".

The information gain of a candidate split ''s'' at node ''t'' is the entropy of the node minus the entropy of the candidate split:
:<math>I_{\textrm{gain}}(s) = H(t) - H(s,t)</math>

The phi function is maximized when the chosen feature splits the samples into two groups that are each homogeneous and contain roughly the same number of samples:
:<math>\Phi(s,t) = 2 P_L P_R \cdot Q(s \mid t)</math>

We will set the depth of the decision tree we are building to three (D = 3). We also have the following data set of cancer and non-cancer samples and the mutation features that the samples either have or do not have. If a sample has a feature mutation, the sample is positive for that mutation and is represented by a one; otherwise it is negative for that mutation and is represented by a zero.

To summarize, C stands for cancer and NC stands for non-cancer. The letter M stands for [[mutation]]; if a sample has a particular mutation, it shows up in the table as a one and otherwise as a zero.

{| class="wikitable"
|+The sample data
!
!M1
!M2
!M3
!M4
!M5
|-
|C1
|0
|1
|0
|1
|1
|-
|NC1
|0
|0
|0
|0
|0
|-
|NC2
|0
|0
|1
|1
|0
|-
|NC3
|0
|0
|0
|0
|0
|-
|C2
|1
|1
|1
|1
|1
|-
|NC4
|0
|0
|0
|1
|0
|}

Now we can use the formulas to calculate the phi function and information gain values for each M in the data set. Once all the values are calculated, the tree can be produced. The first step is to select the root node. For both information gain and the phi function, the optimal split is the mutation that produces the highest value of the respective measure.
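The following is a minimal Python sketch, not part of the worked example itself, showing how the two node-splitting measures could be computed for each candidate mutation in the table above. It assumes the standard definitions, which are not spelled out in this section: ''P<sub>L</sub>'' and ''P<sub>R</sub>'' are the fractions of samples sent to the left (positive) and right (negative) child, and ''Q''(''s''|''t'') is the sum over the classes of the absolute difference in class proportions between the two children.

<syntaxhighlight lang="python">
from math import log2

# The sample data from the table above: 1 = the sample has the mutation, 0 = it does not.
samples = {
    "C1":  [0, 1, 0, 1, 1],
    "NC1": [0, 0, 0, 0, 0],
    "NC2": [0, 0, 1, 1, 0],
    "NC3": [0, 0, 0, 0, 0],
    "C2":  [1, 1, 1, 1, 1],
    "NC4": [0, 0, 0, 1, 0],
}
is_cancer = {name: name.startswith("C") for name in samples}

def entropy(names):
    """H(t): entropy of the set of samples reaching node t."""
    if not names:
        return 0.0
    p = sum(is_cancer[n] for n in names) / len(names)
    return sum(-q * log2(q) for q in (p, 1 - p) if q > 0)

def split(names, m):
    """Candidate split s on mutation index m: positives go left, negatives go right."""
    left = [n for n in names if samples[n][m] == 1]
    right = [n for n in names if samples[n][m] == 0]
    return left, right

def information_gain(names, m):
    """I_gain(s) = H(t) - H(s, t), where H(s, t) is the weighted entropy of the split."""
    left, right = split(names, m)
    p_left, p_right = len(left) / len(names), len(right) / len(names)
    return entropy(names) - (p_left * entropy(left) + p_right * entropy(right))

def phi(names, m):
    """Phi(s, t) = 2 * P_L * P_R * Q(s|t), using the assumed definition of Q(s|t)."""
    left, right = split(names, m)
    p_left, p_right = len(left) / len(names), len(right) / len(names)

    def cancer_fraction(group):
        return sum(is_cancer[n] for n in group) / len(group) if group else 0.0

    # Q(s|t): sum over the two classes of |P(class | left) - P(class | right)|  (assumption)
    q = (abs(cancer_fraction(left) - cancer_fraction(right))
         + abs((1 - cancer_fraction(left)) - (1 - cancer_fraction(right))))
    return 2 * p_left * p_right * q

# Score every candidate mutation at the root node.
all_names = list(samples)
for m, mutation in enumerate(["M1", "M2", "M3", "M4", "M5"]):
    print(mutation, round(information_gain(all_names, m), 3), round(phi(all_names, m), 3))
</syntaxhighlight>

Whichever mutation scores highest under a given measure would be selected as the root node of that measure's tree.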
Now assume that M1 has the highest phi function value and M4 has the highest information gain value. The M1 mutation will be the root of our phi function tree, and M4 will be the root of our information gain tree. The root nodes are shown below.

[[File:Realnode.jpg|center|frameless|220x220px|Figure 1: The left node is the root node of the tree built using the phi function to split the nodes. The right node is the root node of the tree built using information gain to split the nodes.]]

Once we have chosen the root node, we can split the samples into two groups based on whether a sample is positive or negative for the root node's mutation. The groups will be called group A and group B. For example, if we use M1 to split the samples at the root node, we get the C2 sample in group A and the remaining samples (C1, NC1, NC2, NC3, NC4) in group B.

Excluding the mutation already chosen for the root node, place the remaining features with the highest values of information gain or the phi function in the left and right child nodes of the decision tree. Once we have chosen the root node and the two child nodes of the tree of depth D = 3, we can simply add the leaves. The leaves represent the final classification decision the model produces based on the mutations a sample has or does not have. The left tree is the decision tree obtained by using information gain to split the nodes, and the right tree is the decision tree obtained by using the phi function to split the nodes.

[[File:Information Gain Tree.jpg|left|frameless|500x500px|The resulting tree from using information gain to split the nodes]]
[[File:Phi Function Tree.jpg|center|frameless|500x500px|The resulting tree from using the phi function to split the nodes]]

Now assume the [[classification]] results from both trees are given using a [[confusion matrix]].

Information gain confusion matrix:
{| class="wikitable"
! {{diagonal split header|Actual|Predicted}}
! C
! NC
|-
! C
|1
|1
|-
! NC
|0
|4
|}

Phi function confusion matrix:
{| class="wikitable"
! {{diagonal split header|Actual|Predicted}}
! C
! NC
|-
! C
|2
|0
|-
! NC
|1
|3
|}

In terms of accuracy, the tree built using information gain and the tree built using the phi function give the same result. Classifying the samples with the information gain model gives one true positive, zero false positives, one false negative, and four true negatives. The model using the phi function gives two true positives, one false positive, zero false negatives, and three true negatives. The next step is to evaluate the effectiveness of the decision tree using the key metrics discussed in the ''evaluating a decision tree'' section below. These metrics can help determine the next steps to take when optimizing the decision tree.
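As a quick check, not part of the original example, the accuracy implied by the two confusion matrices above can be computed in a few lines of Python; both trees classify five of the six samples correctly.

<syntaxhighlight lang="python">
def accuracy(tp, fp, fn, tn):
    """Accuracy = correctly classified samples / all samples."""
    return (tp + tn) / (tp + fp + fn + tn)

# Counts read from the confusion matrices above.
print(accuracy(tp=1, fp=0, fn=1, tn=4))  # information gain tree: 0.833...
print(accuracy(tp=2, fp=1, fn=0, tn=3))  # phi function tree:     0.833...
</syntaxhighlight>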