===Measure of "goodness"===

Used by CART in 1984,<ref name="ll">{{Cite book |last=Larose |first=Daniel T. |author2=Larose, Chantal D. |title=Discovering knowledge in data: an introduction to data mining |year=2014 |publisher=John Wiley & Sons, Inc |location=Hoboken, NJ |isbn=9781118874059 }}</ref> the measure of "goodness" is a function that seeks to balance a candidate split's capacity to create pure children with its capacity to create equally sized children. This process is repeated for each impure node until the tree is complete. The function <math>\varphi(s\mid t)</math>, where <math>s</math> is a candidate split at node <math>t</math>, is defined as

:<math> \varphi(s\mid t) = 2P_L P_R \sum_{j=1}^\text{class count}|P(j\mid t_L) - P(j\mid t_R)| </math>

where <math>t_L</math> and <math>t_R</math> are the left and right children of node <math>t</math> under split <math>s</math>, respectively; <math>P_L</math> and <math>P_R</math> are the proportions of records in <math>t</math> that go to <math>t_L</math> and <math>t_R</math>, respectively; and <math>P(j\mid t_L)</math> and <math>P(j\mid t_R)</math> are the proportions of class <math>j</math> records in <math>t_L</math> and <math>t_R</math>, respectively.

Consider an example data set with three attributes, ''savings'' (low, medium, high), ''assets'' (low, medium, high), and ''income'' (numerical value), together with a binary target variable ''credit risk'' (good, bad) and 8 data points.<ref name="ll"/> The full data is presented in the table below. To start a decision tree, we calculate the maximum value of <math>\varphi(s\mid t)</math> for each feature to find which one should split the root node. This process continues until all children are pure or all <math>\varphi(s\mid t)</math> values fall below a set threshold.

{| class="wikitable"
|-
! Customer !! Savings !! Assets !! Income ($1000s) !! Credit risk
|-
| 1 || Medium || High || 75 || Good
|-
| 2 || Low || Low || 50 || Bad
|-
| 3 || High || Medium || 25 || Bad
|-
| 4 || Medium || Medium || 50 || Good
|-
| 5 || Low || Medium || 100 || Good
|-
| 6 || High || High || 25 || Good
|-
| 7 || Low || Low || 25 || Bad
|-
| 8 || Medium || Medium || 75 || Good
|}

To find <math>\varphi(s\mid t)</math> for the feature ''savings'', we need to note how many records take each value. The data contain three records with low ''savings'', three with medium, and two with high. Of the low records, one had a good ''credit risk'', while four of the five medium and high records had a good ''credit risk''. Assume a candidate split <math>s</math> such that records with low ''savings'' are put in the left child and all other records are put in the right child. Then

:<math> \varphi(s\mid\text{root}) = 2\cdot\frac 3 8\cdot\frac 5 8\cdot \left(\left|\frac 1 3 - \frac 4 5\right| + \left|\frac 2 3 - \frac 1 5\right|\right) = 0.44 </math>

(this calculation is reproduced in the code sketch at the end of this section).

To build the tree, the "goodness" of every candidate split for the root node needs to be calculated. The candidate with the maximum value splits the root node, and the process continues for each impure node until the tree is complete.

Compared to other metrics such as information gain, the measure of "goodness" will attempt to create a more balanced tree, leading to more consistent decision times. However, it sacrifices some priority for creating pure children, which can lead to additional splits that are not present with other metrics.
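The following is a minimal Python sketch, not taken from the cited source, showing how <math>\varphi(s\mid t)</math> could be evaluated for the candidate ''savings'' split worked through above. The function name <code>goodness_of_split</code> and the encoding of the records are illustrative assumptions, not part of CART itself.

<syntaxhighlight lang="python">
from collections import Counter

def goodness_of_split(left_labels, right_labels):
    """phi(s|t) = 2 * P_L * P_R * sum_j |P(j|t_L) - P(j|t_R)|"""
    n_left, n_right = len(left_labels), len(right_labels)
    n_total = n_left + n_right
    p_left, p_right = n_left / n_total, n_right / n_total
    left_counts, right_counts = Counter(left_labels), Counter(right_labels)
    classes = set(left_counts) | set(right_counts)
    class_diff = sum(abs(left_counts[c] / n_left - right_counts[c] / n_right)
                     for c in classes)
    return 2 * p_left * p_right * class_diff

# The eight records from the table, reduced to (savings, credit risk).
records = [("Medium", "Good"), ("Low", "Bad"), ("High", "Bad"), ("Medium", "Good"),
           ("Low", "Good"), ("High", "Good"), ("Low", "Bad"), ("Medium", "Good")]

# Candidate split s: savings == "Low" goes to the left child, everything else right.
left = [risk for savings, risk in records if savings == "Low"]
right = [risk for savings, risk in records if savings != "Low"]

print(round(goodness_of_split(left, right), 4))  # 0.4375, i.e. about 0.44
</syntaxhighlight>

In a full tree-building procedure, this value would be computed for every candidate split at each impure node, and the split with the largest <math>\varphi</math> would be chosen, as described above.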