===Increasing the number of levels of the tree===
The [[Accuracy and precision|accuracy]] of the decision tree can change based on the depth of the tree. In many cases, the tree's leaves are [[Gini impurity|pure]] nodes.<ref>{{Cite book|last=Larose|first=Chantal, Daniel|title=Discovering Knowledge in Data|publisher=John Wiley & Sons|year=2014|isbn=9780470908747|location=Hoboken, NJ|pages=167|language=English}}</ref> When a node is pure, all the data in that node belongs to a single class.<ref>{{Cite web|last=Plapinger|first=Thomas|date=Jul 29, 2017|title=What is a Decision Tree?|url=https://towardsdatascience.com/what-is-a-decision-tree-22975f00f3e1|url-status=live|access-date=5 December 2021|website=Towards Data Science|archive-url=https://web.archive.org/web/20211210231954/https://towardsdatascience.com/what-is-a-decision-tree-22975f00f3e1 |archive-date=10 December 2021 }}</ref> For example, if the classes in the data set are Cancer and Non-Cancer, a leaf node is considered pure when all of its samples belong to only one class, either Cancer or Non-Cancer. A deeper tree is not always better when optimizing the decision tree. Greater depth can negatively affect runtime: classifying a sample requires traversing more nodes, and the algorithm that builds the tree can also become significantly slower as the tree grows deeper. Furthermore, if the tree-building algorithm splits pure nodes, the overall accuracy of the tree classifier can decrease. Going deeper can therefore reduce accuracy in general, so it is important to test different depths of the decision tree and select the depth that produces the best results.
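The notion of a pure node can be made concrete with Gini impurity, which is 0 exactly when every sample in a node shares one class. A minimal Python sketch for illustration (the function name and toy labels are not from any particular library):

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity of a node: 1 - sum over classes k of p_k**2.
    A value of 0.0 means the node is pure (one class only)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

print(gini_impurity(["Cancer"] * 5))                       # 0.0 -> pure leaf
print(gini_impurity(["Cancer"] * 3 + ["Non-Cancer"] * 2))  # ~0.48 -> impure node
```

A tree-growing algorithm can stop splitting a node as soon as this value reaches 0, since no further split can separate its classes.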
To summarize, let D denote the depth of the tree.

Possible advantages of increasing D:
* The accuracy of the decision-tree classification model may increase.

Possible disadvantages of increasing D:
* Longer runtime
* An overall decrease in accuracy
* Splitting pure nodes while going deeper can cause issues.

The ability to test how classification results differ as D changes is therefore imperative: we must be able to easily change and test the variables that affect the accuracy and reliability of the decision-tree model.
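The trade-off can be tested directly by growing the same tree at several values of D and comparing accuracy. The sketch below is a deliberately tiny plain-Python tree learner, not any standard library's implementation: it handles a single feature, splits on thresholds by weighted Gini impurity, and stops at depth D or on a pure node. The data and function names are illustrative assumptions.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0.0 means the node is pure."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(xs, ys):
    """Best threshold t for a single feature, by weighted Gini of the two children."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if best is None or score < best[0]:
            best = (score, t)
    return best

def build(xs, ys, depth):
    """Grow a tree of at most `depth` levels; stop early on pure nodes."""
    split = best_split(xs, ys) if depth > 0 and gini(ys) > 0.0 else None
    if split is None:
        return Counter(ys).most_common(1)[0][0]  # leaf: majority class
    _, t = split
    left = [(x, y) for x, y in zip(xs, ys) if x <= t]
    right = [(x, y) for x, y in zip(xs, ys) if x > t]
    return (t,
            build([x for x, _ in left], [y for _, y in left], depth - 1),
            build([x for x, _ in right], [y for _, y in right], depth - 1))

def predict(tree, x):
    while isinstance(tree, tuple):
        t, below, above = tree
        tree = below if x <= t else above
    return tree

# Toy data: the label alternates in blocks of two, so a single split (D = 1)
# cannot separate the classes, but D = 3 fits the training data exactly.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = ["a", "a", "b", "b", "a", "a", "b", "b"]
for D in (1, 2, 3):
    tree = build(xs, ys, D)
    acc = sum(predict(tree, x) == y for x, y in zip(xs, ys)) / len(xs)
    print(f"depth {D}: training accuracy {acc}")
```

On this toy data, training accuracy grows with D; on real data the same sweep would be run against held-out validation data, since the deepest tree that fits the training set is often not the most accurate on unseen samples.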