===Increasing the number of levels of the tree===
The [[Accuracy and precision|accuracy]] of the decision tree can change based on the depth of the tree. In many cases, the tree's leaves are [[Gini impurity|pure]] nodes.<ref>{{Cite book|last=Larose|first=Chantal, Daniel|title=Discovering Knowledge in Data|publisher=John Wiley & Sons|year=2014|isbn=9780470908747|location=Hoboken, NJ|pages=167|language=English}}</ref> When a node is pure, all the data in that node belongs to a single class.<ref>{{Cite web|last=Plapinger|first=Thomas|date=Jul 29, 2017|title=What is a Decision Tree?|url=https://towardsdatascience.com/what-is-a-decision-tree-22975f00f3e1|url-status=live|access-date=5 December 2021|website=Towards Data Science|archive-url=https://web.archive.org/web/20211210231954/https://towardsdatascience.com/what-is-a-decision-tree-22975f00f3e1 |archive-date=10 December 2021 }}</ref> For example, if the classes in the data set are Cancer and Non-Cancer, a leaf node is considered pure when all of its samples belong to only one class, either Cancer or Non-Cancer. A deeper tree is not always better when optimizing the decision tree. Greater depth can negatively affect runtime: classifying a sample requires traversing more nodes, and the algorithm that builds the tree can also become significantly slower as the tree grows deeper. Furthermore, if the tree-building algorithm splits pure nodes, the overall accuracy of the tree classifier can decrease. Going deeper can therefore reduce accuracy in general, so it is important to test different depths of the decision tree and select the depth that produces the best results.
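The notion of a pure node can be made concrete with Gini impurity, which is 0 exactly when every sample in a node shares one class. A minimal Python sketch for illustration (the function name and toy labels are not from any particular library):

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity of a node: 1 - sum over classes k of p_k**2.
    A value of 0.0 means the node is pure (one class only)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

print(gini_impurity(["Cancer"] * 5))                       # 0.0 -> pure leaf
print(gini_impurity(["Cancer"] * 3 + ["Non-Cancer"] * 2))  # ~0.48 -> impure node
```

A tree-growing algorithm can stop splitting a node as soon as this value reaches 0, since no further split can separate its classes.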
To summarize, let D denote the depth of the tree.

Possible advantages of increasing D:
* The accuracy of the decision-tree classification model may increase.

Possible disadvantages of increasing D:
* Longer runtime
* An overall decrease in accuracy
* Splitting pure nodes while going deeper can cause issues.

The ability to test how classification results differ as D changes is therefore imperative: we must be able to easily change and test the variables that affect the accuracy and reliability of the decision-tree model.
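The trade-off can be tested directly by growing the same tree at several values of D and comparing accuracy. The sketch below is a deliberately tiny plain-Python tree learner, not any standard library's implementation: it handles a single feature, splits on thresholds by weighted Gini impurity, and stops at depth D or on a pure node. The data and function names are illustrative assumptions.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0.0 means the node is pure."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(xs, ys):
    """Best threshold t for a single feature, by weighted Gini of the two children."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if best is None or score < best[0]:
            best = (score, t)
    return best

def build(xs, ys, depth):
    """Grow a tree of at most `depth` levels; stop early on pure nodes."""
    split = best_split(xs, ys) if depth > 0 and gini(ys) > 0.0 else None
    if split is None:
        return Counter(ys).most_common(1)[0][0]  # leaf: majority class
    _, t = split
    left = [(x, y) for x, y in zip(xs, ys) if x <= t]
    right = [(x, y) for x, y in zip(xs, ys) if x > t]
    return (t,
            build([x for x, _ in left], [y for _, y in left], depth - 1),
            build([x for x, _ in right], [y for _, y in right], depth - 1))

def predict(tree, x):
    while isinstance(tree, tuple):
        t, below, above = tree
        tree = below if x <= t else above
    return tree

# Toy data: the label alternates in blocks of two, so a single split (D = 1)
# cannot separate the classes, but D = 3 fits the training data exactly.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = ["a", "a", "b", "b", "a", "a", "b", "b"]
for D in (1, 2, 3):
    tree = build(xs, ys, D)
    acc = sum(predict(tree, x) == y for x, y in zip(xs, ys)) / len(xs)
    print(f"depth {D}: training accuracy {acc}")
```

On this toy data, training accuracy grows with D; on real data the same sweep would be run against held-out validation data, since the deepest tree that fits the training set is often not the most accurate on unseen samples.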