===Limitations===
* Trees can be very non-robust. A small change in the [[Training, test, and validation sets|training data]] can result in a large change in the tree and, consequently, in the final predictions.<ref name=":0" />
* The problem of learning an optimal decision tree is known to be [[NP-complete]] under several aspects of optimality and even for simple concepts.<ref>{{Cite journal | doi = 10.1016/0020-0190(76)90095-8 | last1 = Hyafil | first1 = Laurent | last2 = Rivest | first2 = RL | year = 1976 | title = Constructing Optimal Binary Decision Trees is NP-complete | journal = Information Processing Letters | volume = 5 | issue = 1 | pages = 15–17 }}</ref><ref>Murthy S. (1998). [https://cs.nyu.edu/~roweis/csc2515-2006/readings/murthy_dt.pdf "Automatic construction of decision trees from data: A multidisciplinary survey"]. ''Data Mining and Knowledge Discovery''</ref> Consequently, practical decision-tree learning algorithms are based on heuristics such as the [[greedy algorithm]], in which locally optimal decisions are made at each node (see the first sketch after this list). Such algorithms cannot guarantee returning the globally optimal decision tree. To reduce the effect of such local optimality, some methods such as the dual information distance (DID) tree have been proposed.<ref>{{cite journal|url=http://www.eng.tau.ac.il/~bengal/DID.pdf|title=Efficient Construction of Decision Trees by the Dual Information Distance Method|author=Ben-Gal I., Dana A., Shkolnik N. and Singer|journal=Quality Technology & Quantitative Management|volume=11|issue=1|pages=133–147|year=2014|doi=10.1080/16843703.2014.11673330|s2cid=7025979|access-date=2014-02-13|archive-date=2016-06-04|archive-url=https://web.archive.org/web/20160604183738/http://www.eng.tau.ac.il/~bengal/DID.pdf|url-status=dead}}</ref>
* Decision-tree learners can create over-complex trees that do not generalize well from the training data. (This is known as [[overfitting]].<ref>{{Cite book | title = Principles of Data Mining | doi = 10.1007/978-1-84628-766-4 | year = 2007 | isbn = 978-1-84628-765-7 | s2cid = 45746 }}</ref>) Mechanisms such as [[Pruning (decision trees)|pruning]] are necessary to avoid this problem (with the exception of some algorithms, such as the Conditional Inference approach, which does not require pruning).<ref name="Hothorn2006" /><ref name="Strobl2009" />
* The average depth of the tree, as defined by the number of nodes or tests until classification, is not guaranteed to be minimal or small under various splitting criteria.<ref name="Tris">{{cite web|author = Ben-Gal I. and Trister C. (2015)|title = Parallel Construction of Decision Trees with Consistently Non Increasing Expected Number of Tests|url = http://www.eng.tau.ac.il/~bengal/Trist.pdf|publisher = Applied Stochastic Models in Business and Industry, Vol. 31(1) 64–78|access-date = 2021-01-30|archive-date = 2021-02-05|archive-url = https://web.archive.org/web/20210205043215/http://www.eng.tau.ac.il/~bengal/Trist.pdf|url-status = dead}}</ref>
* For data including categorical variables with different numbers of levels, [[information gain in decision trees]] is biased in favor of attributes with more levels.<ref>{{cite conference|author=Deng, H.|author2=Runger, G.|author3=Tuv, E.|title=Bias of importance measures for multi-valued attributes and solutions|conference=Proceedings of the 21st International Conference on Artificial Neural Networks (ICANN)|year=2011|pages=293–300|url=https://www.researchgate.net/publication/221079908}}</ref> To counter this problem, instead of choosing the attribute with the highest [[information gain]], one can choose the attribute with the highest [[information gain ratio]] among the attributes whose information gain is greater than the mean information gain (see the second sketch after this list).<ref>{{cite journal |doi=10.1007/BF00116251 |last=Quinlan |first=J. Ross |title=Induction of Decision Trees |journal=[[Machine Learning (journal)|Machine Learning]] |volume=1 |issue=1 |year=1986 |pages=81–106 |doi-access=free }}</ref> This biases the decision tree against considering attributes with a large number of distinct values, while not giving an unfair advantage to attributes with very low information gain. Alternatively, the issue of biased predictor selection can be avoided by the Conditional Inference approach,<ref name="Hothorn2006" /> a two-stage approach,<ref>{{Cite journal|last1=Brandmaier|first1=Andreas M.|last2=Oertzen|first2=Timo von|last3=McArdle|first3=John J.|last4=Lindenberger|first4=Ulman|title=Structural equation model trees.|journal=Psychological Methods|language=en|volume=18|issue=1|pages=71–86|doi=10.1037/a0030001|pmid=22984789|pmc=4386908|year=2012|hdl=11858/00-001M-0000-0024-EA33-9}}</ref> or adaptive leave-one-out feature selection.<ref>{{cite journal|last1=Painsky|first1=Amichai|last2=Rosset|first2=Saharon|title=Cross-Validated Variable Selection in Tree-Based Methods Improves Predictive Performance|journal=IEEE Transactions on Pattern Analysis and Machine Intelligence|date=2017|volume=39|issue=11|pages=2142–2153|pmid=28114007|doi=10.1109/TPAMI.2016.2636831|arxiv=1512.03444|s2cid=5381516}}</ref>
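The following minimal Python sketch illustrates the greedy, top-down induction mentioned above; it is an illustrative example rather than any specific published algorithm, and the attribute names (<code>x1</code>, <code>x2</code>, <code>noise</code>) and toy data are invented. Because the class label is the XOR of <code>x1</code> and <code>x2</code>, neither relevant attribute shows any information gain when scored in isolation at the root, while a weakly correlated <code>noise</code> attribute does, so the purely local criterion places the spurious attribute at the root and grows a deeper tree than the optimal depth-two tree.

<syntaxhighlight lang="python">
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Reduction in entropy obtained by partitioning the rows on one attribute."""
    n = len(labels)
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [lab for r, lab in zip(rows, labels) if r[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

def grow(rows, labels, attrs):
    """Recursively grow a tree; every split decision is local, with no look-ahead."""
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]              # majority-label leaf
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))  # locally optimal split
    branches = {}
    for value in {r[best] for r in rows}:
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        branches[value] = grow([rows[i] for i in idx],
                               [labels[i] for i in idx],
                               [a for a in attrs if a != best])
    return (best, branches)

# Toy data: two copies of every (x1, x2) combination, with y = x1 XOR x2,
# so neither x1 nor x2 has any information gain on its own at the root.
rows = [{"x1": a, "x2": b} for _ in range(2) for a in (0, 1) for b in (0, 1)]
labels = [r["x1"] ^ r["x2"] for r in rows]
# An invented 'noise' attribute that agrees with y on 7 of the 8 rows,
# giving it the highest immediate gain even though the optimal tree never uses it.
for r, y in zip(rows, labels):
    r["noise"] = y
rows[-1]["noise"] = 1 - labels[-1]

tree = grow(rows, labels, ["x1", "x2", "noise"])
print(tree[0])  # 'noise': a spurious, locally optimal root split; the
                # optimal tree splits only on x1 and x2 and has depth 2.
</syntaxhighlight>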
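The second sketch, again illustrative rather than code from the cited paper, shows the selection rule just described: only attributes whose information gain is at least the mean gain are kept, and among those the attribute with the highest gain ratio is chosen. The attribute names and toy rows are invented; the unique-valued <code>id</code> attribute has the largest raw information gain but is penalized by its large split information, so the rule selects <code>windy</code> instead.

<syntaxhighlight lang="python">
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels (or attribute values)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Reduction in class entropy obtained by partitioning on one attribute."""
    n = len(labels)
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [lab for r, lab in zip(rows, labels) if r[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

def split_information(rows, attr):
    """Entropy of the partition induced by the attribute's values;
    large for many-valued attributes such as a row identifier."""
    return entropy([r[attr] for r in rows])

def select_attribute(rows, labels, attrs):
    """Highest gain ratio among attributes with at least the mean information gain."""
    gains = {a: information_gain(rows, labels, a) for a in attrs}
    mean_gain = sum(gains.values()) / len(gains)
    candidates = [a for a in attrs if gains[a] >= mean_gain]
    def gain_ratio(a):
        si = split_information(rows, a)
        return gains[a] / si if si > 0 else 0.0
    return max(candidates, key=gain_ratio)

# Invented toy rows: 'id' is unique per row and perfectly separates the
# training labels, so plain information gain would favour it.
rows = [
    {"id": 1, "outlook": "sunny", "windy": "no"},
    {"id": 2, "outlook": "sunny", "windy": "no"},
    {"id": 3, "outlook": "rain",  "windy": "no"},
    {"id": 4, "outlook": "rain",  "windy": "no"},
    {"id": 5, "outlook": "sunny", "windy": "yes"},
    {"id": 6, "outlook": "sunny", "windy": "yes"},
    {"id": 7, "outlook": "rain",  "windy": "yes"},
    {"id": 8, "outlook": "rain",  "windy": "yes"},
]
labels = ["play", "play", "play", "play", "stay", "stay", "stay", "play"]
print(select_attribute(rows, labels, ["id", "outlook", "windy"]))  # windy
</syntaxhighlight>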