==General==
[[File:Decision Tree.jpg|thumb|A tree showing survival of passengers on the [[Titanic]] ("sibsp" is the number of spouses or siblings aboard). The figures under the leaves show the probability of survival and the percentage of observations in the leaf. Summarizing: your chances of survival were good if you were (i) a female or (ii) a male at most 9.5 years old with strictly fewer than 3 siblings.]]
Decision tree learning is a method commonly used in data mining.<ref name="tdidt">{{Cite book |last=Rokach |first=Lior |author2=Maimon, O. |title=Data mining with decision trees: theory and applications, 2nd Edition |year=2014 |publisher=World Scientific Pub Co Inc |doi=10.1142/9097 |isbn=978-9814590075 |s2cid=44697571 }}</ref> The goal is to create a model that predicts the value of a target variable based on several input variables.

A decision tree is a simple representation for classifying examples. For this section, assume that all of the input [[Feature (machine learning)|feature]]s have finite discrete domains, and that there is a single target feature called the "classification". Each element of the domain of the classification is called a ''class''. A decision tree or a classification tree is a tree in which each internal (non-leaf) node is labeled with an input feature. The arcs coming from a node labeled with an input feature are labeled with each of the possible values of that feature, or the arc leads to a subordinate decision node on a different input feature. Each leaf of the tree is labeled with a class or a probability distribution over the classes, signifying that the data set has been classified by the tree into either a specific class or a particular probability distribution (which, if the decision tree is well-constructed, is skewed towards certain subsets of classes).

A tree is built by splitting the source [[Set (mathematics)|set]], constituting the root node of the tree, into subsets, which constitute the successor children. The splitting is based on a set of splitting rules based on classification features.<ref>{{Cite book|title=Understanding Machine Learning|last1=Shalev-Shwartz|first1=Shai|publisher=Cambridge University Press|url=http://www.cs.huji.ac.il/~shais/UnderstandingMachineLearning|last2=Ben-David|first2=Shai|date=2014|chapter=18. Decision Trees}}</ref> This process is repeated on each derived subset in a recursive manner called [[recursive partitioning]]. The [[recursion]] is completed when the subset at a node has all the same values of the target variable, or when splitting no longer adds value to the predictions. This process of ''top-down induction of decision trees'' (TDIDT)<ref name="Quinlan86">{{Cite journal | url=https://link.springer.com/content/pdf/10.1007/BF00116251.pdf | doi=10.1007/BF00116251 | title=Induction of decision trees | journal=Machine Learning | volume=1 | pages=81–106 | year=1986 | last1=Quinlan | first1=J. R. | s2cid=189902138 | doi-access=free}}</ref> is an example of a [[greedy algorithm]], and it is by far the most common strategy for learning decision trees from data.<ref name="top-downDT" />
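The following Python sketch illustrates this greedy, top-down recursive partitioning on discrete-valued features, using information gain (Shannon entropy) as the splitting rule. It is a minimal illustration only; the function names and data layout are assumptions for this example and are not taken from the cited sources.

<syntaxhighlight lang="python">
# Minimal sketch of top-down induction of decision trees (TDIDT) on
# discrete features, splitting greedily on the highest information gain.
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def build_tree(rows, labels, features):
    """Recursively partition (rows, labels); each row is a dict feature -> value."""
    # Stop when all examples share one class or no features remain:
    # the leaf is labeled with the majority class.
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]

    base = entropy(labels)

    def gain(f):
        # Information gain of splitting on feature f.
        split = 0.0
        for v in set(r[f] for r in rows):
            sub = [y for r, y in zip(rows, labels) if r[f] == v]
            split += len(sub) / len(labels) * entropy(sub)
        return base - split

    best = max(features, key=gain)          # greedy choice of splitting feature
    node = {"feature": best, "children": {}}
    for v in set(r[best] for r in rows):     # one arc per observed value of the feature
        sub_rows = [r for r in rows if r[best] == v]
        sub_labels = [y for r, y in zip(rows, labels) if r[best] == v]
        remaining = [f for f in features if f != best]
        node["children"][v] = build_tree(sub_rows, sub_labels, remaining)
    return node
</syntaxhighlight>

A call such as <code>build_tree(rows, labels, ["sex", "age_group", "sibsp"])</code> (with hypothetical feature names) returns a nested dictionary in which internal nodes record the splitting feature, arcs correspond to feature values, and leaves carry class labels.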
In [[data mining]], decision trees can be described also as the combination of mathematical and computational techniques to aid the description, categorization and generalization of a given set of data. Data comes in records of the form:
:<math>(\textbf{x},Y) = (x_1, x_2, x_3, \ldots, x_k, Y)</math>
The dependent variable, <math>Y</math>, is the target variable that we are trying to understand, classify or generalize. The vector <math>\textbf{x}</math> is composed of the features, <math>x_1, x_2, x_3</math> etc., that are used for that task.

[[File:Cart tree kyphosis.png|thumb|800px|alt=Three different representations of a regression tree of kyphosis data|An example tree which estimates the probability of [[kyphosis]] after spinal surgery, given the age of the patient and the vertebra at which surgery was started. The same tree is shown in three different ways. '''Left''': The colored leaves show the probability of kyphosis after spinal surgery and the percentage of patients in the leaf. '''Middle''': The tree as a perspective plot. '''Right''': Aerial view of the middle plot. The probability of kyphosis after surgery is higher in the darker areas. (Note: The treatment of [[kyphosis]] has advanced considerably since this rather small set of data was collected.{{citation needed|date=December 2019}})]]
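As an illustration of this record format only, and assuming the scikit-learn library is available, a classification tree can be fit directly to rows <math>\textbf{x}</math> paired with targets <math>Y</math>. The feature names and the tiny data set below are invented for demonstration and carry no statistical meaning.

<syntaxhighlight lang="python">
# Minimal sketch: fit a classification tree to records (x, Y), where each
# row of X is a feature vector x = (x_1, ..., x_k) and y holds the target Y.
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy, invented records: features are (age, sibsp); target is survival (0 or 1).
X = [[22, 1], [38, 1], [26, 0], [35, 1], [8, 3], [4, 1], [58, 0], [2, 4]]
y = [0, 1, 1, 1, 0, 1, 0, 0]

tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)

# Inspect the learned splits and classify a new record.
print(export_text(tree, feature_names=["age", "sibsp"]))
print(tree.predict([[9, 2]]))
</syntaxhighlight>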