Editing Supervised learning (section)

== Steps to follow ==

To solve a given problem of supervised learning, the following steps must be performed:

# Determine the type of training samples. Before doing anything else, the user should decide what kind of data is to be used as a [[Training, validation, and test data sets|training set]]. In the case of [[handwriting analysis]], for example, this might be a single handwritten character, an entire handwritten word, an entire sentence of handwriting, or a full paragraph of handwriting.
# Gather a training set. The training set needs to be representative of the real-world use of the function. Thus, a set of input objects is gathered together with corresponding outputs, either from [[Subject-matter expert|human experts]] or from measurements.
# Determine the input [[Feature (machine learning)|feature]] representation of the learned function. The accuracy of the learned function depends strongly on how the input object is represented. Typically, the input object is transformed into a [[feature vector]], which contains a number of features that are descriptive of the object. The number of features should not be too large, because of the [[curse of dimensionality]]; but should contain enough information to accurately predict the output.
# Determine the structure of the learned function and corresponding learning algorithm. For example, one may choose to use [[support-vector machine]]s or [[Decision tree learning|decision tree]]s.
# Complete the design. Run the learning algorithm on the gathered training set. Some supervised learning algorithms require the user to determine certain [[Hyperparameter (machine learning)|control parameters]]. These parameters may be adjusted by optimizing performance on a subset (called a ''[[validation set]]'') of the training set, or via [[Cross-validation (statistics)|cross-validation]].
# Evaluate the accuracy of the learned function. After parameter adjustment and learning, the performance of the resulting function should be measured on a [[test set]] that is separate from the training set.