===Other factors to consider===

Other factors to consider when choosing and applying a learning algorithm include the following:

* Heterogeneity of the data. If the feature vectors include features of many different kinds (discrete, discrete ordered, counts, continuous values), some algorithms are easier to apply than others. Many algorithms, including [[Support Vector Machines|support-vector machines]], [[linear regression]], [[logistic regression]], [[Neural network (machine learning)|neural networks]], and [[k-nearest neighbors algorithm|nearest neighbor methods]], require that the input features be numerical and scaled to similar ranges (e.g., to the [-1,1] interval). Methods that employ a distance function, such as nearest neighbor methods and [[Support Vector Machines|support-vector machines with Gaussian kernels]], are particularly sensitive to this, because a feature with a large numeric range would otherwise dominate the distance. An advantage of [[Decision tree learning|decision trees]] is that they easily handle heterogeneous data.
* Redundancy in the data. If the input features contain redundant information (e.g., highly correlated features), some learning algorithms (e.g., [[linear regression]], [[logistic regression]], and [[k-nearest neighbors algorithm|distance-based methods]]) will perform poorly because of numerical instabilities. These problems can often be solved by imposing some form of [[Regularization (mathematics)|regularization]].
* Presence of interactions and non-linearities. If each of the features makes an independent contribution to the output, then algorithms based on linear functions (e.g., [[linear regression]], [[logistic regression]], [[support-vector machine]]s, [[Naive Bayes classifier|naive Bayes]]) and distance functions (e.g., nearest neighbor methods, [[Support Vector Machines|support-vector machines with Gaussian kernels]]) generally perform well. However, if there are complex interactions among features, then algorithms such as [[Decision tree learning|decision trees]] and neural networks work better, because they are specifically designed to discover these interactions. Linear methods can also be applied, but the engineer must manually specify the interactions when using them.
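
The sensitivity of distance-based methods to feature scale can be illustrated with a minimal sketch. The data set, the helper names <code>rescale_columns</code> and <code>nearest_index</code>, and the choice of plain Euclidean distance are assumptions made for illustration only; they are not taken from any particular library.

```python
import math


def rescale_columns(rows):
    """Linearly map each column to [-1, 1]; constant columns map to 0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [
            0.0 if hi == lo else 2.0 * (v - lo) / (hi - lo) - 1.0
            for v, lo, hi in zip(row, lows, highs)
        ]
        for row in rows
    ]


def nearest_index(rows, query_index):
    """Index of the row closest (Euclidean) to rows[query_index], excluding itself."""
    query = rows[query_index]
    candidates = [i for i in range(len(rows)) if i != query_index]
    return min(candidates, key=lambda i: math.dist(rows[i], query))


# Hypothetical records: (age in years, income in dollars).
people = [
    [25, 50_000],  # query point
    [26, 58_000],  # similar age, moderately different income
    [60, 50_500],  # very different age, almost the same income
    [40, 90_000],
]

# On raw values the income column dominates the distance.
raw_neighbor = nearest_index(people, 0)

# After rescaling, both features contribute on a comparable scale.
scaled_people = rescale_columns(people)
scaled_neighbor = nearest_index(scaled_people, 0)

print("nearest neighbor on raw features:   ", raw_neighbor)
print("nearest neighbor on scaled features:", scaled_neighbor)
```

In this toy data, the nearest neighbor of the first record changes once both columns are mapped to the same interval, which is the effect described in the heterogeneity item above. A decision tree splits on one feature at a time, so its splits are unaffected by this kind of monotonic rescaling.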

When considering a new application, the engineer can compare multiple learning algorithms and experimentally determine which one works best on the problem at hand (see [[Cross-validation (statistics)|cross-validation]]). Tuning the performance of a learning algorithm can be very time-consuming. Given fixed resources, it is often better to spend more time collecting additional training data and more informative features than to spend extra time tuning the learning algorithms.
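
An experimental comparison of this kind can be sketched with ''k''-fold cross-validation. The example below is a simplified illustration under stated assumptions: the synthetic data, the two candidate models (a constant-mean baseline and a one-variable least-squares line), and the helper names <code>k_fold_indices</code> and <code>cross_val_mse</code> are all hypothetical and chosen only to keep the sketch self-contained.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k disjoint folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_val_mse(fit, xs, ys, k=5, seed=0):
    """Average held-out mean squared error of a model over k folds."""
    fold_errors = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train_x = [x for i, x in enumerate(xs) if i not in held]
        train_y = [y for i, y in enumerate(ys) if i not in held]
        predict = fit(train_x, train_y)  # train on the remaining folds only
        mse = sum((predict(xs[i]) - ys[i]) ** 2 for i in held_out) / len(held_out)
        fold_errors.append(mse)
    return sum(fold_errors) / len(fold_errors)


def fit_mean(train_x, train_y):
    """Baseline model: always predict the mean training target."""
    mean = sum(train_y) / len(train_y)
    return lambda x: mean


def fit_line(train_x, train_y):
    """One-variable ordinary least squares."""
    n = len(train_x)
    mean_x = sum(train_x) / n
    mean_y = sum(train_y) / n
    sxx = sum((x - mean_x) ** 2 for x in train_x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(train_x, train_y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


# Synthetic data (an assumption for this sketch): y = 3x + 2 plus Gaussian noise.
rng = random.Random(42)
xs = [i / 10 for i in range(50)]
ys = [3 * x + 2 + rng.gauss(0, 0.5) for x in xs]

scores = {
    "mean baseline": cross_val_mse(fit_mean, xs, ys),
    "least-squares line": cross_val_mse(fit_line, xs, ys),
}
best = min(scores, key=scores.get)

for name, mse in scores.items():
    print(f"{name}: cross-validated MSE = {mse:.3f}")
print("selected:", best)
```

Each candidate is scored only on data it was not trained on, and the same folds are reused for every candidate so that the scores are directly comparable. In practice the candidates would be the learning algorithms under consideration rather than these two toy models.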