== Properties ==

SVMs belong to a family of generalized [[linear classifier]]s and can be interpreted as an extension of the [[perceptron]].<ref>R. Collobert and S. Bengio (2004). Links between Perceptrons, MLPs and SVMs. Proc. Int'l Conf. on Machine Learning (ICML).</ref> They can also be considered a special case of [[Tikhonov regularization]]. A special property is that they simultaneously minimize the empirical ''classification error'' and maximize the ''geometric margin''; hence they are also known as '''maximum [[margin classifier]]s'''. A comparison of the SVM to other classifiers has been made by Meyer, Leisch and Hornik.<ref>{{Cite journal |doi=10.1016/S0925-2312(03)00431-4 |title=The support vector machine under test |journal=Neurocomputing |volume=55 |issue=1–2 |pages=169–186 |date=September 2003 |last1=Meyer |first1=David |last2=Leisch |first2=Friedrich |last3=Hornik |first3=Kurt }}</ref>

=== Parameter selection ===

The effectiveness of an SVM depends on the selection of the kernel, the kernel's parameters, and the soft-margin parameter <math>\lambda</math>. A common choice is a Gaussian kernel, which has a single parameter <math>\gamma</math>. The best combination of <math>\lambda</math> and <math>\gamma</math> is often selected by a [[grid search]] with exponentially growing sequences of <math>\lambda</math> and <math>\gamma</math>, for example, <math>\lambda \in \{ 2^{-5}, 2^{-3}, \dots, 2^{13}, 2^{15} \}</math>; <math>\gamma \in \{ 2^{-15}, 2^{-13}, \dots, 2^{1}, 2^{3} \}</math>. Typically, each combination of parameter choices is checked using [[Cross-validation (statistics)|cross-validation]], and the parameters with the best cross-validation accuracy are picked (a code sketch of this procedure follows the Issues subsection below). Alternatively, [[Bayesian optimization]] can be used to select <math>\lambda</math> and <math>\gamma</math>, often requiring the evaluation of far fewer parameter combinations than grid search (also sketched below). The final model, which is used for testing and for classifying new data, is then trained on the whole training set using the selected parameters.<ref>{{cite tech report |url=http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf |title=A Practical Guide to Support Vector Classification |last1=Hsu |first1=Chih-Wei |last2=Chang |first2=Chih-Chung |last3=Lin |first3=Chih-Jen |name-list-style=amp |year=2003 |institution=Department of Computer Science and Information Engineering, National Taiwan University |url-status=live |archive-url=https://web.archive.org/web/20130625201224/http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf |archive-date=2013-06-25 }}</ref>

=== Issues ===

Potential drawbacks of the SVM include the following:
* It requires full labeling of the input data.
* It produces uncalibrated [[class membership probabilities]]; SVM stems from Vapnik's theory, which avoids estimating probabilities on finite data.
* The SVM is only directly applicable to two-class tasks. Therefore, algorithms that reduce the multi-class task to several binary problems have to be applied; see the [[#Multiclass SVM|multi-class SVM]] section.
* Parameters of a solved model are difficult to interpret.
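The following is a minimal sketch of the grid-search procedure from the Parameter selection subsection, assuming Python with scikit-learn and a built-in example dataset (neither is prescribed by the text). Note that scikit-learn's <code>SVC</code> exposes the soft margin through a parameter <code>C</code>, which plays the role of the inverse of the regularization strength <math>\lambda</math> above.

<syntaxhighlight lang="python">
# A minimal sketch, not a definitive implementation: grid search over the
# soft-margin and Gaussian-kernel parameters with 5-fold cross-validation.
# The library and dataset choices here are illustrative assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Exponentially growing sequences, as suggested in the text.
# scikit-learn's C corresponds to 1/lambda in the article's notation.
param_grid = {
    "C": [2.0**k for k in range(-5, 16, 2)],      # 2^-5 ... 2^15
    "gamma": [2.0**k for k in range(-15, 4, 2)],  # 2^-15 ... 2^3
}

# Each (C, gamma) pair is scored by cross-validation; the best pair wins.
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best CV accuracy:", search.best_score_)

# GridSearchCV then refits on the whole training set with the selected
# parameters (refit=True is the default), matching the final step
# described in the text.
</syntaxhighlight>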
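The Bayesian-optimization alternative can be sketched along the same lines, here assuming the third-party scikit-optimize package (again an assumption; the text names no library). Instead of exhaustively scoring every grid point, it samples a fixed, smaller budget of parameter combinations from continuous search spaces.

<syntaxhighlight lang="python">
# A sketch of Bayesian hyperparameter selection, assuming scikit-optimize
# (pip install scikit-optimize); the article does not prescribe this library.
from sklearn.datasets import load_breast_cancer
from sklearn.svm import SVC
from skopt import BayesSearchCV
from skopt.space import Real

X, y = load_breast_cancer(return_X_y=True)

# Continuous, log-uniform spaces covering the same ranges as the grid above.
search = BayesSearchCV(
    SVC(kernel="rbf"),
    {
        "C": Real(2**-5, 2**15, prior="log-uniform"),
        "gamma": Real(2**-15, 2**3, prior="log-uniform"),
    },
    n_iter=32,  # far fewer evaluations than the 11 x 10 grid
    cv=5,
    random_state=0,
)
search.fit(X, y)

print("best parameters:", search.best_params_)
</syntaxhighlight>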