== Support vector machines ==
{{main|Support vector machine}}
[[Image:Svm separating hyperplanes (SVG).svg|thumb|right|H<sub>1</sub> does not separate the sets. H<sub>2</sub> does, but only with a small margin. H<sub>3</sub> separates them with the maximum margin.]]

[[Statistical classification|Classifying data]] is a common task in [[machine learning]]. Suppose some data points, each belonging to one of two sets, are given and we wish to create a model that will decide which set a ''new'' data point will be in. In the case of [[support vector machine]]s, a data point is viewed as a ''p''-dimensional vector (a list of ''p'' numbers), and we want to know whether we can separate such points with a (''p'' − 1)-dimensional [[hyperplane]]. This is called a [[linear classifier]]. There are many hyperplanes that might classify (separate) the data. One reasonable choice as the best hyperplane is the one that represents the largest separation, or margin, between the two sets. So we choose the hyperplane so that the distance from it to the nearest data point on each side is maximized. If such a hyperplane exists, it is known as the ''[[maximum-margin hyperplane]]'' and the linear classifier it defines is known as a ''maximum [[margin classifier]]''.

More formally, given some training data <math>\mathcal{D}</math>, a set of ''n'' points of the form
:<math>\mathcal{D} = \left\{ (\mathbf{x}_i, y_i)\mid\mathbf{x}_i \in \mathbb{R}^p,\, y_i \in \{-1,1\}\right\}_{i=1}^n</math>
where each ''y''<sub>''i''</sub> is either 1 or −1, indicating the set to which the point <math>\mathbf{x}_i</math> belongs, and each <math>\mathbf{x}_i</math> is a ''p''-dimensional [[real number|real]] vector. We want to find the maximum-margin hyperplane that divides the points having <math>y_i = 1</math> from those having <math>y_i = -1</math>.

Any hyperplane can be written as the set of points <math>\mathbf{x}</math> satisfying
:<math>\mathbf{w}\cdot\mathbf{x} - b = 0,</math>
where <math>\cdot</math> denotes the [[dot product]] and <math>\mathbf{w}</math> the (not necessarily normalized) [[Normal (geometry)|normal vector]] to the hyperplane. The parameter <math>\tfrac{b}{\|\mathbf{w}\|}</math> determines the offset of the hyperplane from the origin along the normal vector <math>\mathbf{w}</math>.

If the training data are linearly separable, we can select two parallel hyperplanes that separate the two sets of data such that no points lie between them, and then try to maximize the distance between them, as made precise below.
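These two hyperplanes can be described by the equations
:<math>\mathbf{w}\cdot\mathbf{x} - b = 1</math>
and
:<math>\mathbf{w}\cdot\mathbf{x} - b = -1.</math>
The distance between them is <math>\tfrac{2}{\|\mathbf{w}\|}</math>, so maximizing the margin amounts to minimizing <math>\|\mathbf{w}\|</math>. To keep every data point outside the margin, we require <math>\mathbf{w}\cdot\mathbf{x}_i - b \ge 1</math> when <math>y_i = 1</math> and <math>\mathbf{w}\cdot\mathbf{x}_i - b \le -1</math> when <math>y_i = -1</math>, which can be combined into the single constraint
:<math>y_i(\mathbf{w}\cdot\mathbf{x}_i - b) \ge 1 \quad \text{for all } 1 \le i \le n.</math>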