=== Hard-margin ===
If the training data is [[linearly separable]], we can select two parallel hyperplanes that separate the two classes of data, so that the distance between them is as large as possible. The region bounded by these two hyperplanes is called the "margin", and the maximum-margin hyperplane is the hyperplane that lies halfway between them. With a normalized or standardized dataset, these hyperplanes can be described by the equations

: <math>\mathbf{w}^\mathsf{T} \mathbf{x} - b = 1</math> (anything on or above this boundary is of one class, with label 1)

and

: <math>\mathbf{w}^\mathsf{T} \mathbf{x} - b = -1</math> (anything on or below this boundary is of the other class, with label &minus;1).

Geometrically, the distance between these two hyperplanes is <math>\tfrac{2}{\|\mathbf{w}\|}</math>,<ref>{{cite web |url=https://math.stackexchange.com/q/1305925/168764 |title=Why is the SVM margin equal to <math>\frac{2}{\|\mathbf{w}\|}</math>? |date=30 May 2015 |website=Mathematics Stack Exchange}}</ref> so to maximize the distance between the planes we want to minimize <math>\|\mathbf{w}\|</math>. The distance is computed using the [[distance from a point to a plane]] equation. To prevent data points from falling within the margin, we add the following constraint: for each <math>i</math>, either
<math display="block">\mathbf{w}^\mathsf{T} \mathbf{x}_i - b \ge 1 \, , \text{ if } y_i = 1,</math>
or
<math display="block">\mathbf{w}^\mathsf{T} \mathbf{x}_i - b \le -1 \, , \text{ if } y_i = -1.</math>
These constraints state that each data point must lie on the correct side of the margin. This can be rewritten as
{{NumBlk||<math display="block">y_i(\mathbf{w}^\mathsf{T} \mathbf{x}_i - b) \ge 1, \quad \text{ for all } 1 \le i \le n.</math>|{{EquationRef|1}}}}

Putting this together, we obtain the optimization problem:
<math display="block">\begin{align} &\underset{\mathbf{w},\;b}{\operatorname{minimize}} && \frac{1}{2}\|\mathbf{w}\|^2 \\ &\text{subject to} && y_i(\mathbf{w}^\mathsf{T} \mathbf{x}_i - b) \geq 1 \quad \forall i \in \{1,\dots,n\}. \end{align}</math>

The <math>\mathbf{w}</math> and <math>b</math> that solve this problem determine the final classifier, <math>\mathbf{x} \mapsto \sgn(\mathbf{w}^\mathsf{T} \mathbf{x} - b)</math>, where <math>\sgn(\cdot)</math> is the [[sign function]].

An important consequence of this geometric description is that the max-margin hyperplane is completely determined by those <math>\mathbf{x}_i</math> that lie nearest to it (explained below). These <math>\mathbf{x}_i</math> are called ''support vectors''.{{anchor|Support vectors}}
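The problem above is a convex [[quadratic programming]] problem, so it can be handed to any off-the-shelf QP solver. The sketch below (illustrative only, not drawn from the article's sources) uses the Python library cvxpy; the synthetic two-blob dataset, the variable names <code>X</code>, <code>y</code>, <code>w</code>, <code>b</code>, and the numerical tolerance are assumptions chosen to mirror the notation above. Note that the hard-margin formulation is only feasible when the data are in fact linearly separable.

<syntaxhighlight lang="python">
import numpy as np
import cvxpy as cp

# Toy data: two well-separated Gaussian blobs (assumed linearly separable;
# the hard-margin QP is infeasible otherwise).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=-3.0, size=(20, 2)),
               rng.normal(loc=+3.0, size=(20, 2))])
y = np.hstack([-np.ones(20), np.ones(20)])

n, d = X.shape
w = cp.Variable(d)
b = cp.Variable()

# Hard-margin SVM: minimize (1/2)||w||^2 subject to y_i (w^T x_i - b) >= 1.
objective = cp.Minimize(0.5 * cp.sum_squares(w))
constraints = [cp.multiply(y, X @ w - b) >= 1]
cp.Problem(objective, constraints).solve()

# Final classifier: x -> sgn(w^T x - b).
predict = lambda X_new: np.sign(X_new @ w.value - b.value)

# Support vectors are the points whose constraint is active,
# i.e. y_i (w^T x_i - b) == 1 (up to solver tolerance).
margins = y * (X @ w.value - b.value)
support_vectors = X[np.isclose(margins, 1.0, atol=1e-4)]
</syntaxhighlight>

Only <code>support_vectors</code> enter the solution: deleting any other training point and re-solving leaves <code>w</code> and <code>b</code> unchanged, which is the property the next paragraph refers to.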