=== Soft-margin ===
To extend SVM to cases in which the data are not linearly separable, the ''[[hinge loss]]'' function is helpful:
<math display="block">\max\left(0, 1 - y_i(\mathbf{w}^\mathsf{T} \mathbf{x}_i - b)\right).</math>

Note that <math>y_i</math> is the ''i''-th target (i.e., in this case, 1 or −1), and <math>\mathbf{w}^\mathsf{T} \mathbf{x}_i - b</math> is the ''i''-th output.

This function is zero if the constraint in {{EquationNote|1|(1)}} is satisfied, in other words, if <math>\mathbf{x}_i</math> lies on the correct side of the margin. For data on the wrong side of the margin, the function's value is proportional to the distance from the margin.
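
As a small illustration of this behaviour, the following sketch evaluates the hinge loss for a few points. It assumes NumPy; the function name <code>hinge_loss</code> and the one-dimensional data are hypothetical and chosen only for the example.

```python
import numpy as np

def hinge_loss(w, b, X, y):
    """Per-sample hinge loss max(0, 1 - y_i (w^T x_i - b))."""
    margins = y * (X @ w - b)
    return np.maximum(0.0, 1.0 - margins)

# Hypothetical 1-D example: w = 1, b = 0, every point labelled +1.
w = np.array([1.0])
b = 0.0
X = np.array([[2.0], [1.0], [0.5], [-1.0]])
y = np.array([1.0, 1.0, 1.0, 1.0])

losses = hinge_loss(w, b, X, y)
print(losses.tolist())  # [0.0, 0.0, 0.5, 2.0]
```

The first two points satisfy the margin constraint and incur no loss. The third lies inside the margin, and the fourth is misclassified, so its loss is larger.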

The goal of the optimization then is to minimize:

<math display="block"> \lVert \mathbf{w} \rVert^2 + C \left[\frac 1 n \sum_{i=1}^n \max\left(0, 1 - y_i(\mathbf{w}^\mathsf{T} \mathbf{x}_i - b)\right) \right],</math>

where the parameter <math>C > 0</math> determines the trade-off between increasing the margin size and ensuring that the <math>\mathbf{x}_i</math> lie on the correct side of the margin (the weight can equivalently be attached to either term of the expression above). Replacing each hinge-loss term with a slack variable <math>\zeta_i</math> turns this into the following constrained optimization problem:

<math display="block">\begin{align}
&\underset{\mathbf{w},\;b,\;\boldsymbol{\zeta}}{\operatorname{minimize}} &&\lVert \mathbf{w} \rVert^2 + C\sum_{i=1}^n \zeta_i\\
&\text{subject to} && y_i(\mathbf{w}^\mathsf{T} \mathbf{x}_i - b) \geq 1 - \zeta_i, \quad \zeta_i \geq 0 \quad \forall i\in \{1,\dots,n\}
\end{align}</math>
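
One way to see the connection to the hinge loss is the following short argument. For fixed <math>\mathbf{w}</math> and <math>b</math>, the objective increases with each <math>\zeta_i</math>, so the optimal slack is the smallest value that satisfies both constraints:

<math display="block">\zeta_i = \max\left(0, 1 - y_i(\mathbf{w}^\mathsf{T} \mathbf{x}_i - b)\right).</math>

Substituting this value back into the objective recovers the sum of hinge losses. The result differs from the unconstrained form only by the constant factor <math>\tfrac 1 n</math>, which can be absorbed into <math>C</math>.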

Thus, for large values of <math>C</math>, the soft-margin SVM behaves similarly to the hard-margin SVM when the input data are linearly separable, but it can still learn a classification rule when they are not.
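
The effect of <math>C</math> can be explored numerically. The sketch below minimizes the unconstrained objective by plain subgradient descent. This is an illustrative choice of solver, not the method used by standard SVM libraries. It assumes NumPy; the function names, the step size, the number of steps and the toy data set are all hypothetical.

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    """||w||^2 + C * (mean hinge loss), the unconstrained soft-margin objective."""
    hinge = np.maximum(0.0, 1.0 - y * (X @ w - b))
    return float(w @ w + C * hinge.mean())

def fit_subgradient(X, y, C, lr=0.01, steps=5000):
    """Illustrative subgradient descent; lr and steps are arbitrary assumptions."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(steps):
        # Samples that violate the margin, i.e. have non-zero hinge loss.
        active = y * (X @ w - b) < 1.0
        # Subgradient: 2w from ||w||^2, and -y_i x_i (w.r.t. w) or +y_i (w.r.t. b)
        # from each active hinge term, scaled by C / n.
        grad_w = 2.0 * w - (C / n) * (X[active].T @ y[active])
        grad_b = (C / n) * y[active].sum()
        w = w - lr * grad_w
        b = b - lr * grad_b
    return w, b

# Hypothetical linearly separable toy data.
X = np.array([[2.0, 2.0], [3.0, 3.0], [2.0, 3.0],
              [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]])
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])

w_small, b_small = fit_subgradient(X, y, C=0.1)
w_large, b_large = fit_subgradient(X, y, C=10.0)

# Compare the weight norms; the margin width is 2 / ||w||.
print(np.linalg.norm(w_small), np.linalg.norm(w_large))
print(np.sign(X @ w_large - b_large))
```

On this toy data the smaller value of <math>C</math> should give a smaller <math>\lVert \mathbf{w} \rVert</math>, that is, a wider margin that tolerates margin violations. The larger value pushes the solution towards the hard-margin separator.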