===Discriminative training===
Discriminative training of linear classifiers usually proceeds in a [[supervised learning|supervised]] way, by means of an [[optimization algorithm]] that is given a training set with desired outputs and a [[loss function]] that measures the discrepancy between the classifier's outputs and the desired outputs. Thus, the learning algorithm solves an optimization problem of the form<ref name="ieee">{{cite journal |author1=Guo-Xun Yuan |author2=Chia-Hua Ho |author3=Chih-Jen Lin |title=Recent Advances of Large-Scale Linear Classification |journal=Proc. IEEE |volume=100 |issue=9 |year=2012 |url=http://dmkd.cs.vt.edu/TUTORIAL/Bigdata/Papers/IEEE12.pdf |archive-url=https://web.archive.org/web/20170610105707/http://dmkd.cs.vt.edu/TUTORIAL/Bigdata/Papers/IEEE12.pdf |archive-date=2017-06-10 |url-status=live}}</ref>

:<math>\underset{\mathbf{w}}{\arg\min} \;R(\mathbf{w}) + C \sum_{i=1}^N L(y_i, \mathbf{w}^\mathsf{T} \mathbf{x}_i)</math>

where
* {{math|'''w'''}} is a vector of classifier parameters,
* {{math|''L''(''y<sub>i</sub>'', '''w'''<sup>T</sup>'''x'''<sub>''i''</sub>)}} is a loss function that measures the discrepancy between the classifier's prediction and the true output {{mvar|y<sub>i</sub>}} for the {{mvar|i}}'th training example,
* {{math|''R''('''w''')}} is a [[Regularization (mathematics)|regularization]] function that prevents the parameters from getting too large (causing [[overfitting]]), and
* {{mvar|C}} is a scalar constant (set by the user of the learning algorithm) that controls the balance between the regularization and the loss function.

Popular loss functions include the [[hinge loss]] (for linear SVMs) and the [[log loss]] (for linear logistic regression). If the regularization function {{mvar|R}} is [[convex function|convex]], then the above is a [[convex optimization|convex problem]].{{r|ieee}} Many algorithms exist for solving such problems; popular ones for linear classification include ([[Stochastic gradient descent|stochastic]]) [[gradient descent]], [[L-BFGS]], [[coordinate descent]] and [[Newton method]]s.
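The sketch below is a minimal illustration of this formulation, not taken from the cited reference: it minimizes the objective above with the choice {{math|''R''('''w''') {{=}} ½‖'''w'''‖<sup>2</sup>}} and the hinge loss (i.e., a linear SVM) by full-batch subgradient descent. The function name, learning rate, epoch count, and toy data are hypothetical choices made for the example.

<syntaxhighlight lang="python">
import numpy as np

def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=100):
    """Minimize R(w) + C * sum_i L(y_i, w.x_i) with R(w) = 0.5*||w||^2
    and L the hinge loss, by full-batch subgradient descent.
    X: (N, d) array of examples; y: (N,) array of labels in {-1, +1}."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        margins = y * (X @ w)      # y_i * w^T x_i for every example
        violated = margins < 1     # examples inside or beyond the margin
        # Subgradient of the objective: w from the regularizer, plus
        # -C * y_i * x_i for each margin-violating example (hinge loss).
        grad = w - C * (y[violated] @ X[violated])
        w -= lr * grad
    return w

# Hypothetical toy data: two Gaussian blobs with labels in {-1, +1}.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)
w = train_linear_svm(X, y)
print("training accuracy:", np.mean(np.sign(X @ w) == y))
</syntaxhighlight>

Since the hinge loss and the squared-norm regularizer are both convex, this objective is convex and subgradient descent converges to a global minimum for a suitable step size; swapping in the log loss would yield regularized logistic regression under the same scheme.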