== Implementation ==
The parameters of the maximum-margin hyperplane are derived by solving the optimization problem. There exist several specialized algorithms for quickly solving the [[quadratic programming]] (QP) problem that arises from SVMs, mostly relying on heuristics for breaking the problem down into smaller, more manageable chunks.

Another approach is to use an [[interior-point method]] that uses [[Newton's method|Newton]]-like iterations to find a solution of the [[Karush–Kuhn–Tucker conditions]] of the primal and dual problems.<ref>{{Cite journal |doi=10.1137/S1052623400374379 |title=Interior-Point Methods for Massive Support Vector Machines |journal=SIAM Journal on Optimization |volume=13 |issue=3 |pages=783–804 |year=2002 |last1=Ferris |first1=Michael C. |last2=Munson |first2=Todd S. |url=http://www.cs.wisc.edu/~ferris/papers/siopt-svm.pdf |url-status=live |archive-url=https://web.archive.org/web/20081204224416/http://www.cs.wisc.edu/~ferris/papers/siopt-svm.pdf |archive-date=2008-12-04 |citeseerx=10.1.1.216.6893 |s2cid=13563302 }}</ref> Instead of solving a sequence of broken-down problems, this approach directly solves the problem as a whole. To avoid solving a linear system involving the large kernel matrix, a low-rank approximation to the matrix is often used in the kernel trick.

Another common method is Platt's [[sequential minimal optimization]] (SMO) algorithm, which breaks the problem down into two-dimensional sub-problems that are solved analytically, eliminating the need for a numerical optimization algorithm and for matrix storage. This algorithm is conceptually simple, easy to implement, generally faster than general-purpose QP solvers, and has better scaling properties for difficult SVM problems.<ref>{{cite conference |first=John C. |last=Platt |title=Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines |conference=NIPS |year=1998 |url=http://research.microsoft.com/pubs/69644/tr-98-14.pdf |url-status=live |archive-url=https://web.archive.org/web/20150702075055/http://research.microsoft.com/pubs/69644/tr-98-14.pdf |archive-date=2015-07-02 }}</ref>

The special case of linear support vector machines can be solved more efficiently by the same kind of algorithms used to optimize its close cousin, [[logistic regression]]; this class of algorithms includes [[Stochastic gradient descent|sub-gradient descent]] (e.g., PEGASOS<ref>{{cite conference |first1=Shai |last1=Shalev-Shwartz |first2=Yoram |last2=Singer |first3=Nathan |last3=Srebro |title=Pegasos: Primal Estimated sub-GrAdient SOlver for SVM |conference=ICML |year=2007 |url=http://ttic.uchicago.edu/~shai/papers/ShalevSiSr07.pdf |url-status=live |archive-url=https://web.archive.org/web/20131215051700/http://ttic.uchicago.edu/~shai/papers/ShalevSiSr07.pdf |archive-date=2013-12-15 }}</ref>) and [[coordinate descent]] (e.g., LIBLINEAR<ref>{{cite journal |first1=Rong-En |last1=Fan |first2=Kai-Wei |last2=Chang |first3=Cho-Jui |last3=Hsieh |first4=Xiang-Rui |last4=Wang |first5=Chih-Jen |last5=Lin |title=LIBLINEAR: A library for large linear classification |url=https://www.csie.ntu.edu.tw/~cjlin/papers/liblinear.pdf |journal=[[Journal of Machine Learning Research]] |volume=9 |pages=1871–1874 |year=2008 }}</ref>). LIBLINEAR has attractive training-time properties: each iteration takes time linear in the time needed to read the training data, and the iterations have a [[Rate of convergence|Q-linear convergence]] property, making the algorithm extremely fast.
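As an illustration of the sub-gradient approach, the following is a minimal sketch of a Pegasos-style stochastic sub-gradient update for the primal soft-margin objective <math>\tfrac{\lambda}{2}\|w\|^2 + \tfrac{1}{n}\sum_{i=1}^{n}\max\left(0,\, 1 - y_i \langle w, x_i \rangle\right)</math> with labels <math>y_i \in \{-1, +1\}</math>. It is written in Python with NumPy; the function and variable names are illustrative rather than taken from any particular library.

<syntaxhighlight lang="python">
import numpy as np

def pegasos_linear_svm(X, y, lam=0.01, n_iters=10000, seed=0):
    """Pegasos-style stochastic sub-gradient descent for a linear SVM.

    Minimizes  lam/2 * ||w||^2 + mean_i max(0, 1 - y_i * <w, x_i>)
    with labels y_i in {-1, +1}.  Illustrative sketch only.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for t in range(1, n_iters + 1):
        i = rng.integers(n)            # pick one training example at random
        eta = 1.0 / (lam * t)          # decaying step size 1/(lambda * t)
        margin = y[i] * X[i].dot(w)
        w *= 1.0 - eta * lam           # shrinkage from the L2 regularizer
        if margin < 1:                 # hinge loss is active for this example
            w += eta * y[i] * X[i]     # sub-gradient step on the hinge term
    return w
</syntaxhighlight>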
General kernel SVMs can also be solved more efficiently using [[Stochastic gradient descent|sub-gradient descent]] (e.g., P-packSVM<ref>{{cite conference |first1=Zeyuan |last1=Allen Zhu |first2=Weizhu |last2=Chen |first3=Gang |last3=Wang |first4=Chenguang |last4=Zhu |first5=Zheng |last5=Chen |title=P-packSVM: Parallel Primal grAdient desCent Kernel SVM |conference=ICDM |year=2009 |url=http://people.csail.mit.edu/zeyuan/paper/2009-ICDM-Parallel.pdf |url-status=live |archive-url=https://web.archive.org/web/20140407095709/http://people.csail.mit.edu/zeyuan/paper/2009-ICDM-Parallel.pdf |archive-date=2014-04-07 }}</ref>), especially when [[parallelization]] is allowed. Kernel SVMs are available in many machine-learning toolkits, including [[LIBSVM]], [[MATLAB]], [http://support.sas.com/documentation/cdl/en/whatsnew/64209/HTML/default/viewer.htm#emdocwhatsnew71.htm SAS], SVMlight, [https://cran.r-project.org/package=kernlab kernlab], [[scikit-learn]], [[Shogun (toolbox)|Shogun]], [[Weka (machine learning)|Weka]], [http://image.diku.dk/shark/ Shark], [https://mloss.org/software/view/409/ JKernelMachines], [[OpenCV]] and others.

Preprocessing of the data (standardization) is highly recommended to enhance the accuracy of classification.<ref>{{Cite journal| volume = 9| issue = Aug| pages = 1871–1874| last1 = Fan| first1 = Rong-En| last2 = Chang| first2 = Kai-Wei| last3 = Hsieh| first3 = Cho-Jui| last4 = Wang| first4 = Xiang-Rui| last5 = Lin| first5 = Chih-Jen| title = LIBLINEAR: A library for large linear classification| journal = Journal of Machine Learning Research| date = 2008}}</ref> There are several methods of standardization, such as min-max scaling, normalization by decimal scaling, and Z-score standardization.<ref>{{Cite journal| doi = 10.19026/rjaset.6.3638| volume = 6| pages = 3299–3303| last1 = Mohamad| first1 = Ismail| last2 = Usman| first2 = Dauda| title = Standardization and Its Effects on K-Means Clustering Algorithm| journal = Research Journal of Applied Sciences, Engineering and Technology| date = 2013-09-01| issue = 17| doi-access = free}}</ref> Subtraction of the mean and division by the standard deviation of each feature (Z-score standardization) is typically used for SVMs.<ref>{{Cite journal| doi = 10.1140/epjds/s13688-019-0201-0| volume = 8| last1 = Fennell| first1 = Peter| last2 = Zuo| first2 = Zhiya| last3 = Lerman| first3 = Kristina|author3-link=Kristina Lerman| title = Predicting and explaining behavioral data with structured feature space decomposition| journal = EPJ Data Science| date = 2019-12-01| doi-access = free| arxiv = 1810.09841}}</ref>
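For example, with [[scikit-learn]] (one of the toolkits listed above), standardization and a kernel SVM can be chained in a single pipeline; the dataset, kernel, and hyperparameter values below are arbitrary illustrations rather than recommendations.

<syntaxhighlight lang="python">
# Z-score standardization followed by a kernel SVM, using scikit-learn.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# StandardScaler subtracts the per-feature mean and divides by the
# per-feature standard deviation; SVC (backed by LIBSVM) then fits an
# RBF-kernel SVM on the standardized features.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
model.fit(X_train, y_train)
print(model.score(X_test, y_test))  # accuracy on the held-out split
</syntaxhighlight>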