== Implementation ==
The parameters of the maximum-margin hyperplane are derived by solving the optimization problem. There exist several specialized algorithms for quickly solving the [[quadratic programming]] (QP) problem that arises from SVMs, mostly relying on heuristics for breaking the problem down into smaller, more manageable chunks.

Another approach is to use an [[interior-point method]] that uses [[Newton's method|Newton]]-like iterations to find a solution of the [[Karush–Kuhn–Tucker conditions]] of the primal and dual problems.<ref>{{Cite journal |doi=10.1137/S1052623400374379 |title=Interior-Point Methods for Massive Support Vector Machines |journal=SIAM Journal on Optimization |volume=13 |issue=3 |pages=783–804 |year=2002 |last1=Ferris |first1=Michael C. |last2=Munson |first2=Todd S. |url=http://www.cs.wisc.edu/~ferris/papers/siopt-svm.pdf |url-status=live |archive-url=https://web.archive.org/web/20081204224416/http://www.cs.wisc.edu/~ferris/papers/siopt-svm.pdf |archive-date=2008-12-04 |citeseerx=10.1.1.216.6893 |s2cid=13563302 }}</ref> Instead of solving a sequence of broken-down problems, this approach directly solves the problem as a whole. To avoid solving a linear system involving the large kernel matrix, a low-rank approximation to the matrix is often used in the kernel trick.

Another common method is Platt's [[sequential minimal optimization]] (SMO) algorithm, which breaks the problem down into two-dimensional sub-problems that are solved analytically, eliminating the need for a numerical optimization algorithm and for matrix storage. This algorithm is conceptually simple, easy to implement, generally faster than general-purpose QP solvers, and has better scaling properties for difficult SVM problems.<ref>{{cite conference |first=John C. |last=Platt |title=Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines |conference=NIPS |year=1998 |url=http://research.microsoft.com/pubs/69644/tr-98-14.pdf |url-status=live |archive-url=https://web.archive.org/web/20150702075055/http://research.microsoft.com/pubs/69644/tr-98-14.pdf |archive-date=2015-07-02 }}</ref>

The special case of linear support vector machines can be solved more efficiently by the same kind of algorithms used to optimize its close cousin, [[logistic regression]]; this class of algorithms includes [[Stochastic gradient descent|sub-gradient descent]] (e.g., PEGASOS<ref>{{cite conference |first1=Shai |last1=Shalev-Shwartz |first2=Yoram |last2=Singer |first3=Nathan |last3=Srebro |title=Pegasos: Primal Estimated sub-GrAdient SOlver for SVM |conference=ICML |year=2007 |url=http://ttic.uchicago.edu/~shai/papers/ShalevSiSr07.pdf |url-status=live |archive-url=https://web.archive.org/web/20131215051700/http://ttic.uchicago.edu/~shai/papers/ShalevSiSr07.pdf |archive-date=2013-12-15 }}</ref>) and [[coordinate descent]] (e.g., LIBLINEAR<ref>{{cite journal |first1=Rong-En |last1=Fan |first2=Kai-Wei |last2=Chang |first3=Cho-Jui |last3=Hsieh |first4=Xiang-Rui |last4=Wang |first5=Chih-Jen |last5=Lin |title=LIBLINEAR: A library for large linear classification |url=https://www.csie.ntu.edu.tw/~cjlin/papers/liblinear.pdf |journal=[[Journal of Machine Learning Research]] |volume=9 |pages=1871–1874 |year=2008 }}</ref>). LIBLINEAR has attractive training-time properties: each iteration takes time linear in the time needed to read the training data, and the iterations have a [[Rate of convergence|Q-linear convergence]] property, making the algorithm extremely fast.
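As an illustration of the sub-gradient approach, the following is a minimal sketch of a Pegasos-style stochastic sub-gradient update for the primal soft-margin objective <math>\tfrac{\lambda}{2}\|w\|^2 + \tfrac{1}{n}\sum_{i=1}^{n}\max\left(0,\, 1 - y_i \langle w, x_i \rangle\right)</math> with labels <math>y_i \in \{-1, +1\}</math>. It is written in Python with NumPy; the function and variable names are illustrative rather than taken from any particular library.

<syntaxhighlight lang="python">
import numpy as np

def pegasos_linear_svm(X, y, lam=0.01, n_iters=10000, seed=0):
    """Pegasos-style stochastic sub-gradient descent for a linear SVM.

    Minimizes  lam/2 * ||w||^2 + mean_i max(0, 1 - y_i * <w, x_i>)
    with labels y_i in {-1, +1}.  Illustrative sketch only.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for t in range(1, n_iters + 1):
        i = rng.integers(n)            # pick one training example at random
        eta = 1.0 / (lam * t)          # decaying step size 1/(lambda * t)
        margin = y[i] * X[i].dot(w)
        w *= 1.0 - eta * lam           # shrinkage from the L2 regularizer
        if margin < 1:                 # hinge loss is active for this example
            w += eta * y[i] * X[i]     # sub-gradient step on the hinge term
    return w
</syntaxhighlight>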
General kernel SVMs can also be solved more efficiently using [[Stochastic gradient descent|sub-gradient descent]] (e.g., P-packSVM<ref>{{cite conference |first1=Zeyuan |last1=Allen Zhu |first2=Weizhu |last2=Chen |first3=Gang |last3=Wang |first4=Chenguang |last4=Zhu |first5=Zheng |last5=Chen |title=P-packSVM: Parallel Primal grAdient desCent Kernel SVM |conference=ICDM |year=2009 |url=http://people.csail.mit.edu/zeyuan/paper/2009-ICDM-Parallel.pdf |url-status=live |archive-url=https://web.archive.org/web/20140407095709/http://people.csail.mit.edu/zeyuan/paper/2009-ICDM-Parallel.pdf |archive-date=2014-04-07 }}</ref>), especially when [[parallelization]] is allowed. Kernel SVMs are available in many machine-learning toolkits, including [[LIBSVM]], [[MATLAB]], [http://support.sas.com/documentation/cdl/en/whatsnew/64209/HTML/default/viewer.htm#emdocwhatsnew71.htm SAS], SVMlight, [https://cran.r-project.org/package=kernlab kernlab], [[scikit-learn]], [[Shogun (toolbox)|Shogun]], [[Weka (machine learning)|Weka]], [http://image.diku.dk/shark/ Shark], [https://mloss.org/software/view/409/ JKernelMachines], [[OpenCV]] and others.

Preprocessing of the data (standardization) is highly recommended to enhance the accuracy of classification.<ref>{{Cite journal| volume = 9| issue = Aug| pages = 1871–1874| last1 = Fan| first1 = Rong-En| last2 = Chang| first2 = Kai-Wei| last3 = Hsieh| first3 = Cho-Jui| last4 = Wang| first4 = Xiang-Rui| last5 = Lin| first5 = Chih-Jen| title = LIBLINEAR: A library for large linear classification| journal = Journal of Machine Learning Research| date = 2008}}</ref> There are several methods of standardization, such as min-max scaling, normalization by decimal scaling, and Z-score standardization.<ref>{{Cite journal| doi = 10.19026/rjaset.6.3638| volume = 6| pages = 3299–3303| last1 = Mohamad| first1 = Ismail| last2 = Usman| first2 = Dauda| title = Standardization and Its Effects on K-Means Clustering Algorithm| journal = Research Journal of Applied Sciences, Engineering and Technology| date = 2013-09-01| issue = 17| doi-access = free}}</ref> Subtraction of the mean and division by the standard deviation of each feature (Z-score standardization) is typically used for SVMs.<ref>{{Cite journal| doi = 10.1140/epjds/s13688-019-0201-0| volume = 8| last1 = Fennell| first1 = Peter| last2 = Zuo| first2 = Zhiya| last3 = Lerman| first3 = Kristina|author3-link=Kristina Lerman| title = Predicting and explaining behavioral data with structured feature space decomposition| journal = EPJ Data Science| date = 2019-12-01| doi-access = free| arxiv = 1810.09841}}</ref>
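For example, with [[scikit-learn]] (one of the toolkits listed above), standardization and a kernel SVM can be chained in a single pipeline; the dataset, kernel, and hyperparameter values below are arbitrary illustrations rather than recommendations.

<syntaxhighlight lang="python">
# Z-score standardization followed by a kernel SVM, using scikit-learn.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# StandardScaler subtracts the per-feature mean and divides by the
# per-feature standard deviation; SVC (backed by LIBSVM) then fits an
# RBF-kernel SVM on the standardized features.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
model.fit(X_train, y_train)
print(model.score(X_test, y_test))  # accuracy on the held-out split
</syntaxhighlight>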