Feature selection
{{Short description|Process in machine learning and statistics}}
{{More footnotes needed|date=July 2010}}
{{Distinguish|Feature extraction}}
In machine learning, '''feature selection''' is the process of selecting a subset of relevant [[Feature (machine learning)|features]] (variables, predictors) for use in model construction. Feature selection techniques are used for several reasons:
* simplifying models to make them easier to interpret,<ref name="islr">{{cite book |author1=Gareth James |author2=Daniela Witten |author3=Trevor Hastie |author4=Robert Tibshirani |title=An Introduction to Statistical Learning |publisher=Springer |year=2013 |url=http://www-bcf.usc.edu/~gareth/ISL/ |page=204}}</ref>
* shortening training times,<ref>{{Citation|last1=Brank|first1=Janez|title=Feature Selection|date=2011|url=http://link.springer.com/10.1007/978-0-387-30164-8_306|encyclopedia=Encyclopedia of Machine Learning|pages=402–406|editor-last=Sammut|editor-first=Claude|place=Boston, MA|publisher=Springer US|language=en|doi=10.1007/978-0-387-30164-8_306|isbn=978-0-387-30768-8|access-date=2021-07-13|last2=Mladenić|first2=Dunja|last3=Grobelnik|first3=Marko|last4=Liu|first4=Huan|last5=Mladenić|first5=Dunja|last6=Flach|first6=Peter A.|last7=Garriga|first7=Gemma C.|last8=Toivonen|first8=Hannu|last9=Toivonen|first9=Hannu|editor2-last=Webb|editor2-first=Geoffrey I.|url-access=subscription}}</ref>
* avoiding the [[curse of dimensionality]],<ref>{{Cite journal|last=Kramer|first=Mark A.|date=1991|title=Nonlinear principal component analysis using autoassociative neural networks|url=https://aiche.onlinelibrary.wiley.com/doi/abs/10.1002/aic.690370209|journal=AIChE Journal|language=en|volume=37|issue=2|pages=233–243|doi=10.1002/aic.690370209|bibcode=1991AIChE..37..233K |issn=1547-5905|url-access=subscription}}</ref>
* improving the compatibility of the data with a certain learning model class,<ref>{{Cite journal|last1=Kratsios|first1=Anastasis|last2=Hyndman|first2=Cody|date=2021|title=NEU: A Meta-Algorithm for Universal UAP-Invariant Feature Representation|url=http://jmlr.org/papers/v22/18-803.html|journal=Journal of Machine Learning Research|volume=22|issue=92|pages=1–51|issn=1533-7928}}</ref>
* encoding inherent [[Symmetric space|symmetries]] present in the input space.<ref>{{Cite book|last1=Persello|first1=Claudio|last2=Bruzzone|first2=Lorenzo|title=2014 IEEE Geoscience and Remote Sensing Symposium |chapter=Relevant and invariant feature selection of hyperspectral images for domain generalization |date=July 2014|chapter-url=http://dx.doi.org/10.1109/igarss.2014.6947252|pages=3562–3565|publisher=IEEE|doi=10.1109/igarss.2014.6947252|isbn=978-1-4799-5775-0|s2cid=8368258|url=https://ris.utwente.nl/ws/files/122945513/Persello2014relevant.pdf }}</ref><ref>{{Cite book|last1=Hinkle|first1=Jacob|last2=Muralidharan|first2=Prasanna|last3=Fletcher|first3=P. Thomas|last4=Joshi|first4=Sarang|title=Computer Vision – ECCV 2012 |chapter=Polynomial Regression on Riemannian Manifolds |date=2012|editor-last=Fitzgibbon|editor-first=Andrew|editor2-last=Lazebnik|editor2-first=Svetlana|editor3-last=Perona|editor3-first=Pietro|editor4-last=Sato|editor4-first=Yoichi|editor5-last=Schmid|editor5-first=Cordelia|chapter-url=https://link.springer.com/chapter/10.1007/978-3-642-33712-3_1|series=Lecture Notes in Computer Science|volume=7574|language=en|location=Berlin, Heidelberg|publisher=Springer|pages=1–14|doi=10.1007/978-3-642-33712-3_1|isbn=978-3-642-33712-3|arxiv=1201.2395|s2cid=8849753}}</ref><ref>{{Cite journal|last=Yarotsky|first=Dmitry|date=2021-04-30|title=Universal Approximations of Invariant Maps by Neural Networks|url=https://doi.org/10.1007/s00365-021-09546-1|journal=Constructive Approximation|volume=55 |pages=407–474 |language=en|doi=10.1007/s00365-021-09546-1|issn=1432-0940|arxiv=1804.10306|s2cid=13745401}}</ref><ref>{{Cite journal|last1=Hauberg|first1=Søren|last2=Lauze|first2=François|last3=Pedersen|first3=Kim Steenstrup|date=2013-05-01|title=Unscented Kalman Filtering on Riemannian Manifolds|url=https://doi.org/10.1007/s10851-012-0372-9|journal=Journal of Mathematical Imaging and Vision|language=en|volume=46|issue=1|pages=103–120|doi=10.1007/s10851-012-0372-9|bibcode=2013JMIV...46..103H |s2cid=8501814|issn=1573-7683|url-access=subscription}}</ref>

The central premise when using feature selection is that data sometimes contains features that are ''redundant'' or ''irrelevant'', and can thus be removed without incurring much loss of information.<ref name="Bermingham-prolog">{{cite journal|last1=Bermingham|first1=Mairead L.|display-authors=etal|date=2015|title=Application of high-dimensional feature selection: evaluation for genomic prediction in man|journal=[[Scientific Reports]]|volume=5|page=10312|doi=10.1038/srep10312|pmid=25988841|pmc=4437376|bibcode=2015NatSR...510312B}}</ref> Redundancy and irrelevance are two distinct notions, since one relevant feature may be redundant in the presence of another relevant feature with which it is strongly correlated.{{r|guyon-intro}} [[Feature extraction]] creates new features from functions of the original features, whereas feature selection finds a subset of the features. Feature selection techniques are often used in domains where there are many features and comparatively few samples (data points).
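The redundancy notion above can be illustrated with a minimal, hypothetical filter-style selector (not a method from the article): greedily keep each feature unless its absolute correlation with an already-kept feature exceeds a threshold, so a near-duplicate of a kept feature is dropped as redundant.

```python
import numpy as np

def select_uncorrelated(X, threshold=0.9):
    """Greedy filter: keep a column of X unless its absolute Pearson
    correlation with some already-kept column exceeds `threshold`."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return kept

rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = a + 0.01 * rng.normal(size=200)   # nearly duplicates a: redundant
c = rng.normal(size=200)              # independent: relevant on its own
X = np.column_stack([a, b, c])
print(select_uncorrelated(X))         # b is dropped: [0, 2]
```

Note that this sketch only captures redundancy among features; detecting ''irrelevance'' additionally requires a relevance measure against the target variable (e.g. mutual information), which a pure correlation filter does not provide.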