==== Feature learning ====
{{Main|Feature learning}}

Several learning algorithms aim at discovering better representations of the inputs provided during training.<ref name="pami">{{cite journal |author1=Y. Bengio |author2=A. Courville |author3=P. Vincent |title=Representation Learning: A Review and New Perspectives |journal=IEEE Transactions on Pattern Analysis and Machine Intelligence |year=2013 |doi=10.1109/tpami.2013.50 |pmid=23787338 |volume=35 |issue=8 |pages=1798–1828 |arxiv=1206.5538 |s2cid=393948}}</ref> Classic examples include [[principal component analysis]] and cluster analysis. Feature learning algorithms, also called representation learning algorithms, often attempt to preserve the information in their input but also transform it in a way that makes it useful, often as a pre-processing step before performing classification or predictions. This technique allows reconstruction of the inputs coming from the unknown data-generating distribution, while not being necessarily faithful to configurations that are implausible under that distribution. This replaces manual [[feature engineering]], and allows a machine to both learn the features and use them to perform a specific task.

Feature learning can be either supervised or unsupervised. In supervised feature learning, features are learned using labelled input data. Examples include [[artificial neural network]]s, [[multilayer perceptron]]s, and supervised [[dictionary learning]]. In unsupervised feature learning, features are learned with unlabelled input data. Examples include dictionary learning, [[independent component analysis]], [[autoencoder]]s, [[matrix decomposition|matrix factorisation]]<ref>{{cite conference |author1=Nathan Srebro |author2=Jason D. M. Rennie |author3=Tommi S. Jaakkola |title=Maximum-Margin Matrix Factorization |conference=[[Conference on Neural Information Processing Systems|NIPS]] |year=2004}}</ref> and various forms of [[Cluster analysis|clustering]].<ref name="coates2011">{{cite conference |last1=Coates |first1=Adam |last2=Lee |first2=Honglak |last3=Ng |first3=Andrew Y. |title=An analysis of single-layer networks in unsupervised feature learning |conference=Int'l Conf. on AI and Statistics (AISTATS) |year=2011 |url=http://machinelearning.wustl.edu/mlpapers/paper_files/AISTATS2011_CoatesNL11.pdf |access-date=25 November 2018 |archive-url=https://web.archive.org/web/20170813153615/http://machinelearning.wustl.edu/mlpapers/paper_files/AISTATS2011_CoatesNL11.pdf |archive-date=13 August 2017}}</ref><ref>{{cite conference |last1=Csurka |first1=Gabriella |last2=Dance |first2=Christopher C. |last3=Fan |first3=Lixin |last4=Willamowski |first4=Jutta |last5=Bray |first5=Cédric |title=Visual categorization with bags of keypoints |conference=ECCV Workshop on Statistical Learning in Computer Vision |year=2004 |url=https://www.cs.cmu.edu/~efros/courses/LBMV07/Papers/csurka-eccv-04.pdf |access-date=29 August 2019 |archive-date=13 July 2019 |archive-url=https://web.archive.org/web/20190713040210/http://www.cs.cmu.edu/~efros/courses/LBMV07/Papers/csurka-eccv-04.pdf |url-status=live}}</ref><ref name="jurafsky">{{cite book |title=Speech and Language Processing |author1=Daniel Jurafsky |author2=James H. Martin |publisher=Pearson Education International |year=2009 |pages=145–146}}</ref>
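As an illustration of the pre-processing role described above, the following minimal sketch learns a low-dimensional representation with principal component analysis and then trains a classifier on the learned features. It is not taken from the cited sources; the dataset, component count, and choice of classifier are illustrative assumptions.

<syntaxhighlight lang="python">
# Illustrative sketch: unsupervised feature learning (PCA) as a
# pre-processing step before classification. Dataset, number of
# components, and classifier are arbitrary choices, not prescribed
# by the sources cited in this section.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Learn a 32-dimensional representation of the 64-pixel inputs,
# then fit a classifier on the learned features.
model = make_pipeline(PCA(n_components=32),
                      LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
</syntaxhighlight>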
[[Manifold learning]] algorithms attempt to learn such representations under the constraint that the learned representation is low-dimensional. [[Sparse coding]] algorithms attempt to do so under the constraint that the learned representation is sparse, meaning that the mathematical model has many zeros. [[Multilinear subspace learning]] algorithms aim to learn low-dimensional representations directly from [[tensor]] representations for multidimensional data, without reshaping them into higher-dimensional vectors.<ref>{{cite journal |first1=Haiping |last1=Lu |first2=K.N. |last2=Plataniotis |first3=A.N. |last3=Venetsanopoulos |url=http://www.dsp.utoronto.ca/~haiping/Publication/SurveyMSL_PR2011.pdf |title=A Survey of Multilinear Subspace Learning for Tensor Data |journal=Pattern Recognition |volume=44 |number=7 |pages=1540–1551 |year=2011 |doi=10.1016/j.patcog.2011.01.004 |bibcode=2011PatRe..44.1540L |access-date=4 September 2015 |archive-date=10 July 2019 |archive-url=https://web.archive.org/web/20190710225429/http://www.dsp.utoronto.ca/~haiping/Publication/SurveyMSL_PR2011.pdf |url-status=live}}</ref> [[Deep learning]] algorithms discover multiple levels of representation, or a hierarchy of features, with higher-level, more abstract features defined in terms of (or generating) lower-level features. It has been argued that an intelligent machine is one that learns a representation that disentangles the underlying factors of variation that explain the observed data.<ref>{{cite book |title=Learning Deep Architectures for AI |author=Yoshua Bengio |author-link=Yoshua Bengio |publisher=Now Publishers Inc. |year=2009 |isbn=978-1-60198-294-0 |pages=1–3 |url=https://books.google.com/books?id=cq5ewg7FniMC&pg=PA3 |access-date=15 February 2016 |archive-date=17 January 2023 |archive-url=https://web.archive.org/web/20230117053339/https://books.google.com/books?id=cq5ewg7FniMC&pg=PA3 |url-status=live}}</ref>

Feature learning is motivated by the fact that machine learning tasks such as classification often require input that is mathematically and computationally convenient to process. However, real-world data such as images, video, and sensory data has not yielded to attempts to algorithmically define specific features. An alternative is to discover such features or representations through examination, without relying on explicit algorithms.
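The sparsity constraint mentioned above can be made concrete with a short sketch, assuming scikit-learn's dictionary-learning implementation; the dataset, number of dictionary atoms, and penalty weight are illustrative assumptions, not taken from the cited survey.

<syntaxhighlight lang="python">
# Illustrative sketch of sparse coding: encode each input as a sparse
# combination of atoms from a learned, overcomplete dictionary, so
# most coefficients of each code are exactly zero. The dataset and
# hyperparameters are arbitrary choices for demonstration.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import MiniBatchDictionaryLearning

X, _ = load_digits(return_X_y=True)

# 100 atoms for 64-dimensional inputs gives an overcomplete dictionary.
coder = MiniBatchDictionaryLearning(n_components=100, alpha=1.0,
                                    random_state=0)
codes = coder.fit_transform(X)

# The learned representation is sparse: most entries are zero.
print("fraction of zero coefficients:", np.mean(codes == 0))
</syntaxhighlight>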