== Relationships to other fields ==

=== Artificial intelligence ===
[[File:AI hierarchy.svg|thumb|Machine learning as subfield of AI<ref name="journalimcms.org">{{cite journal |vauthors=Sindhu V, Nivedha S, Prakash M |date=February 2020 |title=An Empirical Science Research on Bioinformatics in Machine Learning |journal=Journal of Mechanics of Continua and Mathematical Sciences |issue=7 |doi=10.26782/jmcms.spl.7/2020.02.00006 |doi-access=free}}</ref>]]
As a scientific endeavour, machine learning grew out of the quest for [[artificial intelligence]] (AI). In the early days of AI as an [[Discipline (academia)|academic discipline]], some researchers were interested in having machines learn from data. They attempted to approach the problem with various symbolic methods, as well as what were then termed "[[Artificial neural network|neural network]]s"; these were mostly [[perceptron]]s and [[ADALINE|other models]] that were later found to be reinventions of the [[generalised linear model]]s of statistics.<ref>{{cite book |last1=Sarle |first1=Warren S. |chapter=Neural Networks and statistical models |pages=1538–50 |year=1994 |title=SUGI 19: proceedings of the Nineteenth Annual SAS Users Group International Conference |publisher=SAS Institute |isbn=9781555446116 |oclc=35546178}}</ref> [[Probabilistic reasoning]] was also employed, especially in [[automated medical diagnosis]].<ref name="aima">{{cite AIMA|edition=2}}</ref>{{rp|488}}

However, an increasing emphasis on the [[symbolic AI|logical, knowledge-based approach]] caused a rift between AI and machine learning. Probabilistic systems were plagued by theoretical and practical problems of data acquisition and representation.<ref name="aima" />{{rp|488}} By 1980, [[expert system]]s had come to dominate AI, and statistics was out of favour.<ref name="changing">{{Cite journal |last1=Langley |first1=Pat |title=The changing science of machine learning |doi=10.1007/s10994-011-5242-y |journal=[[Machine Learning (journal)|Machine Learning]] |volume=82 |issue=3 |pages=275–9 |year=2011 |doi-access=free}}</ref> Work on symbolic/knowledge-based learning did continue within AI, leading to [[inductive logic programming]] (ILP), but the more statistical line of research was now outside the field of AI proper, in [[pattern recognition]] and [[information retrieval]].<ref name="aima" />{{rp|708–710; 755}} Neural network research had been abandoned by AI and [[computer science]] around the same time. This line, too, was continued outside the AI/CS field, as "[[connectionism]]", by researchers from other disciplines including [[John Hopfield]], [[David Rumelhart]], and [[Geoffrey Hinton]]. Their main success came in the mid-1980s with the reinvention of [[backpropagation]].<ref name="aima" />{{rp|25}}

Machine learning (ML), reorganised and recognised as its own field, started to flourish in the 1990s. The field changed its goal from achieving artificial intelligence to tackling solvable problems of a practical nature.
It shifted focus away from the [[symbolic artificial intelligence|symbolic approaches]] it had inherited from AI, and toward methods and models borrowed from statistics, [[fuzzy logic]], and [[probability theory]].<ref name="changing" />

=== Data compression ===
{{excerpt|Data compression#Machine learning}}

=== Data mining ===
Machine learning and [[data mining]] often employ the same methods and overlap significantly, but while machine learning focuses on prediction based on ''known'' properties learned from the training data, data mining focuses on the [[discovery (observation)|discovery]] of (previously) ''unknown'' properties in the data (this is the analysis step of [[knowledge discovery]] in databases). Data mining uses many machine learning methods, but with different goals; on the other hand, machine learning also employs data mining methods as "[[unsupervised learning]]" or as a preprocessing step to improve learner accuracy. Much of the confusion between these two research communities (which do often have separate conferences and separate journals, [[ECML PKDD]] being a major exception) comes from the basic assumptions they work with: in machine learning, performance is usually evaluated with respect to the ability to ''reproduce known'' knowledge, while in knowledge discovery and data mining (KDD) the key task is the discovery of previously ''unknown'' knowledge. Evaluated with respect to known knowledge, an uninformed (unsupervised) method will easily be outperformed by supervised methods, while in a typical KDD task, supervised methods cannot be used due to the unavailability of training data.

=== Optimisation ===
Machine learning also has intimate ties to [[optimisation]]: many learning problems are formulated as minimisation of some [[loss function]] on a training set of examples.
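For example, given training examples <math>(x_1, y_1), \ldots, (x_n, y_n)</math>, a family of candidate models <math>f_\theta</math> indexed by parameters <math>\theta</math>, and a loss function <math>L</math>, learning amounts to the empirical risk minimisation problem
<math display="block">\hat{\theta} = \underset{\theta}{\operatorname{arg\,min}}\ \frac{1}{n}\sum_{i=1}^{n} L\bigl(f_\theta(x_i),\, y_i\bigr).</math>
A minimal sketch of this formulation (an illustrative example, not drawn from any cited source) uses a one-parameter linear model <math>f_w(x) = wx</math> with squared loss, fitted by batch [[gradient descent]]:

```python
# Illustrative sketch: empirical risk minimisation for the model
# f_w(x) = w*x with squared loss, solved by batch gradient descent.

def fit_slope(xs, ys, lr=0.01, steps=1000):
    """Minimise (1/n) * sum((w*x - y)^2) over the scalar parameter w."""
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the average squared loss with respect to w.
        grad = (2.0 / n) * sum(x * (w * x - y) for x, y in zip(xs, ys))
        w -= lr * grad
    return w

# Training data generated by y = 2x; gradient descent recovers w close to 2.
w = fit_slope([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
```

Here the function name, learning rate, and step count are arbitrary choices for the sketch; the point is only that "learning" reduces to iteratively decreasing the training loss.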
Loss functions express the discrepancy between the predictions of the model being trained and the actual problem instances (for example, in classification, one wants to assign a [[Labeled data|label]] to instances, and models are trained to correctly predict the preassigned labels of a set of examples).<ref>{{cite encyclopedia |last1=Le Roux |first1=Nicolas |first2=Yoshua |last2=Bengio |first3=Andrew |last3=Fitzgibbon |title=Improving First and Second-Order Methods by Modeling Uncertainty |encyclopedia=Optimization for Machine Learning |year=2012 |page=404 |editor1-last=Sra |editor1-first=Suvrit |editor2-first=Sebastian |editor2-last=Nowozin |editor3-first=Stephen J. |editor3-last=Wright |publisher=MIT Press |url=https://books.google.com/books?id=JPQx7s2L1A8C&q=%22Improving+First+and+Second-Order+Methods+by+Modeling+Uncertainty&pg=PA403 |isbn=9780262016469 |access-date=12 November 2020 |archive-date=17 January 2023 |archive-url=https://web.archive.org/web/20230117053335/https://books.google.com/books?id=JPQx7s2L1A8C&q=%22Improving+First+and+Second-Order+Methods+by+Modeling+Uncertainty&pg=PA403 |url-status=live}}</ref>

=== Generalisation ===
Characterising the generalisation of various learning algorithms is an active topic of current research, especially for [[deep learning]] algorithms.

=== Statistics ===
Machine learning and [[statistics]] are closely related fields in terms of methods, but distinct in their principal goal: statistics draws population [[Statistical inference|inferences]] from a [[Sample (statistics)|sample]], while machine learning finds generalisable predictive patterns.<ref>{{cite journal |first1=Danilo |last1=Bzdok |first2=Naomi |last2=Altman |author-link2=Naomi Altman |first3=Martin |last3=Krzywinski |title=Statistics versus Machine Learning |journal=[[Nature Methods]] |volume=15 |issue=4 |pages=233–234 |year=2018 |doi=10.1038/nmeth.4642 |pmid=30100822 |pmc=6082636}}</ref> According to [[Michael I. Jordan]], the ideas of machine learning, from methodological principles to theoretical tools, have had a long pre-history in statistics.<ref name="mi jordan ama">{{cite web |url=https://www.reddit.com/r/MachineLearning/comments/2fxi6v/ama_michael_i_jordan/ckelmtt?context=3 |title=statistics and machine learning |publisher=reddit |date=10 September 2014 |access-date=1 October 2014 |author=Michael I. Jordan |author-link=Michael I. Jordan |archive-date=18 October 2017 |archive-url=https://web.archive.org/web/20171018192328/https://www.reddit.com/r/MachineLearning/comments/2fxi6v/ama_michael_i_jordan/ckelmtt/?context=3 |url-status=live}}</ref> He also suggested the term [[data science]] as a placeholder to call the overall field.<ref name="mi jordan ama" />

Conventional statistical analyses require the a priori selection of a model most suitable for the study data set. In addition, only significant or theoretically relevant variables based on previous experience are included for analysis. In contrast, machine learning is not built on a pre-structured model; rather, the data shape the model by detecting underlying patterns. The more variables (input) used to train the model, the more accurate the ultimate model will be.<ref>Hung et al. Algorithms to Measure Surgeon Performance and Anticipate Clinical Outcomes in Robotic Surgery. JAMA Surg. 2018</ref>

[[Leo Breiman]] distinguished two statistical modelling paradigms: data model and algorithmic model,<ref name="Cornell-University-Library-2001">{{cite journal |url=http://projecteuclid.org/download/pdf_1/euclid.ss/1009213726 |title=Breiman: Statistical Modeling: The Two Cultures (with comments and a rejoinder by the author) |author=Cornell University Library |journal=Statistical Science |date=August 2001 |volume=16 |issue=3 |doi=10.1214/ss/1009213726 |s2cid=62729017 |access-date=8 August 2015 |archive-date=26 June 2017 |archive-url=https://web.archive.org/web/20170626042637/http://projecteuclid.org/download/pdf_1/euclid.ss/1009213726 |url-status=live |doi-access=free}}</ref> wherein "algorithmic model" means more or less the machine learning algorithms like [[Random forest|random forest]]. Some statisticians have adopted methods from machine learning, leading to a combined field that they call ''statistical learning''.<ref name="islr">{{cite book |author1=Gareth James |author2=Daniela Witten |author3=Trevor Hastie |author4=Robert Tibshirani |title=An Introduction to Statistical Learning |publisher=Springer |year=2013 |url=http://www-bcf.usc.edu/~gareth/ISL/ |page=vii |access-date=25 October 2014 |archive-date=23 June 2019 |archive-url=https://web.archive.org/web/20190623150237/http://www-bcf.usc.edu/~gareth/ISL/ |url-status=live}}</ref>

=== Statistical physics ===
Analytical and computational techniques derived from the deep-rooted physics of disordered systems can be extended to large-scale problems, including machine learning, e.g., to analyse the weight space of [[deep neural network]]s.<ref name=SP_1>{{cite journal |author1=Ramezanpour, A. |author2=Beam, A.L. |author3=Chen, J.H. |author4=Mashaghi, A. |title=Statistical Physics for Medical Diagnostics: Learning, Inference, and Optimization Algorithms |journal=Diagnostics |date=17 November 2020 |volume=10 |issue=11 |page=972 |doi=10.3390/diagnostics10110972 |doi-access=free |pmid=33228143 |pmc=7699346}}</ref> Statistical physics is thus finding applications in the area of [[medical diagnostics]].<ref name=SP_2>{{cite journal |title=Statistical physics of medical diagnostics: Study of a probabilistic model |author1=Mashaghi, A. |author2=Ramezanpour, A. |journal=[[Physical Review E]] |volume=97 |date=16 March 2018 |issue=3–1 |page=032118 |doi=10.1103/PhysRevE.97.032118 |pmid=29776109 |arxiv=1803.10019 |bibcode=2018PhRvE..97c2118M |s2cid=4955393}}</ref>