
== Models ==
A '''{{vanchor|machine learning model}}''' is a type of [[mathematical model]] that, once "trained" on a given dataset, can be used to make predictions or classifications on new data. During training, a learning algorithm iteratively adjusts the model's internal parameters to minimise errors in its predictions.<ref>{{Cite book |last=Burkov |first=Andriy |title=The hundred-page machine learning book |date=2019 |publisher=Andriy Burkov |isbn=978-1-9995795-0-0 |location=Polen}}</ref> By extension, the term "model" can refer to several levels of specificity, from a general class of models and their associated learning algorithms to a fully trained model with all its internal parameters tuned.<ref>{{Cite book |last1=Russell |first1=Stuart J. |title=Artificial intelligence: a modern approach |last2=Norvig |first2=Peter |date=2021 |publisher=Pearson |isbn=978-0-13-461099-3 |edition=Fourth |series=Pearson series in artificial intelligence |location=Hoboken}}</ref>
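The iterative adjustment of parameters described above can be sketched with gradient descent on a two-parameter linear model. This is only an illustration: the function name, the toy dataset, the learning rate and the step count are all assumptions chosen for the example, and gradient descent is just one of many learning algorithms.

```python
def train_linear_model(xs, ys, learning_rate=0.05, steps=1000):
    """Fit y ~ w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0  # internal parameters, adjusted during training
    n = len(xs)
    for _ in range(steps):
        # Prediction errors under the current parameters.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2.0 * sum(errors) / n
        # Move each parameter a small step against its gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy dataset generated from y = 3x + 1 (an assumption for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 4.0, 7.0, 10.0, 13.0]
w, b = train_linear_model(xs, ys)
prediction = w * 5.0 + b  # the trained model applied to a new input
```

After training, <code>w</code> and <code>b</code> are the tuned internal parameters, and the final line uses them to predict the output for an input that was not in the training set.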

Various types of models have been used and researched for machine learning systems. Picking the best model for a task is called [[model selection]].

=== Artificial neural networks ===
{{Main|Artificial neural network}}{{See also|Deep learning}}
[[File:Colored neural network.svg|thumb|300px|An artificial neural network is an interconnected group of nodes, akin to the vast network of [[neuron]]s in a [[brain]]. Here, each circular node represents an [[artificial neuron]] and an arrow represents a connection from the output of one artificial neuron to the input of another.]]

Artificial neural networks (ANNs), or [[Connectionism|connectionist]] systems, are computing systems vaguely inspired by the [[biological neural network]]s that constitute animal [[brain]]s. Such systems "learn" to perform tasks by considering examples, generally without being programmed with any task-specific rules.

An ANN is a model based on a collection of connected units or nodes called "[[artificial neuron]]s", which loosely model the [[neuron]]s in a biological brain. Each connection, like the [[synapse]]s in a biological brain, can transmit information, a "signal", from one artificial neuron to another. An artificial neuron that receives a signal can process it and then signal additional artificial neurons connected to it. In common ANN implementations, the signal at a connection between artificial neurons is a [[real number]], and the output of each artificial neuron is computed by some non-linear function of the sum of its inputs. The connections between artificial neurons are called "edges". Artificial neurons and edges typically have a [[weight (mathematics)|weight]] that adjusts as learning proceeds. The weight increases or decreases the strength of the signal at a connection. Artificial neurons may have a threshold such that the signal is only sent if the aggregate signal crosses that threshold. Typically, artificial neurons are aggregated into layers. Different layers may perform different kinds of transformations on their inputs. Signals travel from the first layer (the input layer) to the last layer (the output layer), possibly after traversing the layers multiple times.
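A minimal sketch of how a signal passes through such a network is given below. The sigmoid is assumed as the non-linear function, a bias term is added to each weighted sum (a common convention not described above), and the weights are arbitrary illustrative values rather than learned ones.

```python
import math


def sigmoid(z):
    """A common choice of non-linear function; others are possible."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs, weights, bias):
    # Weighted sum of the incoming signals, then the non-linearity.
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(total)


def layer_output(inputs, layer):
    # A layer is a list of (weights, bias) pairs, one pair per neuron.
    return [neuron_output(inputs, weights, bias) for weights, bias in layer]


# Hypothetical weights: two hidden neurons feeding one output neuron.
hidden_layer = [([0.5, -0.5], 0.0), ([1.0, 1.0], -1.0)]
output_layer = [([1.0, -1.0], 0.0)]

signal = [1.0, 1.0]                          # input layer
hidden = layer_output(signal, hidden_layer)  # signals after the hidden layer
output = layer_output(hidden, output_layer)  # signal at the output layer
```

Learning would consist of adjusting the numbers in <code>hidden_layer</code> and <code>output_layer</code>; the sketch only shows the forward pass with fixed weights.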

The original goal of the ANN approach was to solve problems in the same way that a [[human brain]] would. However, over time, attention moved to performing specific tasks, leading to deviations from [[biology]]. Artificial neural networks have been used on a variety of tasks, including [[computer vision]], [[speech recognition]], [[machine translation]], [[social network]] filtering, [[general game playing|playing board and video games]] and [[medical diagnosis]].

[[Deep learning]] uses artificial neural networks with multiple hidden layers. This approach tries to model the way the human brain processes light and sound into vision and hearing. Successful applications of deep learning include computer vision and speech recognition.<ref>Honglak Lee, Roger Grosse, Rajesh Ranganath, Andrew Y. Ng. "[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.149.802&rep=rep1&type=pdf Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations] {{Webarchive|url=https://web.archive.org/web/20171018182235/http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.149.802&rep=rep1&type=pdf |date=2017-10-18 }}" Proceedings of the 26th Annual International Conference on Machine Learning, 2009.</ref>

=== Decision trees ===
{{Main|Decision tree learning}}
[[File:Decision Tree.jpg|thumb|A decision tree showing survival probability of passengers on the [[Titanic]]]]

Decision tree learning uses a [[decision tree]] as a [[Predictive modeling|predictive model]] to go from observations about an item (represented in the branches) to conclusions about the item's target value (represented in the leaves). It is one of the predictive modelling approaches used in statistics, data mining, and machine learning. Tree models where the target variable can take a discrete set of values are called classification trees; in these tree structures, [[leaf node|leaves]] represent class labels, and branches represent [[Logical conjunction|conjunction]]s of features that lead to those class labels. Decision trees where the target variable can take continuous values (typically [[real numbers]]) are called regression trees. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and [[decision making]]. In data mining, a decision tree describes data, but the resulting classification tree can be an input for decision-making.
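The way a classification tree maps an item to a class label can be sketched as follows. The tree here is written by hand rather than learned from data, and its feature names, thresholds and labels are hypothetical values chosen only for illustration.

```python
# Internal nodes are dicts holding a test on one feature; leaves are class labels.
# The structure and thresholds below are hypothetical, not learned from data.
tree = {
    "feature": "petal_length", "threshold": 2.5,
    "left": "setosa",
    "right": {
        "feature": "petal_width", "threshold": 1.8,
        "left": "versicolor",
        "right": "virginica",
    },
}


def classify(node, item):
    """Follow the branches matching the item's features until a leaf is reached."""
    while isinstance(node, dict):
        goes_left = item[node["feature"]] <= node["threshold"]
        node = node["left"] if goes_left else node["right"]
    return node  # a leaf: the predicted class label


label = classify(tree, {"petal_length": 5.1, "petal_width": 2.0})
```

Each path from the root to a leaf corresponds to a conjunction of feature tests, which is what the branches of a classification tree represent.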

=== Random forest regression ===
Random forest regression (RFR) falls under the umbrella of decision [[tree-based models]]. RFR is an ensemble learning method that builds multiple decision trees and averages their predictions to improve accuracy and avoid overfitting. To build the decision trees, RFR uses bootstrapped sampling: each decision tree is trained on a random sample drawn from the training set. This random selection reduces the variance of the model's predictions and improves accuracy. RFR generates independent decision trees, and it can handle both single-output and multi-output regression tasks. This makes RFR suitable for a variety of applications.<ref>{{Cite web |title=RandomForestRegressor |url=https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html |access-date=12 February 2025 |website=scikit-learn |language=en}}</ref><ref>{{Cite web |date=20 October 2021 |title=What Is Random Forest? {{!}} IBM |url=https://www.ibm.com/think/topics/random-forest |access-date=12 February 2025 |website=www.ibm.com |language=en}}</ref>
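The combination of bootstrapped sampling and averaging can be sketched in a few lines. To keep the example short, each "tree" is assumed to be a depth-one regression tree (a stump) on a single input; the function names, the toy data and the number of trees are illustrative assumptions, and practical implementations grow deeper trees and also randomise the features considered at each split.

```python
import random


def fit_stump(points):
    """Fit a depth-1 regression tree: one threshold and one mean per side."""
    xs = sorted({x for x, _ in points})
    overall = sum(y for _, y in points) / len(points)
    best = (float("inf"), None, overall, overall)
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2.0  # candidate threshold between neighbouring inputs
        left = [y for x, y in points if x <= t]
        right = [y for x, y in points if x > t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if sse < best[0]:
            best = (sse, t, ml, mr)
    _, threshold, left_mean, right_mean = best
    if threshold is None:  # the sample contained a single distinct input
        return lambda x: overall
    return lambda x: left_mean if x <= threshold else right_mean


def fit_forest(points, n_trees=50, seed=0):
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        # Bootstrap: draw as many points as the training set, with replacement.
        sample = [rng.choice(points) for _ in points]
        trees.append(fit_stump(sample))
    return trees


def predict(trees, x):
    # The forest's prediction is the average of the individual trees' predictions.
    return sum(tree(x) for tree in trees) / len(trees)


# Toy data (an assumption): outputs near 1 for small inputs, near 5 for large ones.
data = [(1.0, 1.1), (2.0, 0.9), (3.0, 1.0), (6.0, 5.2), (7.0, 4.8), (8.0, 5.0)]
forest = fit_forest(data)
low, high = predict(forest, 2.0), predict(forest, 7.0)
```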

=== Support-vector machines ===
{{Main|Support-vector machine}}
Support-vector machines (SVMs), also known as support-vector networks, are a set of related [[supervised learning]] methods used for classification and regression. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that predicts which of the two categories a new example falls into.<ref name="CorinnaCortes">{{Cite journal |last1=Cortes |first1=Corinna |author-link1=Corinna Cortes |last2=Vapnik |first2=Vladimir N. |year=1995 |title=Support-vector networks |journal=[[Machine Learning (journal)|Machine Learning]] |volume=20 |issue=3 |pages=273–297 |doi=10.1007/BF00994018 |doi-access=free }}</ref> An SVM training algorithm is a non-[[probabilistic classification|probabilistic]], [[binary classifier|binary]], [[linear classifier]], although methods such as [[Platt scaling]] exist to use SVM in a probabilistic classification setting. In addition to performing linear classification, SVMs can efficiently perform a non-linear classification using what is called the [[kernel trick]], implicitly mapping their inputs into high-dimensional feature spaces.
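The idea behind the kernel trick can be checked numerically on a small case. The sketch below assumes a degree-2 polynomial kernel on two-dimensional inputs, whose implied feature space has only three dimensions; it shows that evaluating the kernel in the input space gives the same value as an explicit mapping followed by a dot product. It does not train an SVM, and the function names are illustrative.

```python
import math


def poly_kernel(x, y):
    """Degree-2 polynomial kernel, computed directly in the input space."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2


def feature_map(x):
    """Explicit mapping of a 2-D point into the 3-D space this kernel implies."""
    return (x[0] ** 2, math.sqrt(2.0) * x[0] * x[1], x[1] ** 2)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


x, y = (1.0, 2.0), (3.0, 0.5)
implicit = poly_kernel(x, y)                     # never builds the feature vectors
explicit = dot(feature_map(x), feature_map(y))   # builds them, then takes the dot product
```

Both values agree up to floating-point rounding. For kernels whose feature space is very high-dimensional or infinite-dimensional, only the implicit computation is practical.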

=== Regression analysis ===
{{Main|Regression analysis}}
[[Image:Linear regression.svg|thumb|upright=1.3|Illustration of linear regression on a data set]]

Regression analysis encompasses a large variety of statistical methods to estimate the relationship between input variables (features) and their associated outputs. Its most common form is [[linear regression]], where a single line is drawn to best fit the given data according to a mathematical criterion such as [[ordinary least squares]]. The latter is often extended by [[regularization (mathematics)|regularisation]] methods to mitigate overfitting and bias, as in [[ridge regression]]. When dealing with non-linear problems, commonly used models include [[polynomial regression]] (for example, used for trendline fitting in Microsoft Excel<ref>{{cite web|last1=Stevenson|first1=Christopher|title=Tutorial: Polynomial Regression in Excel|url=https://facultystaff.richmond.edu/~cstevens/301/Excel4.html|website=facultystaff.richmond.edu|access-date=22 January 2017|archive-date=2 June 2013|archive-url=https://web.archive.org/web/20130602200850/https://facultystaff.richmond.edu/~cstevens/301/Excel4.html|url-status=live}}</ref>), [[logistic regression]] (often used in [[statistical classification]]) or even [[kernel regression]], which introduces non-linearity by taking advantage of the [[kernel trick]] to implicitly map input variables to higher-dimensional space.
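For a single input variable, the ordinary least squares line has a closed-form solution, sketched below. The optional <code>ridge</code> argument is an assumption added for illustration: it applies a ridge penalty to the slope only (leaving the intercept unpenalised), which shrinks the slope towards zero. The data values are made up.

```python
def fit_line(xs, ys, ridge=0.0):
    """Least-squares line y = slope * x + intercept, with an optional ridge penalty."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Co-variation of x with y, and variation of x, around their means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / (sxx + ridge)  # ridge = 0 gives ordinary least squares
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
slope, intercept = fit_line(xs, ys)           # ordinary least squares
ridge_slope, _ = fit_line(xs, ys, ridge=1.0)  # regularised: a smaller slope
```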

[[General linear model|Multivariate linear regression]] extends the concept of linear regression to handle multiple dependent variables simultaneously. This approach estimates the relationships between a set of input variables and several output variables by fitting a [[Multidimensional system|multidimensional]] linear model. It is particularly useful in scenarios where outputs are interdependent or share underlying patterns, such as predicting multiple economic indicators or reconstructing images,<ref>{{cite journal |last1= Wanta |first1= Damian |last2= Smolik |first2= Aleksander |last3= Smolik |first3= Waldemar T. |last4= Midura |first4= Mateusz |last5= Wróblewski |first5= Przemysław  |date= 2025 |title= Image reconstruction using machine-learned pseudoinverse in electrical capacitance tomography  |journal= Engineering Applications of Artificial Intelligence |volume= 142|page= 109888|doi= 10.1016/j.engappai.2024.109888 |doi-access= free}}</ref> which are inherently multi-dimensional.

=== Bayesian networks ===
{{Main|Bayesian network}}
[[Image:SimpleBayesNetNodes.svg|thumb|right|A simple Bayesian network. Rain influences whether the sprinkler is activated, and both rain and the sprinkler influence whether the grass is wet.]]

A Bayesian network, belief network, or directed acyclic graphical model is a probabilistic [[graphical model]] that represents a set of [[random variables]] and their [[conditional independence]] with a [[directed acyclic graph]] (DAG). For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases. Efficient algorithms exist that perform [[Bayesian inference|inference]] and learning. Bayesian networks that model sequences of variables, like [[speech recognition|speech signals]] or [[peptide sequence|protein sequences]], are called [[dynamic Bayesian network]]s. Generalisations of Bayesian networks that can represent and solve decision problems under uncertainty are called [[influence diagram]]s.
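Inference in a small network can be carried out by enumerating the joint distribution. The sketch below uses the rain, sprinkler and wet-grass network from the figure; the probability values are assumptions chosen for illustration, and enumeration is only practical for very small networks (the efficient algorithms mentioned above avoid it).

```python
from itertools import product

# Conditional probability tables; all numbers are illustrative assumptions.
P_RAIN = 0.2
P_SPRINKLER = {True: 0.01, False: 0.4}            # P(sprinkler on | rain)
P_WET = {(True, True): 0.99, (True, False): 0.9,  # P(wet | sprinkler, rain)
         (False, True): 0.8, (False, False): 0.0}


def joint(rain, sprinkler, wet):
    """Joint probability, factorised along the edges of the graph."""
    p = P_RAIN if rain else 1.0 - P_RAIN
    p *= P_SPRINKLER[rain] if sprinkler else 1.0 - P_SPRINKLER[rain]
    p_wet = P_WET[(sprinkler, rain)]
    p *= p_wet if wet else 1.0 - p_wet
    return p


def prob_rain_given_wet():
    # Sum out the sprinkler variable, then condition on the grass being wet.
    wet_and_rain = sum(joint(True, s, True) for s in (True, False))
    wet = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
    return wet_and_rain / wet


posterior = prob_rain_given_wet()
```

The same pattern applies to the disease and symptom example: the observed symptoms play the role of the wet grass, and the diseases play the role of the rain.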

=== Gaussian processes ===
{{Main|Gaussian processes}}
[[Image:Regressions sine demo.svg|thumbnail|right|An example of Gaussian Process Regression (prediction) compared with other regression models<ref>The documentation for [[scikit-learn]] also has similar [http://scikit-learn.org/stable/auto_examples/gaussian_process/plot_compare_gpr_krr.html examples] {{Webarchive|url=https://web.archive.org/web/20221102184805/https://scikit-learn.org/stable/auto_examples/gaussian_process/plot_compare_gpr_krr.html |date=2 November 2022 }}.</ref>]]

A Gaussian process is a [[stochastic process]] in which every finite collection of the random variables in the process has a [[multivariate normal distribution]], and it relies on a pre-defined [[covariance function]], or kernel, that models how pairs of points relate to each other depending on their locations.

Given a set of observed points, or input–output examples, the distribution of the (unobserved) output of a new point as a function of its input data can be directly computed by looking at the observed points and the covariances between those points and the new, unobserved point.
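This computation can be sketched for one-dimensional inputs. The example assumes a zero-mean prior, a squared-exponential (RBF) kernel with unit length scale, and a small noise term added to the diagonal for numerical stability; the function names and data are illustrative, and the linear solver is a plain Gaussian elimination written out to keep the example self-contained.

```python
import math


def rbf(a, b, length_scale=1.0):
    """Squared-exponential covariance between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def gp_posterior(train_x, train_y, new_x, noise=1e-6):
    """Posterior mean and variance of the output at new_x (zero-mean prior)."""
    # Covariances among the observed points.
    K = [[rbf(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(train_x)] for i, a in enumerate(train_x)]
    # Covariances between the observed points and the new point.
    k_star = [rbf(a, new_x) for a in train_x]
    mean = sum(k * w for k, w in zip(k_star, solve(K, train_y)))
    variance = rbf(new_x, new_x) - sum(k * v for k, v in zip(k_star, solve(K, k_star)))
    return mean, variance


train_x, train_y = [0.0, 1.0, 2.0], [0.0, 0.8, 0.9]
mean_seen, var_seen = gp_posterior(train_x, train_y, 1.0)   # at an observed input
mean_far, var_far = gp_posterior(train_x, train_y, 10.0)    # far from all observations
```

At an observed input the posterior mean reproduces the observed output with almost no uncertainty, while far from the data the prediction falls back to the prior mean with variance close to the prior variance.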

Gaussian processes are popular surrogate models in [[Bayesian optimisation]] used to do [[hyperparameter optimisation]].

=== Genetic algorithms ===
{{Main|Genetic algorithm}}
A genetic algorithm (GA) is a [[search algorithm]] and [[heuristic (computer science)|heuristic]] technique that mimics the process of [[natural selection]], using methods such as [[Mutation (genetic algorithm)|mutation]] and [[Crossover (genetic algorithm)|crossover]] to generate new [[Chromosome (genetic algorithm)|genotype]]s in the hope of finding good solutions to a given problem. In machine learning, genetic algorithms were used in the 1980s and 1990s.<ref>{{cite journal |last1=Goldberg |first1=David E. |first2=John H. |last2=Holland |title=Genetic algorithms and machine learning |journal=[[Machine Learning (journal)|Machine Learning]] |volume=3 |issue=2 |year=1988 |pages=95–99 |doi=10.1007/bf00113892 |s2cid=35506513 |url=https://deepblue.lib.umich.edu/bitstream/2027.42/46947/1/10994_2005_Article_422926.pdf |doi-access=free |access-date=3 September 2019 |archive-date=16 May 2011 |archive-url=https://web.archive.org/web/20110516025803/http://deepblue.lib.umich.edu/bitstream/2027.42/46947/1/10994_2005_Article_422926.pdf |url-status=live }}</ref><ref>{{Cite journal |title=Machine Learning, Neural and Statistical Classification |journal=Ellis Horwood Series in Artificial Intelligence |first1=D. |last1=Michie |first2=D. J. |last2=Spiegelhalter |first3=C. C. |last3=Taylor |year=1994 |bibcode=1994mlns.book.....M }}</ref> Conversely, machine learning techniques have been used to improve the performance of genetic and [[evolutionary algorithm]]s.<ref>{{cite journal |last1=Zhang |first1=Jun |last2=Zhan |first2=Zhi-hui |last3=Lin |first3=Ying |last4=Chen |first4=Ni |last5=Gong |first5=Yue-jiao |last6=Zhong |first6=Jing-hui |last7=Chung |first7=Henry S.H. |last8=Li |first8=Yun |last9=Shi |first9=Yu-hui |title=Evolutionary Computation Meets Machine Learning: A Survey |journal=Computational Intelligence Magazine |year=2011 |volume=6 |issue=4 |pages=68–75 |doi=10.1109/mci.2011.942584|s2cid=6760276 }}</ref>
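A minimal genetic algorithm is sketched below on the toy "OneMax" problem, in which the fitness of a bit string is the number of ones it contains. The problem, the population size, the mutation rate, the truncation selection scheme and the use of elitism are all assumptions made for the example.

```python
import random


def fitness(genotype):
    return sum(genotype)  # OneMax: count the 1 bits


def crossover(a, b, rng):
    point = rng.randrange(1, len(a))  # single-point crossover
    return a[:point] + b[point:]


def mutate(genotype, rng, rate=0.05):
    # Flip each bit independently with a small probability.
    return [1 - g if rng.random() < rate else g for g in genotype]


def evolve(n_bits=20, pop_size=30, generations=40, seed=1):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    initial_best = max(map(fitness, population))
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # selection: keep the fitter half
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - 1)
        ]
        # Elitism: the best genotype survives unchanged into the next generation.
        population = [population[0]] + children
    return initial_best, max(population, key=fitness)


initial_best, best = evolve()
```

Because the best genotype is always carried over, the best fitness in the population never decreases from one generation to the next.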

=== Belief functions ===
{{Main|Dempster–Shafer theory}}
The theory of belief functions, also referred to as evidence theory or Dempster–Shafer theory, is a general framework for reasoning with uncertainty, with understood connections to other frameworks such as [[probability]], [[Possibility theory|possibility]] and [[Imprecise probability|imprecise probability theories]]. These theoretical frameworks can be thought of as a kind of learner, and they combine evidence (e.g., through Dempster's rule of combination) in a way analogous to how a [[Probability mass function|pmf]]-based Bayesian approach combines probabilities.<ref>{{Cite journal |last1=Verbert |first1=K. |last2=Babuška |first2=R. |last3=De Schutter |first3=B. |date=2017-04-01 |title=Bayesian and Dempster–Shafer reasoning for knowledge-based fault diagnosis–A comparative study |url=https://www.sciencedirect.com/science/article/abs/pii/S0952197617300118 |journal=Engineering Applications of Artificial Intelligence |volume=60 |pages=136–150 |doi=10.1016/j.engappai.2017.01.011 |issn=0952-1976}}</ref> However, belief functions come with many caveats compared with Bayesian approaches when incorporating ignorance and [[uncertainty quantification]]. Belief function approaches implemented within the machine learning domain typically fuse various [[ensemble methods]] to better handle the learner's [[decision boundary]], low sample counts, and ambiguous classes, which standard machine learning approaches tend to have difficulty resolving.<ref name="YoosefzadehNajafabadi-2021" /><ref name="Kohavi" /> However, the computational complexity of these algorithms depends on the number of propositions (classes) and can lead to much higher computation times than other machine learning approaches.
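Dempster's rule of combination can be sketched for two sources of evidence. Mass functions are represented here as dictionaries from sets of propositions to masses; this representation, the function name and the example masses are assumptions made for illustration.

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule: multiply masses, intersect sets, renormalise away conflict."""
    combined, conflict = {}, 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            combined[common] = combined.get(common, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b  # mass that would fall on the empty set
    if conflict >= 1.0:
        raise ValueError("the two sources are in total conflict")
    return {s: mass / (1.0 - conflict) for s, mass in combined.items()}


A, B = frozenset({"a"}), frozenset({"b"})
EITHER = A | B                # mass on the whole frame expresses ignorance
m1 = {A: 0.6, EITHER: 0.4}    # first source: some support for "a"
m2 = {B: 0.5, EITHER: 0.5}    # second source: some support for "b"
fused = combine(m1, m2)
```

The nested loop over pairs of focal sets also illustrates why computation time grows quickly with the number of propositions.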

=== Rule-based models ===
{{Main|Rule-based machine learning}}
Rule-based machine learning (RBML) is a branch of machine learning that automatically discovers and learns 'rules' from data. It provides interpretable models, making it useful for decision-making in fields like healthcare, fraud detection, and cybersecurity. Key RBML techniques include [[learning classifier system]]s,<ref>{{Cite journal |last1=Urbanowicz |first1=Ryan J. |last2=Moore |first2=Jason H. |date=22 September 2009 |title=Learning Classifier Systems: A Complete Introduction, Review, and Roadmap |journal=Journal of Artificial Evolution and Applications |language=en |volume=2009 |pages=1–25 |doi=10.1155/2009/736398 |issn=1687-6229 |doi-access=free }}</ref> [[association rule learning]],<ref>Zhang, C. and Zhang, S., 2002. ''[https://books.google.com/books?id=VqSoCAAAQBAJ Association rule mining: models and algorithms]''. Springer-Verlag.</ref> [[artificial immune system]]s,<ref>De Castro, Leandro Nunes, and Jonathan Timmis. ''[https://books.google.com/books?id=aMFP7p8DtaQC&q=%22rule-based%22 Artificial immune systems: a new computational intelligence approach]''. Springer Science & Business Media, 2002.</ref> and other similar models. These methods extract patterns from data and evolve rules over time.
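As an illustration of how a candidate rule is scored in association rule learning, the sketch below computes the support and confidence of a rule over a handful of transactions. The transactions and item names are made up for the example; a full rule learner would also search over candidate itemsets, which is not shown.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))


# Hypothetical shopping baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)  # rule: bread => milk
```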

=== Training models ===
Typically, machine learning models require a large quantity of reliable data to make accurate predictions. When training a machine learning model, machine learning engineers need to target and collect a large and representative [[Sample (statistics)|sample]] of data. Data from the training set can be as varied as a [[corpus of text]], a collection of images, [[sensor]] data, and data collected from individual users of a service. [[Overfitting]], in which a model fits its training data too closely to generalise well to new data, is a risk to watch for during training. Trained models derived from biased or non-evaluated data can result in skewed or undesired predictions. Biased models may result in detrimental outcomes, thereby furthering the negative impacts on society or objectives. [[Algorithmic bias]] is a potential result of data not being fully prepared for training. Machine learning ethics is becoming a field of study and, notably, is being integrated within machine learning engineering teams.

==== Federated learning ====
{{Main|Federated learning}}
Federated learning is an adapted form of [[distributed artificial intelligence]] for training machine learning models. It decentralises the training process, allowing users' privacy to be maintained because their data does not need to be sent to a centralised server. This also increases efficiency by distributing the training process across many devices. For example, [[Gboard]] uses federated machine learning to train search query prediction models on users' mobile phones without having to send individual searches back to [[Google]].<ref>{{Cite web|url=http://ai.googleblog.com/2017/04/federated-learning-collaborative.html|title=Federated Learning: Collaborative Machine Learning without Centralized Training Data|website=Google AI Blog|date=6 April 2017 |language=en|access-date=8 June 2019|archive-date=7 June 2019|archive-url=https://web.archive.org/web/20190607054623/https://ai.googleblog.com/2017/04/federated-learning-collaborative.html|url-status=live}}</ref>
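The general pattern can be sketched with a simple federated averaging loop: each client improves the shared model on its own data, and the server only ever sees the resulting parameters. This is an illustrative sketch, not a description of how any particular product works; the one-parameter model, the client datasets, the learning rate and the round counts are all assumptions.

```python
def local_update(w, data, learning_rate=0.01, steps=5):
    """Run a few gradient steps for the model y = w * x on one client's private data."""
    for _ in range(steps):
        grad = sum(2.0 * x * (w * x - y) for x, y in data) / len(data)
        w -= learning_rate * grad
    return w


def federated_average(client_data, rounds=20):
    w = 0.0  # the shared global parameter held by the server
    total = sum(len(data) for data in client_data)
    for _ in range(rounds):
        # Each client trains locally; only the updated parameter leaves the device.
        updates = [(local_update(w, data), len(data)) for data in client_data]
        # The server averages the updates, weighted by each client's sample count.
        w = sum(w_k * n_k for w_k, n_k in updates) / total
    return w


# Hypothetical private datasets, all generated from y = 2x.
clients = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0)],
    [(4.0, 8.0), (5.0, 10.0)],
]
global_w = federated_average(clients)
```

The raw <code>(x, y)</code> pairs never appear in <code>federated_average</code> outside the call to <code>local_update</code>, which stands in for computation performed on the user's device.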