== Hardware ==

Since the 2010s, advances in both machine learning algorithms and computer hardware have led to more efficient methods for training [[deep neural network]]s (a particular narrow subdomain of machine learning) that contain many layers of nonlinear hidden units.<ref>{{cite web|last1=Research|first1=AI|title=Deep Neural Networks for Acoustic Modeling in Speech Recognition|url=http://airesearch.com/ai-research-papers/deep-neural-networks-for-acoustic-modeling-in-speech-recognition/|website=airesearch.com|access-date=23 October 2015|date=23 October 2015|archive-date=1 February 2016|archive-url=https://web.archive.org/web/20160201033801/http://airesearch.com/ai-research-papers/deep-neural-networks-for-acoustic-modeling-in-speech-recognition/|url-status=live}}</ref> By 2019, graphics processing units ([[GPU]]s), often with AI-specific enhancements, had displaced CPUs as the dominant method of training large-scale commercial cloud AI.<ref>{{cite news |title=GPUs Continue to Dominate the AI Accelerator Market for Now |url=https://www.informationweek.com/big-data/ai-machine-learning/gpus-continue-to-dominate-the-ai-accelerator-market-for-now/a/d-id/1336475 |access-date=11 June 2020 |work=InformationWeek |date=December 2019 |language=en |archive-date=10 June 2020 |archive-url=https://web.archive.org/web/20200610094310/https://www.informationweek.com/big-data/ai-machine-learning/gpus-continue-to-dominate-the-ai-accelerator-market-for-now/a/d-id/1336475 |url-status=live }}</ref> [[OpenAI]] estimated the hardware compute used in the largest deep learning projects from [[AlexNet]] (2012) to [[AlphaZero]] (2017), and found a 300,000-fold increase in the amount of compute required, with a doubling-time trendline of 3.4 months.<ref>{{cite news |last1=Ray |first1=Tiernan |title=AI is changing the entire nature of compute |url=https://www.zdnet.com/article/ai-is-changing-the-entire-nature-of-compute/ |access-date=11 June 2020 |work=ZDNet |date=2019 |language=en
|archive-date=25 May 2020 |archive-url=https://web.archive.org/web/20200525144635/https://www.zdnet.com/article/ai-is-changing-the-entire-nature-of-compute/ |url-status=live }}</ref><ref>{{cite web |title=AI and Compute |url=https://openai.com/blog/ai-and-compute/ |website=OpenAI |access-date=11 June 2020 |language=en |date=16 May 2018 |archive-date=17 June 2020 |archive-url=https://web.archive.org/web/20200617200602/https://openai.com/blog/ai-and-compute/ |url-status=live }}</ref>

=== Tensor Processing Units (TPUs) ===

[[Tensor Processing Unit|Tensor Processing Units (TPUs)]] are specialised hardware accelerators developed by [[Google]] specifically for machine learning workloads. Unlike general-purpose [[Graphics processing unit|GPUs]] and [[Field-programmable gate array|FPGAs]], TPUs are optimised for tensor computations, making them particularly efficient for deep learning tasks such as training and inference. They are widely used in Google Cloud AI services and in large-scale machine learning models such as Google DeepMind's AlphaFold and large language models. TPUs leverage matrix multiplication units and high-bandwidth memory to accelerate computations while maintaining energy efficiency.<ref>{{Cite book |last1=Jouppi |first1=Norman P.
|last2=Young |first2=Cliff |last3=Patil |first3=Nishant |last4=Patterson |first4=David |last5=Agrawal |first5=Gaurav |last6=Bajwa |first6=Raminder |last7=Bates |first7=Sarah |last8=Bhatia |first8=Suresh |last9=Boden |first9=Nan |last10=Borchers |first10=Al |last11=Boyle |first11=Rick |last12=Cantin |first12=Pierre-luc |last13=Chao |first13=Clifford |last14=Clark |first14=Chris |last15=Coriell |first15=Jeremy |chapter=In-Datacenter Performance Analysis of a Tensor Processing Unit |date=24 June 2017 |title=Proceedings of the 44th Annual International Symposium on Computer Architecture |chapter-url=https://dl.acm.org/doi/10.1145/3079856.3080246 |series=ISCA '17 |location=New York, NY, USA |publisher=Association for Computing Machinery |pages=1–12 |doi=10.1145/3079856.3080246 |isbn=978-1-4503-4892-8|arxiv=1704.04760 }}</ref> Since their introduction in 2016, TPUs have become a key component of AI infrastructure, especially in cloud-based environments.

===Neuromorphic computing===

[[Neuromorphic computing]] refers to a class of computing systems designed to emulate the structure and functionality of biological neural networks. These systems may be implemented through software-based simulations on conventional hardware or through specialised hardware architectures.<ref>{{Cite web |date=8 December 2020 |title=What is neuromorphic computing? Everything you need to know about how it is changing the future of computing |url=https://www.zdnet.com/article/what-is-neuromorphic-computing-everything-you-need-to-know-about-how-it-will-change-the-future-of-computing/ |access-date=21 November 2024 |website=ZDNET |language=en}}</ref>

====Physical neural networks====

A [[physical neural network]] is a specific type of neuromorphic hardware that relies on electrically adjustable materials, such as memristors, to emulate the function of [[chemical synapse|neural synapses]].
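The synaptic role of such adjustable resistances can be illustrated with a small software simulation (an illustrative sketch with made-up values, not a model of any particular device): in a memristor crossbar, input voltages applied across programmable conductances produce output currents that are conductance-weighted sums of the inputs, so the array computes a matrix–vector product directly in its physics.

```python
import numpy as np

# Illustrative sketch: each crosspoint has a programmable conductance
# G[i][j] (the "synaptic weight"). Applying voltages V[j] to the inputs
# yields, by Ohm's and Kirchhoff's laws, output currents
# I[i] = sum_j G[i][j] * V[j] -- a matrix-vector multiplication.

conductances = np.array([[0.2, 0.5, 0.1],   # siemens (hypothetical values)
                         [0.4, 0.1, 0.3]])
voltages = np.array([1.0, 0.5, 2.0])        # volts (input activations)

currents = conductances @ voltages           # amperes (output activations)
print(currents)                              # [0.65 1.05]
```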
The term "physical neural network" highlights the use of physical hardware for computation, as opposed to software-based implementations. It broadly refers to artificial neural networks that use materials with adjustable resistance to replicate neural synapses.<ref>{{Cite web |date=27 May 2021 |title=Cornell & NTT's Physical Neural Networks: A "Radical Alternative for Implementing Deep Neural Networks" That Enables Arbitrary Physical Systems Training |url=https://syncedreview.com/2021/05/27/deepmind-podracer-tpu-based-rl-frameworks-deliver-exceptional-performance-at-low-cost-28/ |url-status=live |archive-url=https://web.archive.org/web/20211027183428/https://syncedreview.com/2021/05/27/deepmind-podracer-tpu-based-rl-frameworks-deliver-exceptional-performance-at-low-cost-28/ |archive-date=27 October 2021 |access-date=12 October 2021 |website=Synced}}</ref><ref>{{Cite news |date=5 October 2021 |title=Nano-spaghetti to solve neural network power consumption |url=https://www.theregister.com/2021/10/05/analogue_neural_network_research/ |url-status=live |archive-url=https://web.archive.org/web/20211006150057/https://www.theregister.com/2021/10/05/analogue_neural_network_research/ |archive-date=6 October 2021 |access-date=12 October 2021 |work=The Register}}</ref>

===Embedded machine learning===

Embedded machine learning is a sub-field of machine learning where models are deployed on [[embedded systems]] with limited computing resources, such as [[wearable computer]]s, [[edge device]]s and [[microcontrollers]].<ref>{{Cite book|last1=Fafoutis|first1=Xenofon|last2=Marchegiani|first2=Letizia|last3=Elsts|first3=Atis|last4=Pope|first4=James|last5=Piechocki|first5=Robert|last6=Craddock|first6=Ian|title=2018 IEEE 4th World Forum on Internet of Things (WF-IoT) |chapter=Extending the battery lifetime of wearable sensors with embedded machine learning |date=7 May
2018|chapter-url=https://ieeexplore.ieee.org/document/8355116|pages=269–274|doi=10.1109/WF-IoT.2018.8355116|hdl=1983/b8fdb58b-7114-45c6-82e4-4ab239c1327f|isbn=978-1-4673-9944-9|s2cid=19192912|url=https://research-information.bris.ac.uk/en/publications/b8fdb58b-7114-45c6-82e4-4ab239c1327f |access-date=17 January 2022|archive-date=18 January 2022|archive-url=https://web.archive.org/web/20220118182543/https://ieeexplore.ieee.org/abstract/document/8355116?casa_token=LCpUeGLS1e8AAAAA:2OjuJfNwZBnV2pgDxfnEAC-jbrETv_BpTcX35_aFqN6IULFxu1xbYbVSRpD-zMd4GCUMELyG|url-status=live}}</ref><ref>{{Cite web|date=2 June 2021|title=A Beginner's Guide To Machine learning For Embedded Systems|url=https://analyticsindiamag.com/a-beginners-guide-to-machine-learning-for-embedded-systems/|access-date=17 January 2022|website=Analytics India Magazine|language=en-US|archive-date=18 January 2022|archive-url=https://web.archive.org/web/20220118182754/https://analyticsindiamag.com/a-beginners-guide-to-machine-learning-for-embedded-systems/|url-status=live}}</ref><ref>{{Cite web|last=Synced|date=12 January 2022|title=Google, Purdue & Harvard U's Open-Source Framework for TinyML Achieves up to 75x Speedups on FPGAs {{!}} Synced|url=https://syncedreview.com/2022/01/12/deepmind-podracer-tpu-based-rl-frameworks-deliver-exceptional-performance-at-low-cost-183/|access-date=17 January 2022|website=syncedreview.com|language=en-US|archive-date=18 January 2022|archive-url=https://web.archive.org/web/20220118182404/https://syncedreview.com/2022/01/12/deepmind-podracer-tpu-based-rl-frameworks-deliver-exceptional-performance-at-low-cost-183/|url-status=live}}</ref><ref>{{Cite journal | last1 = AlSelek | first1 = Mohammad | last2 = Alcaraz-Calero | first2 = Jose M. 
| last3 = Wang | first3 = Qi | year = 2024 | title = Dynamic AI-IoT: Enabling Updatable AI Models in Ultralow-Power 5G IoT Devices | journal = IEEE Internet of Things Journal | volume = 11 | issue = 8 | pages = 14192–14205 | doi = 10.1109/JIOT.2023.3340858 | url = https://research-portal.uws.ac.uk/en/publications/c8edfe21-77d0-4c3e-a8bc-d384faf605a0 }}</ref> Running models directly on these devices eliminates the need to transfer and store data on cloud servers for further processing, thereby reducing the risk of data breaches, privacy leaks and theft of intellectual property, personal data and business secrets. Embedded machine learning can be achieved through various techniques, such as [[hardware acceleration]],<ref>{{Cite book|last1=Giri|first1=Davide|last2=Chiu|first2=Kuan-Lin|last3=Di Guglielmo|first3=Giuseppe|last4=Mantovani|first4=Paolo|last5=Carloni|first5=Luca P.|title=2020 Design, Automation & Test in Europe Conference & Exhibition (DATE) |chapter=ESP4ML: Platform-Based Design of Systems-on-Chip for Embedded Machine Learning |date=15 June 2020|chapter-url=https://ieeexplore.ieee.org/document/9116317|pages=1049–1054|doi=10.23919/DATE48585.2020.9116317|arxiv=2004.03640|isbn=978-3-9819263-4-7|s2cid=210928161|access-date=17 January 2022|archive-date=18 January 2022|archive-url=https://web.archive.org/web/20220118182342/https://ieeexplore.ieee.org/abstract/document/9116317?casa_token=5I_Tmgrrbu4AAAAA:v7pDHPEWlRuo2Vk3pU06194PO0-W21UOdyZqADrZxrRdPBZDMLwQrjJSAHUhHtzJmLu_VdgW|url-status=live}}</ref><ref>{{Cite web|last1=Louis|first1=Marcia Sahaya|last2=Azad|first2=Zahra|last3=Delshadtehrani|first3=Leila|last4=Gupta|first4=Suyog|last5=Warden|first5=Pete|last6=Reddi|first6=Vijay Janapa|last7=Joshi|first7=Ajay|date=2019|title=Towards Deep Learning using TensorFlow Lite on RISC-V|url=https://edge.seas.harvard.edu/publications/towards-deep-learning-using-tensorflow-lite-risc-v|access-date=17 January 2022|website=[[Harvard University]]|archive-date=17 January 
2022|archive-url=https://web.archive.org/web/20220117031909/https://edge.seas.harvard.edu/publications/towards-deep-learning-using-tensorflow-lite-risc-v|url-status=live}}</ref> [[approximate computing]],<ref>{{Cite book|last1=Ibrahim|first1=Ali|last2=Osta|first2=Mario|last3=Alameh|first3=Mohamad|last4=Saleh|first4=Moustafa|last5=Chible|first5=Hussein|last6=Valle|first6=Maurizio|title=2018 25th IEEE International Conference on Electronics, Circuits and Systems (ICECS) |chapter=Approximate Computing Methods for Embedded Machine Learning |date=21 January 2019|chapter-url=https://ieeexplore.ieee.org/document/8617877|pages=845–848|doi=10.1109/ICECS.2018.8617877|isbn=978-1-5386-9562-3|s2cid=58670712|access-date=17 January 2022|archive-date=17 January 2022|archive-url=https://web.archive.org/web/20220117031855/https://ieeexplore.ieee.org/abstract/document/8617877?casa_token=arUW5Oy-tzwAAAAA:I9x6edlfskM6kGNFUN9zAFrjEBv_8kYTz7ERTxtXu9jAqdrYCcDbbwjBdgwXvb6QAH_-0VJJ|url-status=live}}</ref> and model optimisation.<ref>{{Cite web|title=dblp: TensorFlow Eager: A Multi-Stage, Python-Embedded DSL for Machine Learning.|url=https://dblp.org/rec/journals/corr/abs-1903-01855.html|access-date=17 January 2022|website=dblp.org|language=en|archive-date=18 January 2022|archive-url=https://web.archive.org/web/20220118182335/https://dblp.org/rec/journals/corr/abs-1903-01855.html|url-status=live}}</ref><ref>{{Cite journal|last1=Branco|first1=Sérgio|last2=Ferreira|first2=André G.|last3=Cabral|first3=Jorge|date=5 November 2019|title=Machine Learning in Resource-Scarce Embedded Systems, FPGAs, and End-Devices: A Survey|journal=Electronics|volume=8|issue=11|pages=1289|doi=10.3390/electronics8111289|issn=2079-9292|doi-access=free|hdl=1822/62521|hdl-access=free}}</ref> Common optimisation techniques include [[Pruning (artificial neural network)|pruning]], [[Quantization (Embedded Machine Learning)|quantisation]], [[knowledge distillation]], low-rank factorisation, network architecture search, and 
parameter sharing.
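As an illustration of one of these optimisation techniques, symmetric post-training quantisation maps floating-point weights to 8-bit integers using a single scale factor. The following is a minimal sketch in plain NumPy; production toolchains such as TensorFlow Lite additionally handle per-channel scales, zero points, and calibration data.

```python
import numpy as np

# Minimal sketch of symmetric post-training quantisation: map float32
# weights to int8 with one scale factor, as embedded deployments do to
# cut memory footprint (4x here) and enable integer-only arithmetic.

def quantize_int8(weights):
    scale = np.abs(weights).max() / 127.0    # map the largest |w| to 127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.array([0.42, -1.3, 0.07, 0.9], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)                 # approximate reconstruction
print(q, np.max(np.abs(w - w_hat)))          # rounding error <= scale / 2
```

Each weight now occupies one byte instead of four, at the cost of a bounded rounding error per weight; on microcontrollers this also allows inference with integer multiply–accumulate instructions instead of floating-point ones.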