== Techniques ==
AI research uses a wide variety of techniques to accomplish the goals above.{{Efn|name="Tools of AI"|This list of tools is based on the topics covered by the major AI textbooks, including: {{Harvtxt|Russell|Norvig|2021}}, {{Harvtxt|Luger|Stubblefield|2004}}, {{Harvtxt|Poole|Mackworth|Goebel|1998}} and {{Harvtxt|Nilsson|1998}}}}

=== Search and optimization ===
AI can solve many problems by intelligently searching through many possible solutions.<ref>[[Search algorithm]]s: {{Harvtxt|Russell|Norvig|2021|loc=chpts. 3–5}}, {{Harvtxt|Poole|Mackworth|Goebel|1998|pp=113–163}}, {{Harvtxt|Luger|Stubblefield|2004|pp=79–164, 193–219}}, {{Harvtxt|Nilsson|1998|loc=chpts. 7–12}}</ref> There are two very different kinds of search used in AI: [[state space search]] and [[Local search (optimization)|local search]].

==== State space search ====
[[State space search]] searches through a tree of possible states to try to find a goal state.<ref>[[State space search]]: {{Harvtxt|Russell|Norvig|2021|loc=chpt. 3}}</ref> For example, [[Automated planning and scheduling|planning]] algorithms search through trees of goals and subgoals, attempting to find a path to a target goal, a process called [[means-ends analysis]].{{Sfnp|Russell|Norvig|2021|loc=sect. 11.2}}

[[Brute force search|Simple exhaustive searches]]<ref>[[Uninformed search]]es ([[breadth first search]], [[depth-first search]] and general [[state space search]]): {{Harvtxt|Russell|Norvig|2021|loc=sect. 3.4}}, {{Harvtxt|Poole|Mackworth|Goebel|1998|pp=113–132}}, {{Harvtxt|Luger|Stubblefield|2004|pp=79–121}}, {{Harvtxt|Nilsson|1998|loc=chpt. 8}}</ref> are rarely sufficient for most real-world problems: the [[Search algorithm|search space]] (the number of places to search) quickly grows to [[Astronomically large|astronomical numbers]]. The result is a search that is [[Computation time|too slow]] or never completes.<ref name="Intractability and efficiency and the combinatorial explosion"/> "[[Heuristics]]" or "rules of thumb" can help prioritize choices that are more likely to reach a goal.<ref>[[Heuristic]] or informed searches (e.g., greedy [[Best-first search|best first]] and [[A* search algorithm|A*]]): {{Harvtxt|Russell|Norvig|2021|loc=sect. 3.5}}, {{Harvtxt|Poole|Mackworth|Goebel|1998|pp=132–147}}, {{Harvtxt|Poole|Mackworth|2017|loc=sect. 3.6}}, {{Harvtxt|Luger|Stubblefield|2004|pp=133–150}}</ref>

[[Adversarial search]] is used for [[game AI|game-playing]] programs, such as chess or Go. It searches through a [[Game tree|tree]] of possible moves and countermoves, looking for a winning position.<ref>[[Adversarial search]]: {{Harvtxt|Russell|Norvig|2021|loc=chpt. 5}}</ref>

==== Local search ====
[[File:Gradient descent.gif|class=skin-invert-image|thumb|Illustration of [[gradient descent]] for 3 different starting points; two parameters (represented by the plane coordinates) are adjusted in order to minimize the [[loss function]] (the height)]]
[[Local search (optimization)|Local search]] uses [[mathematical optimization]] to find a solution to a problem. It begins with some form of guess and refines it incrementally.<ref>[[Local search (optimization)|Local]] or "[[optimization]]" search: {{Harvtxt|Russell|Norvig|2021|loc=chpt. 4}}</ref>

[[Gradient descent]] is a type of local search that optimizes a set of numerical parameters by incrementally adjusting them to minimize a [[loss function]]. Variants of gradient descent are commonly used to train [[Artificial neural network|neural networks]],<ref>{{Cite web |last=Singh Chauhan |first=Nagesh |date=December 18, 2020 |title=Optimization Algorithms in Neural Networks |url=https://www.kdnuggets.com/optimization-algorithms-in-neural-networks |access-date=2024-01-13 |website=KDnuggets}}</ref> through the [[backpropagation]] algorithm.
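The following minimal sketch, with an arbitrary quadratic loss, learning rate, and starting point chosen only for illustration (none of it taken from the sources cited above), shows how gradient descent adjusts two parameters to reduce a loss:

<syntaxhighlight lang="python">
# Illustrative sketch only: gradient descent on an arbitrary quadratic loss.
# The loss, its gradient, the learning rate, and the starting guess are
# example choices, not drawn from any source cited in this article.

def loss(x, y):
    # A simple bowl-shaped loss with its minimum at (3, -2).
    return (x - 3) ** 2 + (y + 2) ** 2

def gradient(x, y):
    # Partial derivatives of the loss with respect to each parameter.
    return 2 * (x - 3), 2 * (y + 2)

x, y = 0.0, 0.0            # initial guess
learning_rate = 0.1
for step in range(100):
    dx, dy = gradient(x, y)
    x -= learning_rate * dx  # move each parameter against its gradient
    y -= learning_rate * dy

print(round(x, 3), round(y, 3), round(loss(x, y), 6))  # approaches 3, -2, 0
</syntaxhighlight>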
Another type of local search is [[evolutionary computation]], which aims to iteratively improve a set of candidate solutions by "mutating" and "recombining" them, [[Artificial selection|selecting]] only the fittest to survive each generation.<ref>[[Evolutionary computation]]: {{Harvtxt|Russell|Norvig|2021|loc=sect. 4.1.2}}</ref>

Distributed search processes can coordinate via [[swarm intelligence]] algorithms. Two popular swarm algorithms used in search are [[particle swarm optimization]] (inspired by bird [[flocking]]) and [[ant colony optimization]] (inspired by [[ant trail]]s).{{Sfnp|Merkle|Middendorf|2013}}

=== Logic ===
Formal [[logic]] is used for [[automatic reasoning|reasoning]] and [[knowledge representation]].<ref>[[Logic]]: {{Harvtxt|Russell|Norvig|2021|loc=chpts. 6–9}}, {{Harvtxt|Luger|Stubblefield|2004|pp=35–77}}, {{Harvtxt|Nilsson|1998|loc=chpt. 13–16}}</ref> Formal logic comes in two main forms: [[propositional logic]] (which operates on statements that are true or false and uses [[logical connective]]s such as "and", "or", "not" and "implies")<ref>[[Propositional logic]]: {{Harvtxt|Russell|Norvig|2021|loc=chpt. 6}}, {{Harvtxt|Luger|Stubblefield|2004|pp=45–50}}, {{Harvtxt|Nilsson|1998|loc=chpt. 13}}</ref> and [[predicate logic]] (which also operates on objects, predicates and relations and uses [[Quantifier (logic)|quantifier]]s such as "''Every'' ''X'' is a ''Y''" and "There are ''some'' ''X''s that are ''Y''s").<ref>[[First-order logic]] and features such as [[Equality (mathematics)|equality]]: {{Harvtxt|Russell|Norvig|2021|loc=chpt. 7}}, {{Harvtxt|Poole|Mackworth|Goebel|1998|pp=268–275}}, {{Harvtxt|Luger|Stubblefield|2004|pp=50–62}}, {{Harvtxt|Nilsson|1998|loc=chpt. 15}}</ref>

[[Deductive reasoning]] in logic is the process of [[logical proof|proving]] a new statement ([[Logical consequence|conclusion]]) from other statements that are given and assumed to be true (the [[premise]]s).<ref>[[Logical inference]]: {{Harvtxt|Russell|Norvig|2021|loc=chpt. 10}}</ref> Proofs can be structured as proof [[tree structure|trees]], in which nodes are labelled by sentences, and children nodes are connected to parent nodes by [[inference rule]]s.

Given a problem and a set of premises, problem-solving reduces to searching for a proof tree whose root node is labelled by a solution of the problem and whose [[leaf nodes]] are labelled by premises or [[axiom]]s. In the case of [[Horn clause]]s, problem-solving search can be performed by reasoning [[Forward chaining|forwards]] from the premises or [[backward chaining|backwards]] from the problem.<ref>Logical deduction as search: {{Harvtxt|Russell|Norvig|2021|loc=sects. 9.3, 9.4}}, {{Harvtxt|Poole|Mackworth|Goebel|1998|pp=~46–52}}, {{Harvtxt|Luger|Stubblefield|2004|pp=62–73}}, {{Harvtxt|Nilsson|1998|loc=chpt. 4.2, 7.2}}</ref>
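As a minimal illustration of forward chaining (the rules and facts below are invented for the example, not taken from the cited textbooks), the sketch repeatedly fires any Horn clause whose body is already among the known facts until no new conclusions appear:

<syntaxhighlight lang="python">
# Illustrative sketch only: forward chaining over propositional Horn clauses.
# The example rules and facts are invented for this illustration.

# Each rule maps a frozenset of body atoms to a head atom: body -> head.
rules = [
    (frozenset({"rain"}), "wet_ground"),
    (frozenset({"wet_ground", "cold"}), "icy_ground"),
    (frozenset({"icy_ground"}), "slippery"),
]
facts = {"rain", "cold"}          # premises assumed true

changed = True
while changed:                    # keep firing rules until a fixed point
    changed = False
    for body, head in rules:
        if body <= facts and head not in facts:
            facts.add(head)       # the rule's conclusion becomes a new fact
            changed = True

print(sorted(facts))  # ['cold', 'icy_ground', 'rain', 'slippery', 'wet_ground']
</syntaxhighlight>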
In the more general case of the clausal form of [[first-order logic]], [[resolution (logic)|resolution]] is a single, axiom-free rule of inference, in which a problem is solved by proving a contradiction from premises that include the negation of the problem to be solved.<ref>[[Resolution (logic)|Resolution]] and [[unification (computer science)|unification]]: {{Harvtxt|Russell|Norvig|2021|loc=sections 7.5.2, 9.2, 9.5}}</ref> Inference in both Horn clause logic and first-order logic is [[Undecidable problem|undecidable]], and therefore [[Intractable problem|intractable]]. However, backward reasoning with Horn clauses, which underpins computation in the [[logic programming]] language [[Prolog]], is [[Turing complete]]. Moreover, its efficiency is competitive with computation in other [[symbolic programming]] languages.<ref>{{Cite journal |last1=Warren |first1=D.H. |last2=Pereira |first2=L.M. |last3=Pereira |first3=F. |date=1977 |title=Prolog-the language and its implementation compared with Lisp |journal=[[ACM SIGPLAN Notices]] |volume=12 |issue=8 |pages=109–115 |doi=10.1145/872734.806939}}</ref>

[[Fuzzy logic]] assigns a "degree of truth" between 0 and 1. It can therefore handle propositions that are vague and partially true.<ref>Fuzzy logic: {{Harvtxt|Russell|Norvig|2021|pp=214, 255, 459}}, {{Harvtxt|Scientific American|1999}}</ref> [[Non-monotonic logic]]s, including logic programming with [[negation as failure]], are designed to handle [[default reasoning]].<ref name="Default reasoning"/> Other specialized versions of logic have been developed to describe many complex domains.

=== Probabilistic methods for uncertain reasoning ===
[[File:SimpleBayesNet.svg|class=skin-invert-image|thumb|upright=1.7|A simple [[Bayesian network]], with the associated [[conditional probability table]]s]]
Many problems in AI (including reasoning, planning, learning, perception, and robotics) require the agent to operate with incomplete or uncertain information. AI researchers have devised a number of tools to solve these problems using methods from [[probability]] theory and economics.<ref name="Stoch">Stochastic methods for uncertain reasoning: {{Harvtxt|Russell|Norvig|2021|loc=chpt. 12–18, 20}}, {{Harvtxt|Poole|Mackworth|Goebel|1998|pp=345–395}}, {{Harvtxt|Luger|Stubblefield|2004|pp=165–191, 333–381}}, {{Harvtxt|Nilsson|1998|loc=chpt. 19}}</ref> Precise mathematical tools have been developed that analyze how an agent can make choices and plan, using [[decision theory]], [[decision analysis]],<ref>[[Decision theory]] and [[decision analysis]]: {{Harvtxt|Russell|Norvig|2021|loc=chpt. 16–18}}, {{Harvtxt|Poole|Mackworth|Goebel|1998|pp=381–394}}</ref> and [[information value theory]].<ref>[[Information value theory]]: {{Harvtxt|Russell|Norvig|2021|loc=sect. 16.6}}</ref> These tools include models such as [[Markov decision process]]es,<ref>[[Markov decision process]]es and dynamic [[decision network]]s: {{Harvtxt|Russell|Norvig|2021|loc=chpt. 17}}</ref> dynamic [[decision network]]s,<ref name="Stochastic temporal models"/> [[game theory]] and [[mechanism design]].<ref>[[Game theory]] and [[mechanism design]]: {{Harvtxt|Russell|Norvig|2021|loc=chpt. 18}}</ref>
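As a minimal illustration of planning with a Markov decision process (the states, actions, transition probabilities, rewards, and discount factor below are toy values chosen only for the example), the sketch applies value iteration, repeatedly backing up expected discounted rewards until the state values stabilize:

<syntaxhighlight lang="python">
# Illustrative sketch only: value iteration on an invented two-state MDP.
# States, actions, transition probabilities, and rewards are toy values.

# transitions[state][action] = list of (probability, next_state, reward)
transitions = {
    "cool": {"work": [(1.0, "hot", 2.0)],
             "rest": [(1.0, "cool", 1.0)]},
    "hot":  {"work": [(0.5, "hot", 2.0), (0.5, "broken", -10.0)],
             "rest": [(1.0, "cool", 0.0)]},
    "broken": {},                       # terminal state: no actions
}
gamma = 0.9                             # discount factor

values = {s: 0.0 for s in transitions}
for _ in range(100):                    # repeated Bellman backups
    values = {
        s: max((sum(p * (r + gamma * values[s2]) for p, s2, r in outs)
                for outs in acts.values()), default=0.0)
        for s, acts in transitions.items()
    }

print({s: round(v, 2) for s, v in values.items()})
</syntaxhighlight>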
[[Bayesian network]]s<ref>[[Bayesian network]]s: {{Harvtxt|Russell|Norvig|2021|loc=sects. 12.5–12.6, 13.4–13.5, 14.3–14.5, 16.5, 20.2–20.3}}, {{Harvtxt|Poole|Mackworth|Goebel|1998|pp=361–381}}, {{Harvtxt|Luger|Stubblefield|2004|pp=~182–190, ≈363–379}}, {{Harvtxt|Nilsson|1998|loc=chpt. 19.3–19.4}}</ref> are a tool that can be used for [[automated reasoning|reasoning]] (using the [[Bayesian inference]] algorithm),{{Efn|Compared with symbolic logic, formal Bayesian inference is computationally expensive. For inference to be tractable, most observations must be [[conditionally independent]] of one another. [[AdSense]] uses a Bayesian network with over 300 million edges to learn which ads to serve.{{Sfnp|Domingos|2015|loc=chpt. 6}}}}<ref>[[Bayesian inference]] algorithm: {{Harvtxt|Russell|Norvig|2021|loc=sect. 13.3–13.5}}, {{Harvtxt|Poole|Mackworth|Goebel|1998|pp=361–381}}, {{Harvtxt|Luger|Stubblefield|2004|pp=~363–379}}, {{Harvtxt|Nilsson|1998|loc=chpt. 19.4 & 7}}</ref> [[Machine learning|learning]] (using the [[expectation–maximization algorithm]]),{{Efn|Expectation–maximization, one of the most popular algorithms in machine learning, allows clustering in the presence of unknown [[latent variables]].{{Sfnp|Domingos|2015|p=210}}}}<ref>[[Bayesian learning]] and the [[expectation–maximization algorithm]]: {{Harvtxt|Russell|Norvig|2021|loc=chpt. 20}}, {{Harvtxt|Poole|Mackworth|Goebel|1998|pp=424–433}}, {{Harvtxt|Nilsson|1998|loc=chpt. 20}}, {{Harvtxt|Domingos|2015|p=210}}</ref> [[Automated planning and scheduling|planning]] (using [[decision network]]s)<ref>[[Bayesian decision theory]] and Bayesian [[decision network]]s: {{Harvtxt|Russell|Norvig|2021|loc=sect. 16.5}}</ref> and [[Machine perception|perception]] (using [[dynamic Bayesian network]]s).<ref name="Stochastic temporal models"/>

Probabilistic algorithms can also be used for filtering, prediction, smoothing, and finding explanations for streams of data, thus helping perception systems analyze processes that occur over time (e.g., [[hidden Markov model]]s or [[Kalman filter]]s).<ref name="Stochastic temporal models">Stochastic temporal models: {{Harvtxt|Russell|Norvig|2021|loc=chpt. 14}}; [[Hidden Markov model]]: {{Harvtxt|Russell|Norvig|2021|loc=sect. 14.3}}; [[Kalman filter]]s: {{Harvtxt|Russell|Norvig|2021|loc=sect. 14.4}}; [[Dynamic Bayesian network]]s: {{Harvtxt|Russell|Norvig|2021|loc=sect. 14.5}}</ref>

[[File:EM_Clustering_of_Old_Faithful_data.gif|thumb|upright=1.2|[[Expectation–maximization algorithm|Expectation–maximization]] [[cluster analysis|clustering]] of [[Old Faithful]] eruption data starts from a random guess but then successfully converges on an accurate clustering of the two physically distinct modes of eruption.]]

=== Classifiers and statistical learning methods ===
The simplest AI applications can be divided into two types: classifiers (e.g., "if shiny then diamond"), on one hand, and controllers (e.g., "if diamond then pick up"), on the other hand. [[Classifier (mathematics)|Classifiers]]<ref>Statistical learning methods and [[Classifier (mathematics)|classifiers]]: {{Harvtxt|Russell|Norvig|2021|loc=chpt. 20}}</ref> are functions that use [[pattern matching]] to determine the closest match. They can be fine-tuned based on chosen examples using [[supervised learning]]. Each pattern (also called an "[[random variate|observation]]") is labeled with a certain predefined class. All the observations combined with their class labels are known as a [[data set]]. When a new observation is received, that observation is classified based on previous experience.<ref name="Supervised learning"/>
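As a toy illustration of a classifier learned from labeled observations (the two-feature data set and class labels below are invented for the example), a 1-nearest-neighbor rule assigns a new observation the class of the closest observation in the data set:

<syntaxhighlight lang="python">
# Illustrative sketch only: a 1-nearest-neighbor classifier on an invented
# two-feature data set. Observations and class labels are made up.

import math

# Labeled data set: (observation, class label)
data_set = [
    ((1.0, 1.2), "cat"),
    ((0.8, 0.9), "cat"),
    ((4.1, 3.8), "dog"),
    ((4.5, 4.2), "dog"),
]

def classify(observation):
    # Assign the class of the closest labeled observation (Euclidean distance).
    return min(data_set, key=lambda item: math.dist(observation, item[0]))[1]

print(classify((1.1, 1.0)))  # "cat"
print(classify((4.0, 4.0)))  # "dog"
</syntaxhighlight>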
There are many kinds of classifiers in use.<ref>{{Cite book |last1=Ciaramella |first1=Alberto |author-link=Alberto Ciaramella |title=Introduction to Artificial Intelligence: from data analysis to generative AI |last2=Ciaramella |first2=Marco |date=2024 |publisher=Intellisemantic Editions |isbn=978-8-8947-8760-3}}</ref> The [[decision tree]] is the simplest and most widely used symbolic machine learning algorithm.<ref>[[Alternating decision tree|Decision tree]]s: {{Harvtxt|Russell|Norvig|2021|loc=sect. 19.3}}, {{Harvtxt|Domingos|2015|p=88}}</ref> The [[k-nearest neighbor]] algorithm was the most widely used analogical AI until the mid-1990s, when [[kernel methods]] such as the [[support vector machine]] (SVM) displaced it.<ref>[[Nonparametric statistics|Non-parameteric]] learning models such as [[K-nearest neighbor]] and [[support vector machines]]: {{Harvtxt|Russell|Norvig|2021|loc=sect. 19.7}}, {{Harvtxt|Domingos|2015|p=187}} (k-nearest neighbor), {{Harvtxt|Domingos|2015|p=88}} (kernel methods)</ref> The [[naive Bayes classifier]] is reportedly the "most widely used learner"{{Sfnp|Domingos|2015|p=152}} at Google, due in part to its scalability.<ref>[[Naive Bayes classifier]]: {{Harvtxt|Russell|Norvig|2021|loc=sect. 12.6}}, {{Harvtxt|Domingos|2015|p=152}}</ref> [[Artificial neural network|Neural networks]] are also used as classifiers.<ref name="Neural networks"/>

=== Artificial neural networks ===
[[File:Artificial_neural_network.svg|right|thumb|A neural network is an interconnected group of nodes, akin to the vast network of [[neuron]]s in the [[human brain]].]]
An artificial neural network is based on a collection of nodes also known as [[artificial neurons]], which loosely model the [[neurons]] in a biological brain. It is trained to recognise patterns; once trained, it can recognise those patterns in fresh data. There is an input layer, at least one hidden layer of nodes, and an output layer. Each node applies a function, and once the [[Weighting|weighted]] sum of its inputs crosses its specified threshold, the data is transmitted to the next layer. A network is typically called a deep neural network if it has at least two hidden layers.<ref name="Neural networks">Neural networks: {{Harvtxt|Russell|Norvig|2021|loc=chpt. 21}}, {{Harvtxt|Domingos|2015|loc=Chapter 4}}</ref>

Learning algorithms for neural networks use [[local search (optimization)|local search]] to choose the weights that will get the right output for each input during training. The most common training technique is the [[backpropagation]] algorithm.<ref>Gradient calculation in computational graphs, [[backpropagation]], [[automatic differentiation]]: {{Harvtxt|Russell|Norvig|2021|loc=sect. 21.2}}, {{Harvtxt|Luger|Stubblefield|2004|pp=467–474}}, {{Harvtxt|Nilsson|1998|loc=chpt. 3.3}}</ref> Neural networks learn to model complex relationships between inputs and outputs and [[Pattern recognition|find patterns]] in data. In theory, a neural network can learn any function.<ref>[[Universal approximation theorem]]: {{Harvtxt|Russell|Norvig|2021|p=752}}; the theorem: {{Harvtxt|Cybenko|1988}}, {{Harvtxt|Hornik|Stinchcombe|White|1989}}</ref>
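A minimal sketch of such training is shown below: a network with one hidden layer is fitted to an invented regression problem by gradient descent, with the gradients computed by hand-coded backpropagation. The architecture, data, and hyperparameters are arbitrary example choices, not drawn from the cited sources.

<syntaxhighlight lang="python">
# Illustrative sketch only: a tiny feedforward network with one hidden layer,
# trained by gradient descent with manually coded backpropagation.

import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(64, 1))      # toy inputs
Y = X ** 2                                 # toy target function to fit

W1 = rng.normal(scale=0.5, size=(1, 8))    # input -> hidden weights
b1 = np.zeros(8)
W2 = rng.normal(scale=0.5, size=(8, 1))    # hidden -> output weights
b2 = np.zeros(1)
lr = 0.1

for step in range(2000):
    # Forward pass
    h = np.tanh(X @ W1 + b1)               # hidden layer activations
    pred = h @ W2 + b2                     # network output
    loss = np.mean((pred - Y) ** 2)        # mean squared error

    # Backward pass (chain rule, i.e. backpropagation)
    d_pred = 2 * (pred - Y) / len(X)
    dW2 = h.T @ d_pred
    db2 = d_pred.sum(axis=0)
    d_h = (d_pred @ W2.T) * (1 - h ** 2)   # derivative of tanh
    dW1 = X.T @ d_h
    db1 = d_h.sum(axis=0)

    # Gradient descent update on every weight
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(round(float(loss), 4))               # the error typically shrinks toward 0
</syntaxhighlight>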
In [[feedforward neural network]]s the signal passes in only one direction.<ref>[[Feedforward neural network]]s: {{Harvtxt|Russell|Norvig|2021|loc=sect. 21.1}}</ref> [[Recurrent neural network]]s feed the output signal back into the input, which allows short-term memories of previous input events. [[Long short-term memory]] is the most successful network architecture for recurrent networks.<ref>[[Recurrent neural network]]s: {{Harvtxt|Russell|Norvig|2021|loc=sect. 21.6}}</ref> [[Perceptron]]s<ref>[[Perceptron]]s: {{Harvtxt|Russell|Norvig|2021|pp=21, 22, 683}}</ref> use only a single layer of neurons; deep learning<ref name="Deep learning"/> uses multiple layers. [[Convolutional neural network]]s strengthen the connection between neurons that are "close" to each other—this is especially important in [[image processing]], where a local set of neurons must [[edge detection|identify an "edge"]] before the network can identify an object.<ref>[[Convolutional neural networks]]: {{Harvtxt|Russell|Norvig|2021|loc=sect. 21.3}}</ref>
{{Clear}}

=== Deep learning ===
[[File:AI hierarchy.svg|thumb|upright]]
[[Deep learning]]<ref name="Deep learning">[[Deep learning]]: {{Harvtxt|Russell|Norvig|2021|loc=chpt. 21}}, {{Harvtxt|Goodfellow|Bengio|Courville|2016}}, {{Harvtxt|Hinton ''et al.''|2016}}, {{Harvtxt|Schmidhuber|2015}}</ref> uses several layers of neurons between the network's inputs and outputs. The multiple layers can progressively extract higher-level features from the raw input. For example, in [[image processing]], lower layers may identify edges, while higher layers may identify the concepts relevant to a human such as digits, letters, or faces.{{Sfnp|Deng|Yu|2014|pp=199–200}}

Deep learning has profoundly improved the performance of programs in many important subfields of artificial intelligence, including [[computer vision]], [[speech recognition]], [[natural language processing]], [[image classification]],{{Sfnp|Ciresan|Meier|Schmidhuber|2012}} and others. The reason that deep learning performs so well in so many applications is not known as of 2021.{{Sfnp|Russell|Norvig|2021|p=750}} The sudden success of deep learning in 2012–2015 did not occur because of some new discovery or theoretical breakthrough (deep neural networks and backpropagation had been described by many people, as far back as the 1950s){{Efn|Some form of deep neural networks (without a specific learning algorithm) were described by: [[Warren S. McCulloch]] and [[Walter Pitts]] (1943);{{Sfnp|Russell|Norvig|2021|p=17}} [[Alan Turing]] (1948);{{Sfnp|Russell|Norvig|2021|p=785}} [[Karl Steinbuch]] and [[Roger David Joseph]] (1961).{{Sfnp|Schmidhuber|2022|loc=sect. 5}} Deep or recurrent networks that learned (or used gradient descent) were developed by: [[Frank Rosenblatt]] (1957);{{Sfnp|Russell|Norvig|2021|p=785}} [[Oliver Selfridge]] (1959);{{Sfnp|Schmidhuber|2022|loc=sect. 5}} [[Alexey Ivakhnenko]] and [[Valentin Lapa]] (1965);{{Sfnp|Schmidhuber|2022|loc=sect. 6}} [[Kaoru Nakano]] (1971);{{Sfnp|Schmidhuber|2022|loc=sect. 7}} [[Shun-Ichi Amari]] (1972);{{Sfnp|Schmidhuber|2022|loc=sect. 7}} [[John Joseph Hopfield]] (1982).{{Sfnp|Schmidhuber|2022|loc=sect. 7}} Precursors to backpropagation were developed by: [[Henry J. Kelley]] (1960);{{Sfnp|Russell|Norvig|2021|p=785}} [[Arthur E. Bryson]] (1962);{{Sfnp|Russell|Norvig|2021|p=785}} [[Stuart Dreyfus]] (1962);{{Sfnp|Russell|Norvig|2021|p=785}} [[Arthur E. Bryson]] and [[Yu-Chi Ho]] (1969).{{Sfnp|Russell|Norvig|2021|p=785}} Backpropagation was independently developed by: [[Seppo Linnainmaa]] (1970);{{Sfnp|Schmidhuber|2022|loc=sect. 8}} [[Paul Werbos]] (1974).{{Sfnp|Russell|Norvig|2021|p=785}}}} but because of two factors: the incredible increase in computer power (including the hundred-fold increase in speed by switching to [[GPU]]s) and the availability of vast amounts of training data, especially the giant [[List of datasets for machine-learning research|curated datasets]] used for benchmark testing, such as [[ImageNet]].{{Efn|[[Geoffrey Hinton]] said, of his work on neural networks in the 1990s, "our labeled datasets were thousands of times too small. [And] our computers were millions of times too slow."<ref>Quoted in {{Harvtxt|Christian|2020|p=22}}</ref>}}
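As an illustration of the kind of low-level feature detection performed by the early layers of such networks (the image and filter below are hand-made toy values rather than learned weights), a single convolution with a vertical-edge kernel responds strongly only where the image brightness changes:

<syntaxhighlight lang="python">
# Illustrative sketch only: one convolution filter detecting vertical edges
# in a tiny grayscale image, the kind of low-level feature a deep network's
# first layers can learn. The image and kernel are invented.

import numpy as np

image = np.array([              # 6x6 image: dark left half, bright right half
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
], dtype=float)

kernel = np.array([             # hand-written vertical-edge filter
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
], dtype=float)

# Slide the 3x3 kernel over every 3x3 patch of the image (no padding).
h, w = image.shape
out = np.zeros((h - 2, w - 2))
for i in range(h - 2):
    for j in range(w - 2):
        out[i, j] = np.sum(image[i:i + 3, j:j + 3] * kernel)

print(out)  # large values appear only in the columns where the edge lies
</syntaxhighlight>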
=== GPT ===
[[Generative pre-trained transformer]]s (GPT) are [[large language model]]s (LLMs) that generate text based on the semantic relationships between words in sentences. Text-based GPT models are pre-trained on a large [[corpus of text]] that can be from the Internet. The pretraining consists of predicting the next [[Lexical analysis|token]] (a token being usually a word, subword, or punctuation). Throughout this pretraining, GPT models accumulate knowledge about the world and can then generate human-like text by repeatedly predicting the next token. Typically, a subsequent training phase makes the model more truthful, useful, and harmless, usually with a technique called [[reinforcement learning from human feedback]] (RLHF). Current GPT models are prone to generating falsehoods called "[[Hallucination (artificial intelligence)|hallucinations]]". These can be reduced with RLHF and quality data, but the problem has been getting worse for reasoning systems.<ref>{{Cite news |last1=Metz |first1=Cade |last2=Weise |first2=Karen |date=2025-05-05 |title=A.I. Hallucinations Are Getting Worse, Even as New Systems Become More Powerful |url=https://www.nytimes.com/2025/05/05/technology/ai-hallucinations-chatgpt-google.html |access-date=2025-05-06 |work=The New York Times |language=en-US |issn=0362-4331}}</ref>

Such systems are used in [[chatbot]]s, which allow people to ask a question or request a task in simple text.{{Sfnp|Smith|2023}}<ref>{{Cite web |date=9 November 2023 |title=Explained: Generative AI |url=https://news.mit.edu/2023/explained-generative-ai-1109}}</ref> Current models and services include [[Gemini (chatbot)|Gemini]] (formerly Bard), [[ChatGPT]], [[Grok (chatbot)|Grok]], [[Anthropic#Claude|Claude]], [[Microsoft Copilot|Copilot]], and [[LLaMA]].<ref>{{Cite web |title=AI Writing and Content Creation Tools |url=https://mitsloanedtech.mit.edu/ai/tools/writing |access-date=25 December 2023 |publisher=MIT Sloan Teaching & Learning Technologies |archive-date=25 December 2023 |archive-url=https://web.archive.org/web/20231225232503/https://mitsloanedtech.mit.edu/ai/tools/writing/ |url-status=live }}</ref> [[Multimodal learning|Multimodal]] GPT models can process different types of data ([[Modality (human–computer interaction)|modalities]]) such as images, videos, sound, and text.{{Sfnp|Marmouyet|2023}}
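As a drastically scaled-down illustration of generation by repeated next-token prediction (a toy bigram counter over an invented corpus, not an actual GPT or any model named above), consider the following sketch:

<syntaxhighlight lang="python">
# Illustrative sketch only: a toy bigram "next-token" predictor. Real GPT
# models use neural networks over subword tokens and vastly more data; this
# example merely shows the generate-by-predicting-the-next-token loop.

from collections import Counter, defaultdict

corpus = "the cat sat on the mat . the cat ate . the dog sat on the rug .".split()

# Count how often each token follows each other token.
successors = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    successors[current][nxt] += 1

def generate(token, length=8):
    out = [token]
    for _ in range(length):
        if token not in successors:
            break
        token = successors[token].most_common(1)[0][0]  # predict the next token
        out.append(token)
    return " ".join(out)

print(generate("the"))  # e.g. "the cat sat on the cat sat on the"
</syntaxhighlight>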
=== Hardware and software ===
{{Main|Programming languages for artificial intelligence|Hardware for artificial intelligence}}
In the late 2010s, [[graphics processing unit]]s (GPUs) that were increasingly designed with AI-specific enhancements and used with specialized software such as [[TensorFlow]] had replaced previously used [[central processing unit]]s (CPUs) as the dominant means for training large-scale (commercial and academic) [[machine learning]] models.{{Sfnp|Kobielus|2019}} Specialized [[programming language]]s such as [[Prolog]] were used in early AI research,<ref>{{Cite web |last=Thomason |first=James |date=2024-05-21 |title=Mojo Rising: The resurgence of AI-first programming languages |url=https://venturebeat.com/ai/mojo-rising-the-resurgence-of-ai-first-programming-languages |access-date=2024-05-26 |website=VentureBeat |archive-date=27 June 2024 |archive-url=https://web.archive.org/web/20240627143853/https://venturebeat.com/ai/mojo-rising-the-resurgence-of-ai-first-programming-languages/ |url-status=live }}</ref> but [[general-purpose programming language]]s like [[Python (programming language)|Python]] have become predominant.<ref>{{Cite news |last=Wodecki |first=Ben |date=May 5, 2023 |title=7 AI Programming Languages You Need to Know |url=https://aibusiness.com/verticals/7-ai-programming-languages-you-need-to-know |work=AI Business |access-date=5 October 2024 |archive-date=25 July 2024 |archive-url=https://web.archive.org/web/20240725164443/https://aibusiness.com/verticals/7-ai-programming-languages-you-need-to-know |url-status=live }}</ref>

The transistor density in [[integrated circuit]]s has been observed to roughly double every 18 months—a trend known as [[Moore's law]], named after the [[Intel]] co-founder [[Gordon Moore]], who first identified it. Improvements in [[GPUs]] have been even faster,<ref>{{Cite web |last=Plumb |first=Taryn |date=2024-09-18 |title=Why Jensen Huang and Marc Benioff see 'gigantic' opportunity for agentic AI |url=https://venturebeat.com/ai/why-jensen-huang-and-marc-benioff-see-gigantic-opportunity-for-agentic-ai/ |access-date=2024-10-04 |website=VentureBeat |language=en-US |archive-date=5 October 2024 |archive-url=https://web.archive.org/web/20241005165649/https://venturebeat.com/ai/why-jensen-huang-and-marc-benioff-see-gigantic-opportunity-for-agentic-ai/ |url-status=live }}</ref> a trend sometimes called [[Huang's law]],<ref>{{Cite news |last=Mims |first=Christopher |date=2020-09-19 |title=Huang's Law Is the New Moore's Law, and Explains Why Nvidia Wants Arm |url=https://www.wsj.com/articles/huangs-law-is-the-new-moores-law-and-explains-why-nvidia-wants-arm-11600488001 |access-date=2025-01-19 |work=Wall Street Journal |language=en-US |issn=0099-9660 |archive-date=2 October 2023 |archive-url=https://web.archive.org/web/20231002080608/https://www.wsj.com/articles/huangs-law-is-the-new-moores-law-and-explains-why-nvidia-wants-arm-11600488001 |url-status=live }}</ref> named after [[Nvidia]] co-founder and CEO [[Jensen Huang]].