
== History ==

A short history of symbolic AI to the present day follows below. Time periods and titles are drawn from Henry Kautz's 2020 AAAI Robert S. Engelmore Memorial Lecture{{sfn|Kautz|2020}} and the longer Wikipedia article on the [[History of AI]], with dates and titles differing slightly for increased clarity.

=== The first AI summer: irrational exuberance, 1948–1966 ===

Early successes in AI came in three main areas: artificial neural networks, knowledge representation, and heuristic search. These successes contributed to high expectations. This section summarizes Kautz's reprise of early AI history.

====Approaches inspired by human or animal cognition or behavior====

Cybernetic approaches attempted to replicate the feedback loops between animals and their environments. A robotic turtle, with sensors, motors for driving and steering, and seven vacuum tubes for control, based on a preprogrammed neural net, was built as early as 1948. This work can be seen as an early precursor to later work in neural networks, reinforcement learning, and situated robotics.{{sfn|Kautz|2022|p=106}}

An important early symbolic AI program was the [[Logic theorist]], written by [[Allen Newell]], [[Herbert A. Simon|Herbert Simon]] and [[Cliff Shaw]] in 1955–56, as it was able to prove 38 elementary theorems from Whitehead and Russell's [[Principia Mathematica]]. Newell, Simon, and Shaw later generalized this work to create a domain-independent problem solver, [[General Problem Solver|GPS]] (General Problem Solver). GPS solved problems represented with formal operators via state-space search using [[means-ends analysis]].{{sfn|Newell|Simon|1972}}

During the 1960s, symbolic approaches achieved great success at simulating intelligent behavior in structured environments such as game-playing, symbolic mathematics, and theorem-proving. AI research was concentrated in four institutions in the 1960s: [[Carnegie Mellon University]], [[Stanford]], [[MIT]] and (later) [[University of Edinburgh]]. Each one developed its own style of research. Earlier approaches based on [[cybernetics]] or [[artificial neural network]]s were abandoned or pushed into the background.

[[Herbert A. Simon|Herbert Simon]] and [[Allen Newell]] studied human problem-solving skills and attempted to formalize them, and their work laid the foundations of the field of artificial intelligence, as well as [[cognitive science]], [[operations research]] and [[management science]]. Their research team used the results of [[psychology|psychological]] experiments to develop programs that simulated the techniques that people used to solve problems.{{sfn|McCorduck|2004|pp=139–179, 245–250, 322–323 (EPAM)}}{{sfn|Crevier|1993|pp=145–149}} This tradition, centered at Carnegie Mellon University, would eventually culminate in the development of the [[Soar (cognitive architecture)|Soar]] architecture in the mid-1980s.{{sfn|McCorduck|2004|pp=450–451}}{{sfn|Crevier|1993|pp=258–263}}

====Heuristic search====

In addition to the highly specialized, domain-specific kinds of knowledge later used in expert systems, early symbolic AI researchers discovered another, more general application of knowledge: heuristics, rules of thumb that guide a search in promising directions. "How can non-enumerative search be practical when the underlying problem is exponentially hard? The approach advocated by Simon and Newell is to employ [[Heuristic (computer science)|heuristics]]: fast algorithms that may fail on some inputs or output suboptimal solutions."{{sfn|Kautz|2022|page=108}} Another important advance was to find a way to apply these heuristics that guarantees a solution will be found, if there is one, notwithstanding the occasional fallibility of heuristics: "The [[A* search algorithm|A* algorithm]] provided a general frame for complete and optimal heuristically guided search. A* is used as a subroutine within practically every AI algorithm today but is still no magic bullet; its guarantee of completeness is bought at the cost of worst-case exponential time."{{sfn|Kautz|2022|page=108}}
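
As a minimal sketch of heuristically guided search, the following Python function runs A* on a small grid. The grid encoding, the unit step cost, and the Manhattan-distance heuristic are assumptions chosen for illustration; they are not taken from the sources cited above.

```python
import heapq

def a_star(grid, start, goal):
    """Return a shortest path from start to goal on a 4-connected grid, or None.

    grid is a rectangular list of strings; '#' marks a blocked cell
    (an illustrative encoding, not a standard one).
    """
    def h(cell):
        # Manhattan distance never overestimates when every step costs 1.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(h(start), 0, start)]  # entries are (f = g + h, g, cell)
    best_g = {start: 0}                # cheapest known cost to reach each cell
    parent = {start: None}
    while frontier:
        _, g, cell = heapq.heappop(frontier)
        if cell == goal:
            path = []
            while cell is not None:    # walk parent links back to the start
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        if g > best_g[cell]:
            continue                   # stale queue entry; a cheaper route exists
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] != "#":
                ng = g + 1
                if ng < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = ng
                    parent[(nr, nc)] = cell
                    heapq.heappush(frontier, (ng + h((nr, nc)), ng, (nr, nc)))
    return None  # frontier exhausted: no path exists
```

Calling <code>a_star(["..#", "...", "#.."], (0, 0), (2, 2))</code> returns a list of cells from the start to the goal. The heuristic only orders the frontier; in the worst case the search may still visit every reachable cell.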

====Early work on knowledge representation and reasoning====

Early work covered both applications of formal reasoning emphasizing [[first-order logic]] and attempts to handle [[Commonsense reasoning|common-sense reasoning]] in a less formal manner.

===== Modeling formal reasoning with logic: the "neats" =====
{{Main|logic programming}}
Unlike Simon and Newell, [[John McCarthy (computer scientist)|John McCarthy]] felt that machines did not need to simulate the exact mechanisms of human thought, but could instead try to find the essence of abstract reasoning and problem-solving with logic,{{sfn|Russell|Norvig|2021|loc=p. 9 (logicist AI), p. 19 (McCarthy's work)}} regardless of whether people used the same algorithms.{{efn|
McCarthy once said: "This is AI, so we don't care if it's psychologically real".{{sfn|Kolata|1982}} McCarthy reiterated his position in 2006 at the [[AI@50]] conference where he said "Artificial intelligence is not, by definition, simulation of human intelligence".{{sfn|Maker|2006}} [[Pamela McCorduck]] writes that there are "two major branches of artificial intelligence: one aimed at producing intelligent behavior regardless of how it was accomplished, and the other aimed at modeling intelligent processes found in nature, particularly human ones."{{sfn|McCorduck|2004|pp=100–101}}
[[Stuart J. Russell|Stuart Russell]] and [[Peter Norvig]] wrote "Aeronautical engineering texts do not define the goal of their field as making 'machines that fly so exactly like pigeons that they can fool even other pigeons.'"{{sfn|Russell|Norvig|2021|p=2}}}}
His laboratory at [[Stanford University|Stanford]] ([[Stanford Artificial Intelligence Laboratory|SAIL]]) focused on using formal [[logic]] to solve a wide variety of problems, including [[knowledge representation]], planning and [[machine learning|learning]].{{sfn|McCorduck|2004|pp=251–259}}
Logic was also the focus of the work at the [[University of Edinburgh]] and elsewhere in Europe which led to the development of the programming language [[Prolog]] and the science of logic programming.{{sfn|Crevier|1993|pp=193–196}}{{sfn|Howe|1994}}

===== Modeling implicit common-sense knowledge with frames and scripts: the "scruffies" =====
{{Main|neats vs. scruffies}}

Researchers at [[MIT]] (such as [[Marvin Minsky]] and [[Seymour Papert]]){{sfn|McCorduck|2004|pp=259–305}}{{sfn|Crevier|1993|pp=83–102, 163–176}}{{sfn|Russell|Norvig|2021|p=19}} found that solving difficult problems in [[computer vision|vision]] and [[natural language processing]] required ad hoc solutions—they argued that no simple and general principle (like [[logic]]) would capture all the aspects of intelligent behavior. [[Roger Schank]] described their "anti-logic" approaches as "[[Neats vs. scruffies|scruffy]]" (as opposed to the "[[neats vs. scruffies|neat]]" paradigms at [[Carnegie Mellon University|CMU]] and Stanford).{{sfn|McCorduck|2004|pp=421–424, 486–489}}{{sfn|Crevier|1993|p=168}}
[[Commonsense knowledge bases]] (such as [[Doug Lenat]]'s [[Cyc]]) are an example of "scruffy" AI, since they must be built by hand, one complicated concept at a time.{{sfn|McCorduck|2004|p=489}}{{sfn|Crevier|1993|pp=239–243}}{{sfn|Russell|Norvig|2021|pp=316, 340}}

=== The first AI winter: crushed dreams, 1967–1977 ===

The first AI winter was a shock:

{{Blockquote
|text=During the first AI summer, many people thought that machine intelligence could be achieved in just a few years. The Defense Advance Research Projects Agency (DARPA) launched programs to support AI research to use AI to solve problems of national security; in particular, to automate the translation of Russian to English for intelligence operations and to create autonomous tanks for the battlefield. Researchers had begun to realize that achieving AI was going to be much harder than was supposed a decade earlier, but a combination of hubris and disingenuousness led many university and think-tank researchers to accept funding with promises of deliverables that they should have known they could not fulfill. By the mid-1960s neither useful natural language translation systems nor autonomous tanks had been created, and a dramatic backlash set in. New DARPA leadership canceled existing AI funding programs.

...

Outside of the United States, the most fertile ground for AI research was the United Kingdom. The AI winter in the United Kingdom was spurred on not so much by disappointed military leaders as by rival academics who viewed AI researchers as charlatans and a drain on research funding. A professor of applied mathematics, [[Lighthill report|Sir James Lighthill, was commissioned by Parliament to evaluate the state of AI research in the nation]]. The report stated that all of the problems being worked on in AI would be better handled by researchers from other disciplines—such as applied mathematics. The report also claimed that AI successes on toy problems could never scale to real-world applications due to combinatorial explosion.{{sfn|Kautz|2022|p=109}}
}}

=== The second AI summer: knowledge is power, 1978–1987 ===

==== Knowledge-based systems ====

As limitations with weak, domain-independent methods became more and more apparent,{{sfn|Russell|Norvig|2021|page=22}} researchers from all three traditions began to build [[knowledge representation|knowledge]] into AI applications.{{sfn|McCorduck|2004|pp=266–276, 298–300, 314, 421}}{{sfn|Russell|Norvig|2021|pp=22–23}} The knowledge revolution was driven by the realization that knowledge underlies high-performance, domain-specific AI applications.

[[Edward Feigenbaum]] summarized this as "In the knowledge lies the power",<ref name="Feigenbaum">{{Cite journal| doi = 10.1145/1743546.1743564| issn = 0001-0782| volume = 53| issue = 6| pages = 41–45| last = Shustek| first = Len| title = An interview with Ed Feigenbaum| journal = Communications of the ACM| accessdate = 2022-07-14| date = June 2010| s2cid = 10239007| url = https://dl.acm.org/doi/10.1145/1743546.1743564| url-access = subscription}}</ref> meaning that high performance in a specific domain requires both general and highly domain-specific knowledge. Feigenbaum and Doug Lenat called this the Knowledge Principle:
{{Blockquote
|text=(1) The Knowledge Principle: if a program is to perform a complex task well, it must know a great deal about the world in which it operates.<br/>(2) A plausible extension of that principle, called the Breadth Hypothesis: there are two additional abilities necessary for intelligent behavior in unexpected situations: falling back on increasingly general knowledge, and analogizing to specific but far-flung knowledge.<ref name="Knowledge Principle">{{Cite journal| last1=Lenat| first1=Douglas B| last2=Feigenbaum| first2=Edward A| title=On the thresholds of knowledge| journal=Proceedings of the International Workshop on Artificial Intelligence for Industrial Applications| date=1988| pages=291–300| doi=10.1109/AIIA.1988.13308| s2cid=11778085}}</ref>}}

==== Success with expert systems ====
{{Main|Expert systems}}

This "knowledge revolution" led to the development and deployment of [[expert system]]s (introduced by [[Edward Feigenbaum]]), the first commercially successful form of AI software.{{sfn|Russell|Norvig|2021|pp=22–24}}{{sfn|McCorduck|2004|pp=327–335, 434–435}}{{sfn|Crevier|1993|pp=145–62, 197–203}}

Key expert systems were:

* [[DENDRAL]], which found the structure of organic molecules from their chemical formula and mass spectrometer readings.
* [[MYCIN]], which diagnosed bacteremia – and suggested further lab tests, when necessary – by interpreting lab results, patient history, and doctor observations. "With about 450 rules, MYCIN was able to perform as well as some experts, and considerably better than junior doctors."{{sfn|Russell|Norvig|2021|p=23}}
* [[Internist-I|INTERNIST]] and [[CADUCEUS (expert system)|CADUCEUS]], which tackled internal medicine diagnosis. INTERNIST attempted to capture the expertise of the chairman of internal medicine at the [[University of Pittsburgh School of Medicine]], while CADUCEUS could eventually diagnose up to 1000 different diseases.
* GUIDON, which showed how a knowledge base built for expert problem solving could be repurposed for teaching.{{sfn|Clancey|1987}}
* [[XCON]], which configured VAX computers, a then-laborious process that could take up to 90 days. XCON reduced the time to about 90 minutes.{{sfn|Kautz|2022|p=110}}

[[DENDRAL]] is considered the first expert system that relied on knowledge-intensive problem-solving. [[Ed Feigenbaum]] described it in a [[Communications of the ACM]] interview, [https://cacm.acm.org/magazines/2010/6/92472-an-interview-with-ed-feigenbaum/fulltext An Interview with Ed Feigenbaum]:

{{Blockquote
|text=One of the people at Stanford interested in computer-based models of mind was [[Joshua Lederberg]], the 1958 Nobel Prize winner in genetics. When I told him I wanted an induction "sandbox", he said, "I have just the one for you." His lab was doing mass spectrometry of amino acids. The question was: how do you go from looking at the spectrum of an amino acid to the chemical structure of the amino acid? That's how we started the DENDRAL Project: I was good at heuristic search methods, and he had an algorithm that was good at generating the chemical problem space.

We did not have a grandiose vision. We worked bottom up. Our chemist was [[Carl Djerassi]], inventor of the chemical behind the birth control pill, and also one of the world's most respected mass spectrometrists. Carl and his postdocs were world-class experts in mass spectrometry. We began to add to their knowledge, inventing knowledge of engineering as we went along. These experiments amounted to titrating DENDRAL more and more knowledge. The more you did that, the smarter the program became. We had very good results.

The generalization was: in the knowledge lies the power. That was the big idea. In my career that is the huge, "Ah ha!," and it wasn't the way AI was being done previously. Sounds simple, but it's probably AI's most powerful generalization.<ref name="Feignebaum Interview">{{Cite journal| doi = 10.1145/1743546.1743564| issn = 0001-0782| volume = 53| issue = 6| pages = 41–45| last = Shustek| first = Len| title = An interview with Ed Feigenbaum| journal = Communications of the ACM| accessdate = 2022-08-05| date = 2010| s2cid = 10239007| url = https://cacm.acm.org/magazines/2010/6/92472-an-interview-with-ed-feigenbaum/fulltext| url-access = subscription}}</ref>}}

The other expert systems mentioned above came after DENDRAL. MYCIN exemplifies the classic expert system architecture of a knowledge base of rules coupled to a symbolic reasoning mechanism, including the use of certainty factors to handle uncertainty. GUIDON shows how an explicit knowledge base can be repurposed for a second application, tutoring, and is an example of an [[intelligent tutoring system]], a particular kind of knowledge-based application. Clancey showed that it was not sufficient simply to use [[MYCIN]]'s rules for instruction, but that he also needed to add rules for dialogue management and student modeling.{{sfn|Clancey|1987}} XCON is significant because of the millions of dollars it saved [[Digital Equipment Corporation|DEC]], which triggered the expert system boom, in which nearly all major corporations in the US had expert systems groups to capture corporate expertise, preserve it, and automate it:

{{Blockquote
|text=By 1988, DEC's AI group had 40 expert systems deployed, with more on the way. DuPont had 100 in use and 500 in development. Nearly every major U.S. corporation had its own AI group and was either using or investigating expert systems.{{sfn|Russell|Norvig|2021|p=23}}
}}

Expert chess knowledge was encoded in [[IBM]]'s [[Deep Blue (chess computer)|Deep Blue]]. In 1996, this allowed Deep Blue, with the help of symbolic AI, to win a game of chess against the world champion at that time, [[Garry Kasparov]].<ref>{{Cite web|title=The fascination with AI: what is artificial intelligence?|url=https://www.ionos.com/digitalguide/online-marketing/online-sales/what-is-artificial-intelligence/|access-date=2021-12-02|website=IONOS Digitalguide|language=en}}</ref>

===== Architecture of knowledge-based and expert systems =====

A key component of the system architecture for all expert systems is the knowledge base, which stores facts and rules for problem-solving.{{sfn|Hayes-Roth|Murray|Adelman|2015}}
The simplest form of expert system knowledge base is a collection or network of [[Production system (computer science)|production rules]]. Production rules connect symbols in a relationship similar to an If-Then statement. The expert system processes the rules to make deductions and to determine what additional information it needs, i.e. what questions to ask, using human-readable symbols. For example, [[OPS5]], [[CLIPS]] and their successors [[Jess (programming language)|Jess]] and [[Drools]] operate in this fashion.

Expert systems can operate in either a [[forward chaining]] – from evidence to conclusions – or [[backward chaining]] – from goals to needed data and prerequisites – manner. More advanced knowledge-based systems, such as [[Soar (cognitive architecture)|Soar]], can also perform meta-level reasoning, that is, reasoning about their own reasoning in terms of deciding how to solve problems and monitoring the success of problem-solving strategies.
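
The two inference directions can be sketched in a few lines of Python. The rule base below is hypothetical and invented for illustration (it is not taken from any of the systems named above), and the backward chainer assumes the rules contain no cycles.

```python
# Hypothetical rule base: each rule is (set of premises, conclusion).
RULES = [
    ({"has_feathers"}, "bird"),
    ({"bird", "cannot_fly", "swims"}, "penguin"),
]

def forward_chain(facts, rules=RULES):
    """Data-driven: fire rules until no new conclusion can be added."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known

def backward_chain(goal, facts, rules=RULES):
    """Goal-driven: a goal holds if it is a given fact, or if every premise
    of some rule that concludes it can itself be established."""
    if goal in facts:
        return True
    return any(
        all(backward_chain(p, facts, rules) for p in premises)
        for premises, conclusion in rules
        if conclusion == goal
    )
```

Forward chaining starts from the given facts and derives everything it can, whereas backward chaining starts from a single goal and examines only the rules relevant to it. A premise that is neither a fact nor derivable is the point at which an interactive system would ask the user a question.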

[[Blackboard system]]s are a second kind of [[knowledge-based system|knowledge-based]] or [[expert system]] architecture. They model a community of experts incrementally contributing, where they can, to solve a problem. The problem is represented in multiple levels of abstraction or alternate views. The experts (knowledge sources) volunteer their services whenever they recognize they can contribute. Potential problem-solving actions are represented on an agenda that is updated as the problem situation changes. A controller decides how useful each contribution is, and who should make the next problem-solving action. One example, the BB1 blackboard architecture,<ref name="BB1">{{Cite journal| doi = 10.1016/0004-3702(85)90063-3| volume = 26| issue = 3| pages = 251–321| last = Hayes-Roth| first = Barbara| title = A blackboard architecture for control| journal = Artificial Intelligence| date = 1985}}</ref> was originally inspired by studies of how humans plan to perform multiple tasks in a trip.<ref name="OPM">{{Cite conference| publisher = RAND| last = Hayes-Roth| first = Barbara| title = Human Planning Processes| date = 1980}}</ref> An innovation of BB1 was to apply the same blackboard model to solving its control problem, i.e., its controller performed meta-level reasoning with knowledge sources that monitored how well a plan or the problem-solving was proceeding and could switch from one strategy to another as conditions – such as goals or times – changed. BB1 has been applied in multiple domains: construction site planning, intelligent tutoring systems, and real-time patient monitoring.

=== The second AI winter, 1988–1993 ===

At the height of the AI boom, companies such as [[Symbolics]], [[Lisp Machines|LMI]], and [[Texas Instruments]] were selling [[LISP machine]]s specifically targeted to accelerate the development of AI applications and research. In addition, several artificial intelligence companies, such as Teknowledge and [[Inference Corporation]], were selling expert system shells, training, and consulting to corporations.

The AI boom did not last. Kautz describes the second AI winter that followed:
{{Blockquote
|text=Many reasons can be offered for the arrival of the second AI winter. The hardware companies failed when much more cost-effective general Unix workstations from Sun together with good compilers for LISP and Prolog came onto the market. Many commercial deployments of expert systems were discontinued when they proved too costly to maintain. Medical expert systems never caught on for several reasons: the difficulty in keeping them up to date; the challenge for medical professionals to learn how to use a bewildering variety of different expert systems for different medical conditions; and perhaps most crucially, the reluctance of doctors to trust a computer-made diagnosis over their gut instinct, even for specific domains where the expert systems could outperform an average doctor. Venture capital money deserted AI practically overnight. The world AI conference IJCAI hosted an enormous and lavish trade show and thousands of nonacademic attendees in 1987 in Vancouver; the main AI conference the following year, AAAI 1988 in St. Paul, was a small and strictly academic affair.
{{sfn|Kautz|2022|page=110}}
}}

=== Adding in more rigorous foundations, 1993–2011 ===

==== Uncertain reasoning ====

Both statistical approaches and extensions to logic were tried.

One statistical approach, [[hidden Markov model]]s, had already been popularized in the 1980s for speech recognition work.{{sfn|Russell|Norvig|2021|p=25}} Subsequently, in 1988, [[Judea Pearl]] popularized the use of [[Bayesian Networks]] as a sound but efficient way of handling uncertain reasoning with his book ''Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference'',{{sfn|Pearl|1988}} and Bayesian approaches were applied successfully in expert systems.{{sfn|Spiegelhalter |Dawid|Lauritzen|Cowell|1993}} Even later, in the 1990s, statistical relational learning allowed probability to be combined with first-order logic, e.g., with either [[Markov logic network|Markov Logic Networks]] or [[Probabilistic Soft Logic]].

Other, non-probabilistic extensions to first-order logic were also tried. For example, [[non-monotonic reasoning]] could be used with [[Reason maintenance|truth maintenance systems]]. A [[truth maintenance system]] tracked assumptions and justifications for all inferences. It allowed inferences to be withdrawn when assumptions were found to be incorrect or a contradiction was derived. Explanations could be provided for an inference by [[Explainable artificial intelligence|explaining which rules were applied]] to create it and then continuing through underlying inferences and rules all the way back to root assumptions.{{sfn|Russell|Norvig|2021|pp=335–337}} [[Lotfi Zadeh]] had introduced a different kind of extension to handle the representation of vagueness. For example, in deciding how "heavy" or "tall" a man is, there is frequently no clear "yes" or "no" answer, and a predicate for heavy or tall would instead return values between 0 and 1. Those values represented to what degree the predicates were true. His [[fuzzy logic]] further provided a means for propagating combinations of these values through logical formulas.{{sfn|Russell|Norvig|2021|p=459}}
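
The graded predicates and their combination can be illustrated with a short Python sketch. The linear membership ramps and their thresholds (160–190 cm, 60–100 kg) are assumptions invented for this example; the connectives use the common min, max, and complement definitions.

```python
def tall(height_cm):
    """Degree in [0, 1] to which a height counts as 'tall'.
    Linear ramp from 160 cm to 190 cm (thresholds assumed for illustration)."""
    return min(1.0, max(0.0, (height_cm - 160) / 30))

def heavy(weight_kg):
    """Degree in [0, 1] to which a weight counts as 'heavy'.
    Linear ramp from 60 kg to 100 kg (thresholds assumed for illustration)."""
    return min(1.0, max(0.0, (weight_kg - 60) / 40))

def fuzzy_and(a, b):
    return min(a, b)   # conjunction: the weaker of the two degrees

def fuzzy_or(a, b):
    return max(a, b)   # disjunction: the stronger of the two degrees

def fuzzy_not(a):
    return 1.0 - a     # negation: complement of the degree
```

With these definitions, <code>fuzzy_and(tall(175), heavy(100))</code> evaluates to 0.5: the man is fully "heavy" but only half "tall", so the conjunction holds to degree 0.5.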

==== Machine learning ====

Symbolic machine learning approaches were investigated to address the [[knowledge acquisition]] bottleneck. One of the earliest is [[Dendral#Meta-Dendral|Meta-DENDRAL]]. Meta-DENDRAL used a generate-and-test technique to generate plausible rule hypotheses to test against spectra. Domain and task knowledge reduced the number of candidates tested to a manageable size. [[Ed Feigenbaum|Feigenbaum]] described Meta-DENDRAL as

{{Blockquote
|text=...the culmination of my dream of the early to mid-1960s having to do with theory formation. The conception was that you had a problem solver like DENDRAL that took some inputs and produced an output. In doing so, it used layers of knowledge to steer and prune the search. That knowledge got in there because we interviewed people. But how did the people get the knowledge? By looking at thousands of spectra. So we wanted a program that would look at thousands of spectra and infer the knowledge of mass spectrometry that DENDRAL could use to solve individual hypothesis formation problems.

We did it. We were even able to publish new knowledge of mass spectrometry in the ''[[Journal of the American Chemical Society]]'', giving credit only in a footnote that a program, Meta-DENDRAL, actually did it. We were able to do something that had been a dream: to have a computer program come up with a new and publishable piece of science.<ref name="Feignebaum Interview"/>}}

In contrast to the knowledge-intensive approach of Meta-DENDRAL, [[Ross Quinlan]] invented a domain-independent approach to statistical classification, [[decision tree learning]], starting with [[ID3 algorithm|ID3]]<ref>{{harvc|in1=Michalski|in2=Carbonell|in3=Mitchell|year=1983|c=Chapter 15: Learning Efficient Classification Procedures and their Application to Chess End Games |first=J. Ross |last=Quinlan}}</ref> and then later extending its capabilities to [[C4.5]].<ref>{{Cite book| edition = 1st | publisher = Morgan Kaufmann| isbn = 978-1-55860-238-0| last = Quinlan| first = J. Ross| title = C4.5: Programs for Machine Learning| location = San Mateo, Calif| date = 1992-10-15}}</ref> The decision trees created are [[glass box]], interpretable classifiers, with human-interpretable classification rules.
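
ID3 grows a tree by repeatedly choosing the attribute whose split yields the largest information gain. The following Python sketch shows only that selection step, not full tree construction; the function names and the dictionary-based row format are assumptions made for illustration.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Reduction in entropy from splitting rows (a list of dicts) on attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    # Weighted average entropy of the subsets produced by the split.
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def best_attribute(rows, labels, attributes):
    """Attribute that would be chosen for the next split."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))
```

Because each split tests a single named attribute, the resulting tree can be read directly as a set of If-Then classification rules.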

Advances were made in understanding machine learning theory, too. [[Tom M. Mitchell|Tom Mitchell]] introduced [[version space learning]] which describes learning as a search through a space of hypotheses, with upper, more general, and lower, more specific, boundaries encompassing all viable hypotheses consistent with the examples seen so far.<ref>{{harvc|in1=Michalski|in2=Carbonell|in3=Mitchell|year=1983 |c=Chapter 6: Learning by Experimentation: Acquiring and Refining Problem-Solving Heuristics |first1=Tom M. |last1=Mitchell |first2=Paul E. |last2=Utgoff |first3=Ranan |last3=Banerji}}</ref> More formally, [[Leslie Valiant|Valiant]] introduced [[Probably approximately correct learning|Probably Approximately Correct Learning]] (PAC Learning), a framework for the mathematical analysis of machine learning.<ref>{{Cite journal| doi = 10.1145/1968.1972| issn = 0001-0782| volume = 27| issue = 11| pages = 1134–1142| last = Valiant| first = L. G.| title = A theory of the learnable| journal = Communications of the ACM| date = 1984-11-05| s2cid = 12837541| doi-access = free}}</ref>

Symbolic machine learning encompassed more than learning by example. For instance, [[John Robert Anderson (psychologist)|John Anderson]] provided a [[cognitive model]] of human learning where skill practice results in a compilation of rules from a declarative format to a procedural format with his [[ACT-R]] [[cognitive architecture]]. A student might learn to apply "Supplementary angles are two angles whose measures sum to 180 degrees" as several different procedural rules. One rule might say that if X and Y are supplementary and you know X, then Y will be 180 - X. He called his approach "knowledge compilation". [[ACT-R]] has been used successfully to model aspects of human cognition, such as learning and retention. ACT-R is also used in [[intelligent tutoring systems]], called [[cognitive tutors]], to successfully teach geometry, computer programming, and algebra to school children.<ref name="pump">{{Cite journal| volume = 8| pages = 30–43| last1 = Koedinger| first1 = K. R.| last2 = Anderson| first2 = J. R.| last3 = Hadley| first3 = W. H.| last4 = Mark| first4 = M. A.| last5 = others| title = Intelligent tutoring goes to school in the big city| journal = International Journal of Artificial Intelligence in Education | accessdate = 2012-08-18| date = 1997| url = http://telearn.archives-ouvertes.fr/hal-00197383/}}</ref>

Inductive logic programming was another approach to learning that allowed logic programs to be synthesized from input-output examples. E.g., [[Ehud Shapiro]]'s MIS (Model Inference System) could synthesize Prolog programs from examples.<ref>{{Cite conference| conference = IJCAI| volume = 2| pages = 1064| last = Shapiro| first = Ehud Y| title = The Model Inference System| book-title = Proceedings of the 7th international joint conference on Artificial intelligence| date = 1981}}</ref> [[John R. Koza]] applied [[genetic algorithms]] to [[program synthesis]] to create [[genetic programming]], which he used to synthesize LISP programs. Finally, [[Zohar Manna]] and [[Richard Waldinger]] provided a more general approach to [[program synthesis]] that synthesizes a [[functional programming|functional program]] in the course of proving its specifications to be correct.<ref>{{Cite journal| doi = 10.1145/357084.357090| volume = 2| pages = 90–121| last1 = Manna| first1 = Zohar| last2 = Waldinger| first2 = Richard| title = A Deductive Approach to Program Synthesis| journal = ACM Trans. Program. Lang. Syst.| date = 1980-01-01| issue = 1| s2cid = 14770735}}</ref>

As an alternative to logic, [[Roger Schank]] introduced case-based reasoning (CBR). The CBR approach outlined in his book, ''Dynamic Memory'',<ref name="Schank">{{Cite book| publisher = Cambridge University Press| isbn = 978-0-521-27029-8| last = Schank| first = Roger C.| title = Dynamic Memory: A Theory of Reminding and Learning in Computers and People| location = Cambridge Cambridgeshire : New York| date = 1983-01-28}}</ref> focuses first on remembering key problem-solving cases for future use and generalizing them where appropriate. When faced with a new problem, CBR retrieves the most similar previous case and adapts it to the specifics of the current problem.<ref>{{Cite book| publisher = Academic Press| isbn = 978-0-12-322060-8| last = Hammond| first = Kristian J.| title = Case-Based Planning: Viewing Planning as a Memory Task| location = Boston| date = 1989-04-11}}</ref> Another alternative to logic, [[genetic algorithms]] and [[genetic programming]], is based on an evolutionary model of learning, where sets of rules are encoded into populations, the rules govern the behavior of individuals, and selection of the fittest prunes out sets of unsuitable rules over many generations.<ref>{{Cite book| edition = 1st | publisher = A Bradford Book| isbn = 978-0-262-11170-6| last = Koza| first = John R.| title = Genetic Programming: On the Programming of Computers by Means of Natural Selection| location = Cambridge, Mass| date = 1992-12-11}}</ref>

Symbolic machine learning was applied to learning concepts, rules, heuristics, and problem-solving. Approaches, other than those above, include:
# Learning from instruction or advice—i.e., taking human instruction, posed as advice, and determining how to operationalize it in specific situations. For example, in a game of Hearts, learning ''exactly how'' to play a hand to "avoid taking points."<ref>{{harvc|in1=Michalski|in2=Carbonell|in3=Mitchell|year=1983|c=Chapter 12: Machine Transformation of Advice into a Heuristic Search Procedure |first=David Jack |last=Mostow}}</ref>
# Learning from exemplars—improving performance by accepting subject-matter expert (SME) feedback during training. When problem-solving fails, querying the expert to either learn a new exemplar for problem-solving or to learn a new explanation as to exactly why one exemplar is more relevant than another. For example, the program Protos learned to diagnose tinnitus cases by interacting with an audiologist.<ref>{{harvc |in1=Michalski |in2=Carbonell |in3=Mitchell |year=1986 |pp=112-139|c=Chapter 4: Protos: An Exemplar-Based Learning Apprentice |first=Ray |last=Bareiss|first2=Bruce|last2=Porter|first3=Craig|last3=Wier}}</ref>
# Learning by analogy—constructing problem solutions based on similar problems seen in the past, and then modifying their solutions to fit a new situation or domain.<ref>{{harvc |in1=Michalski |in2=Carbonell |in3=Mitchell |year=1983 |pp=137-162|c=Chapter 5: Learning by Analogy: Formulating and Generalizing Plans from Past Experience |first=Jaime |last=Carbonell}}</ref><ref>{{harvc |in1=Michalski |in2=Carbonell |in3=Mitchell |year=1986 |pp=371-392|c=Chapter 14: Derivational Analogy: A Theory of Reconstructive Problem Solving and Expertise Acquisition |first=Jaime |last=Carbonell}}</ref>
# Apprentice learning systems—learning novel solutions to problems by observing human problem-solving. Domain knowledge explains why novel solutions are correct and how they can be generalized. For example, LEAP learned how to design VLSI circuits by observing human designers.<ref>{{harvc|in1=Kodratoff|in2=Michalski|year=1990|pp=271-289|c=Chapter 10: LEAP: A Learning Apprentice for VLSI Design |first=Tom |last=Mitchell|first2=Sridhar |last2=Mahadevan|first3=Louis|last3=Steinberg}}</ref>
# Learning by discovery—i.e., creating tasks to carry out experiments and then learning from the results. [[Douglas Lenat|Doug Lenat]]'s [[Eurisko]], for example, learned heuristics to beat human players at the [[Traveller (role-playing game)|Traveller]] role-playing game for two years in a row.<ref>{{harvc|in1=Michalski|in2=Carbonell|in3=Mitchell|year=1983|pp=243-306|c=Chapter 9: The Role of Heuristics in Learning by Discovery: Three Case Studies|first=Douglas |last=Lenat}}</ref>
# Learning macro-operators—i.e., searching for useful macro-operators to be learned from sequences of basic problem-solving actions. Good macro-operators simplify problem-solving by allowing problems to be solved at a more abstract level.<ref>{{Cite book| publisher = Pitman Publishing| isbn = 0-273-08690-1| last = Korf| first = Richard E.| title = Learning to Solve Problems by Searching for Macro-Operators| series = Research Notes in Artificial Intelligence| date = 1985}}</ref>
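
The idea behind macro-operators in the last item can be sketched in Python as follows. This is a deliberately simplified illustration and not Korf's algorithm: it assumes that solution traces are already available, proposes the most frequent contiguous pair of operators as a macro, and registers it as a new operator. The toy integer state and the operator names are hypothetical.

```python
from collections import Counter

# Basic operators on a toy state (a single integer); names are hypothetical.
OPERATORS = {
    "inc": lambda s: s + 1,
    "double": lambda s: s * 2,
    "dec": lambda s: s - 1,
}


def apply_sequence(state, names, operators=OPERATORS):
    """Apply the named operators to the state, in order."""
    for name in names:
        state = operators[name](state)
    return state


def propose_macro(traces, length=2):
    """Return the most frequent contiguous operator subsequence of a given length."""
    counts = Counter(
        tuple(trace[i:i + length])
        for trace in traces
        for i in range(len(trace) - length + 1)
    )
    return counts.most_common(1)[0][0] if counts else None


def make_macro(names, operators=OPERATORS):
    """Bundle a sequence of operators into a single callable operator."""
    return lambda s: apply_sequence(s, names, operators)


# Operator sequences from earlier solved problems (illustrative data only).
traces = [
    ["inc", "double", "dec"],
    ["inc", "double", "inc", "double"],
    ["dec", "inc", "double"],
]
macro = propose_macro(traces)                   # the pair ("inc", "double")
OPERATORS["+".join(macro)] = make_macro(macro)  # registered as "inc+double"
print(apply_sequence(3, ["inc+double", "dec"]))
```

Once registered, the macro can be used like any basic operator, so a later search can reach the same states in fewer steps.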

=== Deep learning and neuro-symbolic AI, 2011–now ===

With the rise of deep learning, symbolic AI has come to be described as complementary to it, "... with parallels having been drawn many times by AI researchers between [[Daniel Kahneman|Kahneman's]] research on human reasoning and decision making – reflected in his book ''[[Thinking, Fast and Slow]]'' – and the so-called 'AI systems 1 and 2', which would in principle be modelled by deep learning and symbolic reasoning, respectively." In this view, symbolic reasoning is better suited to deliberative reasoning, planning, and explanation, while deep learning is better suited to fast pattern recognition in perceptual applications with noisy data.<ref name="Rossi"/><ref name="Selman"/>

==== Neuro-symbolic AI: integrating neural and symbolic approaches ====

Neuro-symbolic AI attempts to integrate neural and symbolic architectures so that the strengths of each compensate for the weaknesses of the other, in order to support robust AI capable of reasoning, learning, and cognitive modeling. As argued by [[Leslie Valiant|Valiant]]{{sfn|Valiant|2008}} and many others,{{sfn|Garcez|Besold|De Raedt|Földiák|2015}} the effective construction of rich computational [[cognitive model]]s demands the combination of sound symbolic reasoning and efficient (machine) learning models. [[Gary Marcus]] similarly argues that "We cannot construct rich cognitive models in an adequate, automated way without the triumvirate of hybrid architecture, rich prior knowledge, and sophisticated techniques for reasoning",{{sfn|Marcus|2020|p=44}} and in particular:
"To build a robust, knowledge-driven approach to AI we must have the machinery of symbol-manipulation in our toolkit. Too much of useful knowledge is abstract to make do without tools that represent and manipulate abstraction, and to date, the only machinery that we know of that can manipulate such abstract knowledge reliably is the apparatus of symbol manipulation."{{sfn|Marcus|2020|p=17}}

[[Henry Kautz]],{{sfn|Kautz|2020}} [[Francesca Rossi]],{{sfn|Rossi|2022}} and [[Bart Selman]]{{sfn|Selman|2022}} have also argued for a synthesis. Their arguments are based on a need to address the two kinds of thinking discussed in [[Daniel Kahneman]]'s book, ''[[Thinking, Fast and Slow]]''. Kahneman describes human thinking as having two components, [[Thinking, Fast and Slow#Two systems|System 1 and System 2]]. System 1 is fast, automatic, intuitive, and unconscious. System 2 is slower, step-by-step, and explicit. System 1 is the kind used for pattern recognition, while System 2 is far better suited for planning, deduction, and deliberative thinking. In this view, deep learning best models the first kind of thinking and symbolic reasoning the second, and both are needed.

[[Artur Garcez|Garcez]] and Lamb describe research in this area as being ongoing for at least the past twenty years,{{sfn|Garcez|Lamb|2020|p=2}} dating from their 2002 book on neurosymbolic learning systems.{{sfn|Garcez|Broda|Gabbay|Gabbay|2002}} A series of workshops on neuro-symbolic reasoning has been held every year since 2005; see http://www.neural-symbolic.org/ for details.

In their 2015 paper "Neural-Symbolic Learning and Reasoning: Contributions and Challenges", Garcez et al. argue that:
{{Blockquote
|text=The integration of the symbolic and connectionist paradigms of AI has been pursued by a relatively small research community over the last two decades and has yielded several significant results. Over the last decade, neural symbolic systems have been shown capable of overcoming the so-called propositional fixation of neural networks, as McCarthy (1988) put it in response to Smolensky (1988); see also (Hinton, 1990). Neural networks were shown capable of representing modal and temporal logics (d'Avila Garcez and Lamb, 2006) and fragments of first-order logic (Bader, Hitzler, Hölldobler, 2008; d'Avila Garcez, Lamb, Gabbay, 2009). Further, neural-symbolic systems have been applied to a number of problems in the areas of bioinformatics, control engineering, software verification and adaptation, visual intelligence, ontology learning, and computer games.{{sfn|Garcez|Besold|De Raedt|Földiák|2015}}
}}

Approaches for integration are varied. [[Henry Kautz]]'s taxonomy of neuro-symbolic architectures, along with some examples, follows:
* Symbolic Neural symbolic—is the current approach of many neural models in natural language processing, where words or subword tokens are both the ultimate input and output of large language models. Examples include [[BERT (language model)|BERT]], RoBERTa, and [[GPT-3]]. 
* Symbolic[Neural]—is exemplified by [[AlphaGo]], where symbolic techniques are used to call neural techniques. In this case the symbolic approach is [[Monte Carlo tree search]] and the neural techniques learn how to evaluate game positions.
* Neural|Symbolic—uses a neural architecture to interpret perceptual data as symbols and relationships that are then reasoned about symbolically.
* Neural:Symbolic → Neural—relies on symbolic reasoning to generate or label training data that is subsequently learned by a deep learning model, e.g., to train a neural model for symbolic computation by using a [[Macsyma]]-like symbolic mathematics system to create or label examples.
* Neural_{Symbolic}—uses a neural net that is generated from symbolic rules. An example is the Neural Theorem Prover,<ref>{{Cite conference| publisher = Association for Computational Linguistics| doi = 10.18653/v1/W16-1309| pages = 45–50| last1 = Rocktäschel| first1 = Tim| last2 = Riedel| first2 = Sebastian| title = Learning Knowledge Base Inference with Neural Theorem Provers| book-title = Proceedings of the 5th Workshop on Automated Knowledge Base Construction| location = San Diego, CA| accessdate = 2022-08-06| date = 2016| url = https://aclanthology.org/W16-1309| doi-access = free}}</ref> which constructs a neural network from an [[And–or tree|AND–OR]] proof tree generated from knowledge base rules and terms. Logic Tensor Networks<ref>{{Citation| arxiv = 1606.04422| last1 = Serafini| first1 = Luciano| last2 = Garcez| first2 = Artur d'Avila| title = Logic Tensor Networks: Deep Learning and Logical Reasoning from Data and Knowledge| date = 2016}}</ref> also fall into this category.
* Neural[Symbolic]—allows a neural model to directly call a symbolic reasoning engine, e.g., to perform an action or evaluate a state.
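
The Symbolic[Neural] pattern in the list above, in which a symbolic search procedure calls a learned evaluator, can be sketched in Python as follows. This is a toy illustration and not AlphaGo's method: depth-limited negamax search stands in for Monte Carlo tree search, a simple subtraction game stands in for Go, and <code>stub_value_net</code> is a hand-written placeholder for what would be a trained neural network. All names are hypothetical.

```python
MOVES = (1, 2, 3)  # a player removes 1 to 3 stones; taking the last stone wins


def legal_moves(pile):
    """Moves available from a pile of the given size."""
    return [m for m in MOVES if m <= pile]


def stub_value_net(pile):
    """Placeholder for a trained value network (an assumption).

    Returns an estimated value in [-1, 1] for the player about to move.
    """
    return -1.0 if pile % 4 == 0 else 1.0


def negamax(pile, depth, evaluate):
    """Symbolic component: depth-limited game-tree search.

    At the depth limit, the search calls the (stand-in) neural evaluator.
    """
    if pile == 0:
        return -1.0  # the opponent took the last stone, so the mover has lost
    if depth == 0:
        return evaluate(pile)
    return max(-negamax(pile - m, depth - 1, evaluate) for m in legal_moves(pile))


def best_move(pile, depth, evaluate):
    """Choose the move whose resulting position is worst for the opponent.

    Assumes pile >= 1 and depth >= 1.
    """
    return max(
        legal_moves(pile),
        key=lambda m: -negamax(pile - m, depth - 1, evaluate),
    )


print(best_move(10, 2, stub_value_net))
```

The search is responsible for lookahead and the rules of the game, while the evaluator supplies judgments about positions that the search does not expand further. Replacing the stub with a trained model would leave the search code unchanged.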

Many key research questions remain, such as:
* What is the best way to integrate neural and symbolic architectures?<ref name=":0">{{Cite book |last1=Garcez |first1=Artur d'Avila |url=https://link.springer.com/book/10.1007/978-3-540-73246-4 |title=Neural-Symbolic Cognitive Reasoning |last2=Lamb |first2=Luis C. |last3=Gabbay |first3=Dov M. |publisher=Springer |year=2009 |isbn=978-3-540-73245-7 |edition=1st |location=Berlin-Heidelberg |doi=10.1007/978-3-540-73246-4 |bibcode=2009nscr.book.....D |s2cid=14002173 |language=English}}</ref>
* How should symbolic structures be represented within neural networks and extracted from them?
* How should common-sense knowledge be learned and reasoned about?
* How can abstract knowledge that is hard to encode logically be handled?