Editing LALR parser

{{Short description|Type of parser in computer science}}
{{Use dmy dates|date=October 2020}}
In [[computer science]], an '''LALR parser'''{{efn|"LALR" is pronounced as the [[initialism]] "el-ay-el-arr"}} ('''look-ahead, left-to-right, rightmost derivation parser''') is part of the compiling process where human readable text is converted into a structured representation to be read by computers. An LALR parser is a software tool to process ([[parse]]) text into a very specific internal representation that other programs, such as compilers, can work with. This process happens according to a set of [[production (computer science)|production rules]] specified by a [[formal grammar]] for a [[computer language]]. 

An LALR parser is a simplified version of a [[canonical LR parser]].

The LALR parser was invented by [[Frank DeRemer]] in his 1969 PhD dissertation, ''Practical Translators for LR(k) languages'',{{sfn|DeRemer|1969}} in his treatment of the practical difficulties at that time of implementing LR(1) parsers. He showed that the LALR parser has more language recognition power than the LR(0) parser, while requiring the same number of states as the LR(0) parser for a language that can be recognized by both parsers. This makes the LALR parser a memory-efficient alternative to the LR(1) parser for languages that are LALR. It was also proven that there exist LR(1) languages that are not LALR. Despite this weakness, the power of the LALR parser is sufficient for many mainstream computer languages,<ref name=chapman>''LR Parsing: Theory and Practice,'' Nigel P. Chapman, [https://books.google.com/books?id=nEA9AAAAIAAJ&pg=PA87 p. 86–87]</ref> including [[Java technology|Java]],<ref>{{cite web|title=Generate the Parser
|url=http://www.eclipse.org/jdt/core/howto/generate%20parser/generateParser.html|publisher=Eclipse JDT Project|access-date=29 June 2012}}</ref> though the reference grammars for many languages fail to be LALR due to being [[ambiguous grammar|ambiguous]].<ref name=chapman />

The original dissertation gave no algorithm for constructing such a parser given a formal grammar. The first algorithms for LALR parser generation were published in 1973.<ref>{{cite journal| first1=T.| last1=Anderson| first2=J.| last2=Eve| first3=J.| last3=Horning| title=Efficient LR(1) parsers| journal=Acta Informatica| issue=2| year=1973| pages= 2–39}}</ref> In 1982, DeRemer and Tom Pennello published an algorithm that generated highly memory-efficient LALR parsers.<ref name="DeRemer82">{{cite journal |first1=Frank |last1=DeRemer |first2=Thomas |last2=Pennello |date=October 1982 |title=Efficient Computation of LALR(1) Look-Ahead Sets |url=http://3e8.org/pub/scheme/doc/parsing/Efficient%20Computation%20of%20LALR%281%29%20Look-Ahead%20Sets.pdf |journal=ACM Transactions on Programming Languages and Systems |volume=4 |issue=4 |pages=615–649|doi=10.1145/69622.357187}}</ref> LALR parsers can be automatically generated from a grammar by an [[LALR parser generator]] such as [[Yacc]] or [[GNU Bison]]. The automatically generated code may be augmented by hand-written code to augment the power of the resulting parser.

== History ==

In 1965, [[Donald Knuth]] invented the [[LR parser]] ('''L'''eft to Right, [[rightmost derivation|'''R'''ightmost derivation]]). The LR parser can recognize any [[deterministic context-free language]] in linear-bounded time.<ref>{{Cite journal | last1 = Knuth | first1 = D. E. | author-link = Donald Knuth | title = On the translation of languages from left to right | doi = 10.1016/S0019-9958(65)90426-2 | journal = Information and Control | volume = 8 | issue = 6 | pages = 607–639 | date = July 1965 | doi-access = free }}</ref> Rightmost derivation has very large memory requirements and implementing an LR parser was impractical due to the limited [[computer memory|memory]] of computers at that time. To address this shortcoming, in 1969, Frank DeRemer proposed two simplified versions of the LR parser, namely the '''Look-Ahead LR''' (LALR){{sfn|DeRemer|1969}} and the '''[[SLR parser|Simple LR parser]]''' (SLR) that had much lower memory requirements at the cost of less language-recognition power, with the LALR parser being the most-powerful alternative.{{sfn|DeRemer|1969}} In 1977, memory optimizations for the LR parser were invented<ref>{{citation|title=A Practical General Method for Constructing LR(k) Parsers|volume=7|issue=3|author=Pager, D.|work=Acta Informatica 7|pages=249–268|year=1977|doi=10.1007/BF00290336}}</ref> but still the LR parser was less memory-efficient than the simplified alternatives.

In 1979, Frank DeRemer and [[Tom Pennello]] announced a series of optimizations for the LALR parser that would further improve its memory efficiency.<ref>{{citation|title=Efficient Computation of LALR(1) Look-Ahead Sets|author=Frank DeRemer, Thomas Pennello|work=Sigplan Notices - SIGPLAN, vol. 14, no. 8|pages=176–187|year=1979}}</ref> Their work was published in 1982.<ref name="DeRemer82"/>

== Overview ==
Generally, the LALR parser refers to the LALR(1) parser,{{efn|"LALR(1)" is pronounced as the [[initialism]] "el-ay-el-arr-one"}} just as the LR parser generally refers to the LR(1) parser. The "(1)" denotes one-token lookahead, to resolve differences between rule patterns during parsing. Similarly, there is an LALR(2) parser with two-token lookahead, and LALR(''k'') parsers with ''k''-token lookup, but these are rare in actual use. The LALR parser is based on the LR(0) parser, so it can also be denoted LALR(1)&nbsp;=&nbsp;LA(1)LR(0) (1 token of lookahead, LR(0)) or more generally LALR(''k'')&nbsp;=&nbsp;LA(''k'')LR(0) (k tokens of lookahead, LR(0)). There is in fact a two-parameter family of LA(''k'')LR(''j'') parsers for all combinations of ''j'' and ''k'', which can be derived from the LR(''j''&nbsp;+&nbsp;''k'') parser,<ref>''Parsing Techniques: A Practical Guide,'' by Dick Grune and Ceriel J. H. Jacobs, "9.7 LALR(1)", [https://books.google.com/books?id=05xA_d5dSwAC&pg=PA302 p. 302]</ref> but these do not see practical use.

As with other types of LR parsers, an LALR parser is quite efficient at finding the single correct [[bottom-up parsing|bottom-up parse]] in a single left-to-right scan over the input stream, because it does not need to use [[backtracking]]. Being a lookahead parser by definition, it always uses a lookahead, with {{nowrap|LALR(1)}} being the most-common case.

== Relation to other parsers ==

=== LR parsers ===
The LALR(1) parser is less powerful than the LR(1) parser, and more powerful than the SLR(1) parser, though they all use the same [[Formal grammar#The syntax of grammars|production rules]]. The simplification that the LALR parser introduces consists in merging rules that have identical '''kernel item sets''', because during the LR(0) state-construction process the lookaheads are not known. This reduces the power of the parser because not knowing the lookahead symbols can confuse the parser as to which grammar rule to pick next, resulting in '''reduce/reduce conflicts'''. All conflicts that arise in applying a LALR(1) parser to an unambiguous LR(1) grammar are reduce/reduce conflicts. The SLR(1) parser performs further merging, which introduces additional conflicts.

The standard example of an LR(1) grammar that cannot be parsed with the LALR(1) parser, exhibiting such a reduce/reduce conflict, is:<ref>"[http://www.cse.ohio-state.edu/~gurari/course/cse756/cse756su56.xht 7.9 LR(1) but not LALR(1)] {{webarchive|url=https://web.archive.org/web/20100804231352/http://www.cse.ohio-state.edu/~gurari/course/cse756/cse756su56.xht |date=4 August 2010 }}", ''[http://www.cse.ohio-state.edu/~gurari/course/cse756/cse756.xht CSE 756: Compiler Design and Implementation] {{webarchive|url=https://web.archive.org/web/20100723153301/http://www.cse.ohio-state.edu/~gurari/course/cse756/cse756.xht |date=23 July 2010 }},'' Eitan Gurari, Spring 2008</ref><ref>"[https://stackoverflow.com/questions/8496065/why-is-this-lr1-grammar-not-lalr1 Why is this LR(1) grammar not LALR(1)?]"</ref>
   S → a E c
     → a F d
     → b F c
     → b E d
   E → e
   F → e

In the LALR table construction, two states will be merged into one state and later the lookaheads will be found to be ambiguous. The one state with lookaheads is:

   E → e. {c,d}
   F → e. {c,d}

An LR(1) parser will create two different states (with non-conflicting lookaheads), neither of which is ambiguous. In an LALR parser this one state has conflicting actions (given lookahead c or d, reduce to E or F), a "reduce/reduce conflict"; the above grammar will be declared ambiguous by a [[LALR parser generator]] and conflicts will be reported.

To recover, this ambiguity is resolved by choosing E, because it occurs before F in the grammar. However, the resultant parser will not be able to recognize the valid input sequence <code>b e c</code>, since the ambiguous sequence <code>e c</code> is reduced to <code>(E → e) c</code>, rather than the correct <code>(F → e) c</code>, but <code>b E c</code> is not in the grammar.

=== LL parsers ===
The LALR(''j'') parsers are incomparable with [[LL parser|LL(''k'') parsers]]: for any ''j'' and ''k'' both greater than 0, there are LALR(''j'') grammars that are not [[LL grammar|LL(''k'') grammars]] and vice versa. In fact, it is undecidable whether a given LL(1) grammar is LALR(''k'') for any <math>k > 0</math>.<ref name=chapman/>

Depending on the presence of empty derivations, a LL(1) grammar can be equal to a SLR(1) or a LALR(1) grammar. If the LL(1) grammar has no empty derivations it is SLR(1) and if all symbols with empty derivations have non-empty derivations it is LALR(1). If symbols having only an empty derivation exist, the grammar may or may not be LALR(1).<ref>{{harv|Beatty|1982}}</ref>

== See also ==
* [[Comparison of parser generators]]
* [[Context-free grammar]]
* [[Lookahead (parsing)|Lookahead]]
* [[Parser generator]]
* [[Token scanner]]

== Notes ==
{{notelist}}

== References ==
{{Reflist}}
{{refbegin}}
* {{cite thesis
|type=PhD
|title=Practical Translators for LR(k) languages
|url=http://publications.csail.mit.edu/lcs/pubs/pdf/MIT-LCS-TR-065.pdf
|first=Franklin L.
|last=DeRemer
|publisher=MIT
|year=1969
|access-date=13 November 2012
|archive-url=https://web.archive.org/web/20130819012838/http://publications.csail.mit.edu/lcs/pubs/pdf/MIT-LCS-TR-065.pdf
|archive-date=19 August 2013
|url-status=dead
}}
* {{Cite journal | last1 = Beatty | first1 = J. C. | title = On the relationship between LL(1) and LR(1) grammars | url = https://cs.uwaterloo.ca/research/tr/1979/CS-79-36.pdf | doi = 10.1145/322344.322350 | journal = Journal of the ACM | volume = 29 | issue = 4 (Oct) | pages = 1007–1022 | year = 1982 }}
{{refend}}

== External links ==
* [http://www.supereasyfree.com/software/simulators/compilers/principles-techniques-and-tools/parsing-simulator/parsing-simulator.php Parsing Simulator] This simulator is used to generate parsing tables LALR and resolve the exercises of the book.
* [http://jscc.brobston.com/ JS/CC] JavaScript based implementation of a LALR(1) parser generator, which can be run in a web-browser or from the command-line.
* {{webarchive |url=https://web.archive.org/web/20210507215636/http://web.cs.dal.ca:80/~sjackson/lalr1.html |date=May 7, 2021 |title=LALR(1) tutorial}}, a flash card-like tutorial on LALR(1) parsing.

{{Parsers}}

[[Category:Parsing algorithms]]