
=== Lookahead sets ===

The states and transitions give all the information needed for the parse table's shift actions and goto actions. The generator must also calculate the expected lookahead set for each reduce action.
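As a rough illustration, the following Python sketch assumes the LR(0) automaton has already been built and is given as a list of (state, symbol, target) transitions. The function name `shift_and_goto` and the dictionary layout of the tables are hypothetical. Transitions on terminals become shift actions and transitions on nonterminals become goto entries. The reduce actions are left out because they need the lookahead sets discussed in this section.

```python
# A minimal sketch, assuming the LR(0) automaton is already available as a
# list of (source_state, symbol, target_state) transitions.
# The table layout and the function name are hypothetical.


def shift_and_goto(transitions, nonterminals):
    """Split automaton transitions into ACTION shift entries and GOTO entries."""
    action = {}  # (state, terminal) -> ("shift", target_state)
    goto = {}    # (state, nonterminal) -> target_state
    for source, symbol, target in transitions:
        if symbol in nonterminals:
            goto[(source, symbol)] = target
        else:
            action[(source, symbol)] = ("shift", target)
    return action, goto


# LR(0) transitions for the tiny grammar  E -> E + T | T,  T -> id
# (state numbers are arbitrary; state 0 is the start state).
NONTERMINALS = {"E", "T"}
transitions = [
    (0, "E", 1),
    (0, "T", 2),
    (0, "id", 3),
    (1, "+", 4),
    (4, "T", 5),
    (4, "id", 3),
]

action, goto = shift_and_goto(transitions, NONTERMINALS)
print(action[(0, "id")])  # ('shift', 3)
print(goto[(4, "T")])     # 5
```

States 2, 3 and 5 here contain completed items (E → T·, T → id·, E → E + T·), so their ACTION rows still need reduce entries, keyed by whatever lookahead sets the generator computes.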

In '''SLR''' parsers, these lookahead sets are determined directly from the grammar, without considering the individual states and transitions. For each nonterminal S, the SLR generator works out Follow(S), the set of all terminal symbols that can immediately follow some occurrence of S. In the parse table, each reduction to S uses Follow(S) as its LR(1) lookahead set. Such follow sets are also used by generators for LL top-down parsers. A grammar that has no shift/reduce or reduce/reduce conflicts when using Follow sets is called an SLR grammar.
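A minimal sketch of the Follow computation, assuming a grammar stored as a Python dict from each nonterminal to a list of right-hand-side tuples, with `()` for an empty production and `$` as the end-of-input marker. These representation choices and the function names are illustrative, not taken from any particular generator. Follow sets depend on First sets and on which nonterminals can derive the empty string, so those are computed first, each by iterating to a fixed point. The example grammar is the usual arithmetic-expression grammar.

```python
# Sketch of SLR Follow-set computation (representation is an assumption).
GRAMMAR = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}
START = "E"
END = "$"  # end-of-input marker


def nullable_set(grammar):
    """Nonterminals that can derive the empty string."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhss in grammar.items():
            if lhs not in nullable and any(
                all(sym in nullable for sym in rhs) for rhs in rhss
            ):
                nullable.add(lhs)
                changed = True
    return nullable


def first_sets(grammar, nullable):
    """Terminals that can begin a string derived from each nonterminal."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for lhs, rhss in grammar.items():
            for rhs in rhss:
                for sym in rhs:
                    new = first[sym] if sym in grammar else {sym}
                    if not new <= first[lhs]:
                        first[lhs] |= new
                        changed = True
                    if sym not in nullable:
                        break  # later symbols cannot start the string
    return first


def follow_sets(grammar, start):
    """Follow(A) for every nonterminal A, iterated to a fixed point."""
    nullable = nullable_set(grammar)
    first = first_sets(grammar, nullable)
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for lhs, rhss in grammar.items():
            for rhs in rhss:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:
                        continue  # only nonterminals have Follow sets
                    new = set()
                    rest_nullable = True
                    for nxt in rhs[i + 1:]:
                        new |= first[nxt] if nxt in grammar else {nxt}
                        if nxt not in nullable:
                            rest_nullable = False
                            break
                    if rest_nullable:
                        # Whatever can follow lhs can also follow sym.
                        new |= follow[lhs]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


follow = follow_sets(GRAMMAR, START)
print(sorted(follow["E"]))  # ['$', ')', '+']
print(sorted(follow["T"]))  # ['$', ')', '*', '+']
```

An SLR generator would then place "reduce E → T" in every state containing the item E → T· under each terminal in Follow(E), regardless of how that state was reached.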

'''LALR''' parsers have the same states as SLR parsers, but use a more complicated, more precise method to work out the minimum necessary reduction lookaheads for each individual state. Depending on the details of the grammar, these lookaheads may turn out to be the same as the Follow set computed by SLR parser generators, or they may be a subset of the SLR lookaheads. Some grammars are acceptable to LALR parser generators but not to SLR parser generators. This happens when the grammar has spurious shift/reduce or reduce/reduce conflicts using Follow sets, but no conflicts when using the exact sets computed by the LALR generator. The grammar is then called LALR(1) but not SLR.
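A standard textbook grammar of this kind is S → L = R | R, L → * R | id, R → L. The Python sketch below is illustrative only: it assumes LR(0) items written as (lhs, rhs, dot) tuples and a grammar with no empty productions, which keeps the First computation short. It builds the LR(0) state reached on L from the start state and shows that using Follow(R) as the lookahead for R → L· gives a shift/reduce conflict on `=`. Only shift/reduce conflicts are checked.

```python
# Sketch: an SLR shift/reduce conflict in a grammar that is LALR(1).
GRAMMAR = {
    "S'": [("S",)],  # augmented start production
    "S": [("L", "=", "R"), ("R",)],
    "L": [("*", "R"), ("id",)],
    "R": [("L",)],
}
START = "S'"
END = "$"


def closure(items):
    """LR(0) closure; an item is (lhs, rhs, dot_position)."""
    result = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for prod in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], prod, 0)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)


def goto(state, symbol):
    """Advance the dot over symbol, then take the closure."""
    moved = {(l, r, d + 1) for (l, r, d) in state if d < len(r) and r[d] == symbol}
    return closure(moved)


def follow_sets():
    # No production here is empty, so FIRST(A) depends only on the leading
    # symbol of each right-hand side and no nullable check is needed.
    first = {nt: set() for nt in GRAMMAR}
    follow = {nt: set() for nt in GRAMMAR}
    follow[START].add(END)
    changed = True
    while changed:
        changed = False
        for lhs, rhss in GRAMMAR.items():
            for rhs in rhss:
                head = rhs[0]
                new = first[head] if head in GRAMMAR else {head}
                if not new <= first[lhs]:
                    first[lhs] |= new
                    changed = True
                for i, sym in enumerate(rhs):
                    if sym not in GRAMMAR:
                        continue
                    if i + 1 < len(rhs):
                        nxt = rhs[i + 1]
                        new = first[nxt] if nxt in GRAMMAR else {nxt}
                    else:
                        new = follow[lhs]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


def slr_conflicts(state, follow):
    """Terminals on which SLR would both shift and reduce in this state."""
    shifts = {r[d] for (l, r, d) in state if d < len(r) and r[d] not in GRAMMAR}
    conflicts = set()
    for lhs, rhs, dot in state:
        if dot == len(rhs) and lhs != START:
            conflicts |= shifts & follow[lhs]
    return conflicts


start_state = closure({(START, ("S",), 0)})
after_L = goto(start_state, "L")  # items: S -> L . = R   and   R -> L .
follow = follow_sets()
print(sorted(follow["R"]))             # ['$', '=']
print(slr_conflicts(after_L, follow))  # {'='}
```

The conflict is spurious: in this state the LALR (and canonical LR) lookahead for R → L· is only `$`, because a sentence never continues with `=` after an R that is the whole left-hand part of S. An LALR generator therefore accepts the grammar.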

An SLR or LALR parser avoids having duplicate states. But this minimization is not necessary, and can sometimes create unnecessary lookahead conflicts. '''Canonical LR''' parsers use duplicated (or "split") states to better remember the left and right context of a nonterminal's use. Each occurrence of a symbol S in the grammar can be treated independently with its own lookahead set, to help resolve reduction conflicts. This lets the generator handle a few more grammars. Unfortunately, doing this for all parts of the grammar greatly increases the size of the parse tables. This splitting of states can also be done manually and selectively with any SLR or LALR parser, by making two or more named copies of some nonterminals. A grammar that is conflict-free for a canonical LR generator but has conflicts in an LALR generator is called LR(1) but not LALR(1), and not SLR.
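The cost of splitting can be seen by building the canonical LR(1) states for the grammar S → L = R | R, L → * R | id, R → L. The sketch below is a hypothetical illustration: each LR(1) item is a tuple (lhs, rhs, dot, lookahead), the grammar is assumed to have no empty productions, and the function names are invented for this example. It then drops the lookaheads to find the distinct LR(0) "cores", which are the states an LALR generator would keep after merging.

```python
# Sketch of canonical LR(1) state construction (no empty productions assumed).
GRAMMAR = {
    "S'": [("S",)],
    "S": [("L", "=", "R"), ("R",)],
    "L": [("*", "R"), ("id",)],
    "R": [("L",)],
}
START = "S'"
END = "$"


def first_sets():
    # Without empty productions only the leading symbol of each RHS matters.
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:
        changed = False
        for lhs, rhss in GRAMMAR.items():
            for rhs in rhss:
                head = rhs[0]
                new = first[head] if head in GRAMMAR else {head}
                if not new <= first[lhs]:
                    first[lhs] |= new
                    changed = True
    return first


FIRST = first_sets()


def closure(items):
    """LR(1) closure; an item is (lhs, rhs, dot_position, lookahead)."""
    result = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot, la = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            rest = rhs[dot + 1:]
            # New items get lookaheads FIRST(rest la): FIRST of rest's first
            # symbol, or just la when nothing follows the nonterminal.
            if rest:
                lookaheads = FIRST[rest[0]] if rest[0] in GRAMMAR else {rest[0]}
            else:
                lookaheads = {la}
            for prod in GRAMMAR[rhs[dot]]:
                for b in lookaheads:
                    item = (rhs[dot], prod, 0, b)
                    if item not in result:
                        result.add(item)
                        work.append(item)
    return frozenset(result)


def goto(state, symbol):
    moved = {(l, r, d + 1, a) for (l, r, d, a) in state if d < len(r) and r[d] == symbol}
    return closure(moved)


def canonical_states():
    start = closure({(START, ("S",), 0, END)})
    states, work = {start}, [start]
    while work:
        state = work.pop()
        for sym in {r[d] for (_, r, d, _) in state if d < len(r)}:
            target = goto(state, sym)
            if target not in states:
                states.add(target)
                work.append(target)
    return states


states = canonical_states()
# LALR merges states whose items agree once lookaheads are dropped ("cores").
cores = {frozenset((l, r, d) for (l, r, d, _) in s) for s in states}
print(len(states), len(cores))  # 14 10
```

The four extra canonical states are copies of states such as the one holding only R → L·, split because they are reached with different lookaheads. For this particular grammar merging them back causes no conflict, so the split only enlarges the table; in an LR(1)-but-not-LALR(1) grammar, merging two such copies would combine their lookaheads and create a reduce/reduce conflict.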

SLR, LALR, and canonical LR parsers make exactly the same shift and reduce decisions when the input stream is in the correct language. When the input has a syntax error, the LALR parser may perform some additional (harmless) reductions before detecting the error than the canonical LR parser would, and the SLR parser may perform even more. This happens because the SLR and LALR parsers use a generous superset approximation to the true, minimal lookahead symbols for that particular state.