=== Parse table for the example grammar ===

Most LR parsers are table driven. The parser's program code is a simple generic loop that is the same for all grammars and languages. The knowledge of the grammar and its syntactic implications is encoded into unchanging data tables called '''parse tables''' (or '''parsing tables'''). Entries in a table show whether to shift or reduce (and by which grammar rule), for every legal combination of parser state and lookahead symbol.  The parse tables also tell how to compute the next state, given just a current state and a next symbol.

The parse tables are much larger than the grammar.  LR tables are hard to compute accurately by hand for big grammars, so they are mechanically derived from the grammar by a [[parser generator]] tool such as [[GNU Bison|Bison]].<ref>Flex & Bison: Text Processing Tools, by John Levine, O'Reilly Media 2009.</ref>

Depending on how the states and parsing table are generated, the resulting parser is called either a [[simple LR parser|'''SLR''' (simple LR) parser]], [[LALR parser|'''LALR''' (look-ahead LR) parser]], or [[canonical LR parser]]. LALR parsers handle more grammars than SLR parsers. Canonical LR parsers handle even more grammars, but use many more states and much larger tables.  The example grammar is SLR.

LR parse tables are two-dimensional.  Each current LR(0) parser state has its own row.  Each possible next symbol has its own column.  Some combinations of state and next symbol are not possible for valid input streams.  These blank cells trigger syntax error messages.

The '''Action''' left half of the table has columns for lookahead terminal symbols.  These cells determine whether the next parser action is shift (to state ''n''), or reduce (by grammar rule '''r'''<sub>''n''</sub>).

The '''Goto''' right half of the table has columns for nonterminal symbols.  These cells show which state to advance to, after some reduction's Left Hand Side has created an expected new instance of that symbol.  This is like a shift action but for nonterminals; the lookahead terminal symbol is unchanged.

The table column "Current Rules" documents the meaning and syntax possibilities for each state, as worked out by the parser generator.  It is not included in the actual tables used at parsing time.  The <big>{{color|#f7f|•}}</big> (pink dot) marker shows where the parser is now, within some partially recognized grammar rules.  The things to the left of <big>{{color|#f7f|•}}</big> have been parsed, and the things to the right are expected soon.  A state has several such current rules if the parser has not yet narrowed possibilities down to a single rule.

{| class="wikitable"
|-
! Curr !!  !! colspan="5" | Lookahead !! !! colspan="3" | LHS Goto
|-
! State !! Current Rules !! ''int'' !! ''id'' !! *  &nbsp; !! + &nbsp; !! ''eof'' !! !! Sums !! Products !! Value
|-
| 0 || Goal → <big>{{color|#f7f|•}}</big> Sums ''eof'' || 8 || 9 ||  ||  ||  |||| 1 || 4 || 7 
|-
| 1 || Goal → Sums <big>{{color|#f7f|•}}</big> ''eof'' {{br}} Sums → Sums <big>{{color|#f7f|•}}</big> + Products ||  ||  ||  || {{br}}2 || accept{{br}}&nbsp; ||||  ||  ||  
|-
| 2 || Sums → Sums + <big>{{color|#f7f|•}}</big> Products || 8 || 9 ||  ||  ||  ||||  || 3 || 7 
|-
| 3 || Sums → Sums + Products <big>{{color|#f7f|•}}</big> {{br}} Products → Products <big>{{color|#f7f|•}}</big> * Value ||  ||  || {{br}}5 || r1 {{br}}&nbsp; || r1 {{br}}&nbsp; ||||  ||  ||  
|-
| 4 || Sums → Products <big>{{color|#f7f|•}}</big> {{br}} Products → Products <big>{{color|#f7f|•}}</big> * Value ||  ||  || {{br}}5 || r2 {{br}}&nbsp; || r2 {{br}}&nbsp; ||||  ||  ||  
|-
| 5 || Products → Products * <big>{{color|#f7f|•}}</big> Value || 8 || 9 ||  ||  ||  ||||  ||  || 6 
|-
| 6 || Products → Products * Value <big>{{color|#f7f|•}}</big> ||  ||  || r3 || r3 || r3 ||||  ||  ||  
|-
| 7 || Products → Value <big>{{color|#f7f|•}}</big> ||  ||  || r4 || r4 || r4 ||||  ||  ||  
|-
| 8 || Value → ''int'' <big>{{color|#f7f|•}}</big> ||  ||  || r5 || r5 || r5 ||||  ||  ||  
|-
| 9 || Value → ''id'' <big>{{color|#f7f|•}}</big> ||  ||  || r6 || r6 || r6 ||||  ||  ||  
|-
|}
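
The table above can be written down directly as data and driven by a generic loop. The following is a minimal sketch in Python, not the output of any particular parser generator. The names <code>RULES</code>, <code>ACTION</code>, <code>GOTO</code>, <code>ParseError</code> and <code>parse</code> are illustrative assumptions, and the rule numbering r0 to r6 is inferred from the reduce entries in the table. Blank cells are represented by missing dictionary keys.

```python
class ParseError(Exception):
    """Raised when the lookup hits a blank (error) cell."""

# Rule number -> (left-hand side, number of right-hand-side symbols).
RULES = {
    0: ("Goal", 2),      # r0: Goal -> Sums eof
    1: ("Sums", 3),      # r1: Sums -> Sums + Products
    2: ("Sums", 1),      # r2: Sums -> Products
    3: ("Products", 3),  # r3: Products -> Products * Value
    4: ("Products", 1),  # r4: Products -> Value
    5: ("Value", 1),     # r5: Value -> int
    6: ("Value", 1),     # r6: Value -> id
}

def shift(n):
    return ("shift", n)

def reduce(n):
    return ("reduce", n)

ACCEPT = ("accept", None)
FOLLOWERS = ("*", "+", "eof")

# Action half: state -> lookahead terminal -> action.
ACTION = {
    0: {"int": shift(8), "id": shift(9)},
    1: {"+": shift(2), "eof": ACCEPT},
    2: {"int": shift(8), "id": shift(9)},
    3: {"*": shift(5), "+": reduce(1), "eof": reduce(1)},
    4: {"*": shift(5), "+": reduce(2), "eof": reduce(2)},
    5: {"int": shift(8), "id": shift(9)},
    6: {t: reduce(3) for t in FOLLOWERS},
    7: {t: reduce(4) for t in FOLLOWERS},
    8: {t: reduce(5) for t in FOLLOWERS},
    9: {t: reduce(6) for t in FOLLOWERS},
}

# Goto half: state -> nonterminal -> next state.
GOTO = {
    0: {"Sums": 1, "Products": 4, "Value": 7},
    2: {"Products": 3, "Value": 7},
    5: {"Value": 6},
}

def parse(tokens):
    """Return the rule numbers in the order they are reduced."""
    tokens = list(tokens) + ["eof"]
    stack = [0]          # stack of parser states
    reductions = []
    pos = 0
    while True:
        state, lookahead = stack[-1], tokens[pos]
        action = ACTION[state].get(lookahead)
        if action is None:  # blank cell
            raise ParseError(f"unexpected {lookahead!r} in state {state}")
        kind, n = action
        if kind == "shift":
            stack.append(n)
            pos += 1
        elif kind == "reduce":
            lhs, length = RULES[n]
            del stack[-length:]                  # pop the rule's right-hand side
            stack.append(GOTO[stack[-1]][lhs])   # goto on the new nonterminal
            reductions.append(n)
        else:  # accept
            return reductions
```

Only the three dictionaries depend on the grammar; the <code>parse</code> loop itself would stay the same for a different set of tables.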

In state 2 above, the parser has just found and shifted-in the '''+''' of grammar rule
::r1: Sums → Sums + <big>{{color|#f7f|•}}</big> Products
The next expected phrase is Products.  Products begins with terminal symbols ''int'' or ''id''.  If the lookahead is either of those, the parser shifts it in and advances to state 8 or 9, respectively.  A Products can also begin with nonterminal Value; when a Value has been found, the parser advances to state 7.  When a Products has been found, the parser advances to state 3, where it either extends that Products with further factors or finds the end of rule r1.  For any other lookahead or nonterminal, the parser announces a syntax error.
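
The behaviour of state 2 can be sketched as a lookup in a single table row. The snippet below is illustrative only: the names <code>STATE_2_ACTION</code>, <code>STATE_2_GOTO</code> and <code>action_in_state_2</code> are assumptions, and the wording of the error message is invented for the example. The non-blank cells of the row double as the list of symbols the parser expected.

```python
# Row for state 2 of the example table. Blank cells are simply absent.
STATE_2_ACTION = {"int": ("shift", 8), "id": ("shift", 9)}  # lookahead terminals
STATE_2_GOTO = {"Products": 3, "Value": 7}                  # nonterminals

def action_in_state_2(lookahead):
    """Return the Action cell for the lookahead, or report a syntax error."""
    cell = STATE_2_ACTION.get(lookahead)
    if cell is None:
        # A blank cell: the non-blank columns tell us what was expected.
        expected = " or ".join(sorted(STATE_2_ACTION))
        raise ValueError(
            f"syntax error: found {lookahead!r}, expected {expected}"
        )
    return cell
```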

{{rule}}

In state 3, the parser has just found a Products phrase that could belong to either of two grammar rules:
::r1: Sums → Sums + Products <big>{{color|#f7f|•}}</big>
::r3: Products → Products <big>{{color|#f7f|•}}</big> * Value
The choice between r1 and r3 cannot be decided just from looking backwards at prior phrases.  The parser has to check the lookahead symbol to tell what to do.  If the lookahead is '''*''', it is in rule 3, so the parser shifts in the '''*''' and advances to state 5.  If the lookahead is '''+''' or ''eof'', it is at the end of rule 1, so the parser reduces by r1.  With ''eof'', the parser then returns to state 1, where it is also at the end of rule 0, so the parser is done.

{{rule}}

In state 9 above, all the non-blank, non-error cells are for the same reduction r6.  Some parsers save time and table space by not checking the lookahead symbol in these simple cases.  Syntax errors are then detected somewhat later, after some harmless reductions, but still before the next shift action or parser decision.
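
One way to find the states where the lookahead check can be skipped is to test whether every non-blank Action cell in a row holds the same reduction. The sketch below assumes rows are stored as dictionaries from lookahead symbol to an <code>(kind, number)</code> pair; the function name <code>default_reduction</code> and this representation are assumptions made for illustration, not the scheme of any particular parser generator.

```python
def default_reduction(row):
    """Return the rule number if all non-blank cells in an Action row
    are the same reduction; otherwise return None."""
    actions = set(row.values())
    if len(actions) == 1:
        kind, rule = next(iter(actions))
        if kind == "reduce":
            return rule
    return None

# Action rows for states 3 and 9 of the example table.
ROW_3 = {"*": ("shift", 5), "+": ("reduce", 1), "eof": ("reduce", 1)}
ROW_9 = {"*": ("reduce", 6), "+": ("reduce", 6), "eof": ("reduce", 6)}

print(default_reduction(ROW_9))  # state 9 always reduces by r6
print(default_reduction(ROW_3))  # state 3 must inspect the lookahead
```

A row with a default reduction could then be stored as a single rule number instead of one cell per terminal.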

Individual table cells must not hold multiple, alternative actions; otherwise the parser would be nondeterministic, with guesswork and backtracking.  If the grammar is not LR(1), some cells will have shift/reduce conflicts between a possible shift action and reduce action, or reduce/reduce conflicts between multiple grammar rules.  LR(''k'') parsers resolve these conflicts (where possible) by checking additional lookahead symbols beyond the first.
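
A table builder can detect such conflicts at the moment it tries to write a second, different action into an occupied cell. The following sketch shows the idea; the helper <code>add_action</code>, the conflict tuples, and the cell used in the usage lines are hypothetical and do not come from the example grammar, which has no conflicts. Only shift and reduce actions are considered.

```python
def add_action(table, state, symbol, action, conflicts):
    """Store action in table[state][symbol]. If the cell already holds a
    different action, keep the old one and record the conflict instead."""
    row = table.setdefault(state, {})
    existing = row.get(symbol)
    if existing is not None and existing != action:
        if existing[0] == "reduce" and action[0] == "reduce":
            kind = "reduce/reduce"
        else:
            kind = "shift/reduce"
        conflicts.append((state, symbol, kind))
        return
    row[symbol] = action

# Hypothetical cell that receives both a shift and a reduce.
table, conflicts = {}, []
add_action(table, 4, "+", ("shift", 3), conflicts)
add_action(table, 4, "+", ("reduce", 1), conflicts)
print(conflicts)
```

A non-empty <code>conflicts</code> list means the grammar cannot be handled deterministically by the table construction that produced it.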