Editing LR parser (section)

== Table construction ==

=== Finding the reachable item sets and the transitions between them ===

The first step of constructing the tables consists of determining the transitions between the closed item sets. These transitions will be determined as if we are considering a finite automaton that can read terminals as well as nonterminals. The begin state of this automaton is always the closure of the first item of the added rule: S → <big>{{color|#f7f|•}}</big> E eof:
: '''Item set 0'''
: S → <big>{{color|#f7f|•}}</big> E eof
: '''+''' E → <big>{{color|#f7f|•}}</big> E * B
: '''+''' E → <big>{{color|#f7f|•}}</big> E + B
: '''+''' E → <big>{{color|#f7f|•}}</big> B
: '''+''' B → <big>{{color|#f7f|•}}</big> 0
: '''+''' B → <big>{{color|#f7f|•}}</big> 1

The [[boldface]]d "'''+'''" in front of an item indicates the items that were added for the closure (not to be confused with the mathematical '+' operator which is a terminal). The original items without a "'''+'''" are called the ''kernel'' of the item set.

Starting at the begin state (S0), all of the states that can be reached from this state are now determined. The possible transitions for an item set can be found by looking at the symbols (terminals and nonterminals) found following the dots; in the case of item set 0 those symbols are the terminals '0' and '1' and the nonterminals E and B. To find the item set that each symbol <math display="inline">x \in \{0, 1, E, B\}</math> leads to,  the following procedure is followed for each of the symbols:
# Take the subset, ''S'', of all items in the current item set where there is a dot in front of the symbol of interest, ''x''.
# For each item in ''S'', move the dot to the right of ''x''.
# Close the resulting set of items.

For the terminal '0' (i.e. where x = '0') this results in:
: '''Item set 1'''
: B → 0 <big>{{color|#f7f|•}}</big>
and for the terminal '1' (i.e. where x = '1') this results in:
: '''Item set 2'''
: B → 1 <big>{{color|#f7f|•}}</big>
and for the nonterminal E (i.e. where x = E) this results in:
: '''Item set 3'''
: S → E <big>{{color|#f7f|•}}</big> eof
: E → E <big>{{color|#f7f|•}}</big> * B
: E → E <big>{{color|#f7f|•}}</big> + B
and for the nonterminal B (i.e. where x = B) this results in:
: '''Item set 4'''
: E → B <big>{{color|#f7f|•}}</big>

The closure does not add new items in all cases - in the new sets above, for example, there are no nonterminals following the dot.

Above procedure is continued until no more new item sets are found. For the item sets 1, 2, and 4 there will be no transitions since the dot is not in front of any symbol. For item set 3 though, we have dots in front of terminals '*' and '+'. For symbol <math display="inline">x = \texttt{*}</math> the transition goes to:
: '''Item set 5'''
: E → E * <big>{{color|#f7f|•}}</big> B
: '''+''' B → <big>{{color|#f7f|•}}</big> 0
: '''+''' B → <big>{{color|#f7f|•}}</big> 1
and for  <math display="inline">x = \texttt{+}</math> the transition goes to:
: '''Item set 6'''
: E → E + <big>{{color|#f7f|•}}</big> B
: '''+''' B → <big>{{color|#f7f|•}}</big> 0
: '''+''' B → <big>{{color|#f7f|•}}</big> 1

Now, the third iteration begins.

For item set 5, the terminals '0' and '1' and the nonterminal B must be considered, but the resulting closed item sets for the terminals are equal to already found item sets 1 and 2, respectively. For the nonterminal B, the transition goes to:
: '''Item set 7'''
: E → E * B <big>{{color|#f7f|•}}</big>

For item set 6, the terminal '0' and '1' and the nonterminal B must be considered, but as before, the resulting item sets for the terminals are equal to the already found item sets 1 and 2. For the nonterminal B the transition goes to:
: '''Item set 8'''
: E → E + B <big>{{color|#f7f|•}}</big>

These final item sets 7 and 8 have no symbols beyond their dots so no more new item sets are added, so the item generating procedure is complete. The finite automaton, with item sets as its states is shown below.

The transition table for the automaton now looks as follows:

{| class="wikitable" style="text-align:center"
|- style="background:#e0e0d0;"
!  style="background:#d0e0ff; width:28%;"|Item Set
! style="width:12%;"|*
! style="width:12%;"|+
! style="width:12%;"|0
! style="width:12%;"|1
! style="width:12%;"|E
! style="width:12%;"|B
|-
!0 
|&nbsp; ||&nbsp; ||1      ||2      ||3      ||4
|-
!1 
|&nbsp; ||&nbsp; ||&nbsp; ||&nbsp; ||&nbsp; ||&nbsp;
|-
!2 
|&nbsp; ||&nbsp; ||&nbsp; ||&nbsp; ||&nbsp; ||&nbsp;
|-
!3 
|5      ||6      ||&nbsp; ||&nbsp; ||&nbsp; ||&nbsp;
|-
!4 
|&nbsp; ||&nbsp; ||&nbsp; ||&nbsp; ||&nbsp; ||&nbsp;
|-
!5 
|&nbsp; ||&nbsp; ||1      ||2      ||&nbsp; ||7
|-
!6 
|&nbsp; ||&nbsp; ||1      ||2      ||&nbsp; ||8
|-
!7 
|&nbsp; ||&nbsp; ||&nbsp; ||&nbsp; ||&nbsp; ||&nbsp;
|-
!8 
|&nbsp; ||&nbsp; ||&nbsp; ||&nbsp; ||&nbsp; ||&nbsp;
|}

=== Constructing the action and goto tables ===

From this table and the found item sets, the action and goto table are constructed as follows:
<!-- Jezhotwells noted that sentences like the above "are not encyclopaedic, reads like an instruction manual" --> 
# The columns for nonterminals are copied to the goto table.
# The columns for the terminals are copied to the action table as shift actions.
# An extra column for '$' (end of input) is added to the action table. An ''acc'' action is added to the '$' column for each item set that contains an item of the form S → w <big>{{color|#f7f|•}}</big> eof.
# If an item set ''i'' contains an item of the form ''A'' → ''w'' <big>{{color|#f7f|•}}</big> and ''A'' → ''w'' is rule ''m'' with ''m'' &gt; 0 then the row for state ''i'' in the action table is completely filled with the reduce action r''m''.
The reader may verify that these steps produce the action and goto table presented earlier.

==== A note about LR(0) versus SLR and LALR parsing ====

Only step 4 of the above procedure produces reduce actions, and so all reduce actions must occupy an entire table row, causing the reduction to occur regardless of the next symbol in the input stream.  This is why these are LR(0) parse tables: they don't do any lookahead (that is, they look ahead zero symbols) before deciding which reduction to perform.  A grammar that needs lookahead to disambiguate reductions would require a parse table row containing different reduce actions in different columns, and the above procedure is not capable of creating such rows.
<!-- Jezhotwells noted "Paragraphs such as this should be rewritten in good plain english, at the moment this is gobbledegook to the average reader." -->

Refinements to the '''LR'''(0) table construction procedure (such as [[Simple LR parser|SLR]] and [[LALR parser|LALR]]) are capable of constructing reduce actions that do not occupy entire rows.  Therefore, they are capable of parsing more grammars than LR(0) parsers.

=== Conflicts in the constructed tables ===

The automaton is constructed in such a way that it is guaranteed to be deterministic. However, when reduce actions are added to the action table it can happen that the same cell is filled with a reduce action and a shift action (a ''shift-reduce conflict'') or with two different reduce actions (a ''reduce-reduce conflict''). However, it  can be shown that when this happens the grammar is not an LR(0) grammar. A classic real-world example of a shift-reduce conflict is the [[dangling else]] problem.

A small example of a non-LR(0) grammar with a shift-reduce conflict is:
: (1) E → 1 E
: (2) E → 1

One of the item sets found is:
: '''Item set 1'''
: E → 1 <big>{{color|#f7f|•}}</big> E
: E → 1 <big>{{color|#f7f|•}}</big>
: '''+''' E → <big>{{color|#f7f|•}}</big> 1 E
: '''+''' E → <big>{{color|#f7f|•}}</big> 1

There is a shift-reduce conflict in this item set: when constructing the action table according to the rules above,  the cell for  [item set 1, terminal '1'] contains '''s1''' (shift to state 1) '''and r2''' (reduce with grammar rule 2).

A small example of a non-LR(0) grammar with a reduce-reduce conflict is:
: (1) E → A 1
: (2) E → B 2
: (3) A → 1
: (4) B → 1

In this case the following item set is obtained:
: '''Item set 1'''
: A → 1 <big>{{color|#f7f|•}}</big>
: B → 1 <big>{{color|#f7f|•}}</big>

There is a reduce-reduce conflict in this item set because in the cells in the action table for this item set there will be both a reduce action for rule 3 and one for rule 4.

Both examples above can be solved by letting the parser use the follow set (see [[LL parser]]) of a nonterminal ''A'' to decide if it is going to use one of ''A''s rules for a reduction; it will only use the rule ''A'' → ''w'' for a reduction if the next symbol on the input stream is in the follow set of ''A''. This solution results in so-called [[Simple LR parser]]s.