Editing LR parser (section)

=== Grammar for the example A * 2 + 1===

LR parsers are constructed from a grammar that formally defines the syntax of the input language as a set of patterns.  The grammar doesn't cover all language rules, such as the size of numbers, or the consistent use of names and their definitions in the context of the whole program.  LR parsers use a [[context-free grammar]] that deals just with local patterns of symbols.

The example grammar used here is a tiny subset of the Java or C language:

::r0: Goal → Sums ''eof''
::r1: Sums → Sums + Products
::r2: Sums → Products
::r3: Products → Products * Value
::r4: Products → Value
::r5: Value → ''int''
::r6: Value → ''id''

The grammar's [[terminal symbol]]s are the multi-character symbols or 'tokens' found in the input stream by a [[lexical analysis|lexical scanner]].  Here these include '''+''' '''*''' and ''int'' for any integer constant, and ''id'' for any identifier name, and ''eof'' for end of input file.  The grammar doesn't care what the ''int'' values or ''id'' spellings are, nor does it care about blanks or line breaks.  The grammar uses these terminal symbols but does not define them.  They are always leaf nodes (at the bottom bushy end) of the parse tree.

The capitalized terms like Sums are [[nonterminal symbol]]s.  These are names for concepts or patterns in the language. They are defined in the grammar and never occur themselves in the input stream.  They are always internal nodes (above the bottom) of the parse tree. They only happen as a result of the parser applying some grammar rule.  Some nonterminals are defined with two or more rules; these are alternative patterns.  Rules can refer back to themselves, which are called ''recursive''.  This grammar uses recursive rules to handle repeated math operators.  Grammars for complete languages use recursive rules to handle lists, parenthesized expressions, and nested statements.

Any given computer language can be described by several different grammars.  An LR(1) parser can handle many but not all common grammars.  It is usually possible to manually modify a grammar so that it fits the limitations of LR(1) parsing and the generator tool.

The grammar for an LR parser must be [[ambiguous grammar|unambiguous]] itself, or must be augmented by tie-breaking precedence rules.  This means there is only one correct way to apply the grammar to a given legal example of the language, resulting in a unique parse tree with just one meaning, and a unique sequence of shift/reduce actions for that example.  LR parsing is not a useful technique for human languages with ambiguous grammars that depend on the interplay of words.  Human languages are better handled by parsers like [[Generalized LR parser]], the [[Earley parser]], or the [[CYK algorithm]] that can simultaneously compute all possible parse trees in one pass.