
== Overview ==

For a given [[context-free grammar]], the parser attempts to find the [[Context-free grammar#Derivations and syntax trees|leftmost derivation]].
Given an example grammar ''G'':
# <math>S \to E</math>
# <math>E \to ( E + E )</math>
# <math>E \to i</math>
the leftmost derivation for <math>w = ((i+i)+i)</math> is:
: <math>S\ \overset{(1)}{\Rightarrow}\ E\ \overset{(2)}{\Rightarrow}\ (E+E)\ \overset{(2)}{\Rightarrow}\ ((E+E)+E)\ \overset{(3)}{\Rightarrow}\ ((i+E)+E)\ \overset{(3)}{\Rightarrow}\ ((i+i)+E)\ \overset{(3)}{\Rightarrow}\ ((i+i)+i)</math>
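The following Python sketch replays a rule sequence as a leftmost derivation of ''G'' and lists every sentential form. The function name <code>leftmost_derivation</code> and the convention that non-terminals are upper-case letters are illustrative assumptions, not part of any standard API.

```python
# Grammar G from the text: rule number -> (left-hand side, right-hand side).
# Assumption: non-terminals are upper-case letters; every other character is a terminal.
RULES = {
    1: ("S", "E"),
    2: ("E", "(E+E)"),
    3: ("E", "i"),
}


def leftmost_derivation(start, rule_sequence):
    """Apply each rule to the leftmost non-terminal and return every sentential form."""
    form = start
    forms = [form]
    for number in rule_sequence:
        lhs, rhs = RULES[number]
        # Locate the leftmost non-terminal in the current sentential form.
        pos = next((i for i, ch in enumerate(form) if ch.isupper()), None)
        if pos is None or form[pos] != lhs:
            raise ValueError(f"rule {number} cannot expand the leftmost non-terminal of {form!r}")
        form = form[:pos] + rhs + form[pos + 1:]
        forms.append(form)
    return forms


for form in leftmost_derivation("S", [1, 2, 2, 3, 3, 3]):
    print(form)
```

Run on the rule sequence 1, 2, 2, 3, 3, 3, it prints the seven sentential forms of the derivation above, ending in ((i+i)+i).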

Generally, there are multiple possibilities when selecting a rule to expand the leftmost non-terminal. In step 2 of the previous example, the parser must choose whether to apply rule 2 or rule 3:
: <math>S\ \overset{(1)}{\Rightarrow}\ E\ \overset{(?)}{\Rightarrow}\ ?</math>

To be efficient, the parser must be able to make this choice deterministically when possible, without backtracking. For some grammars, it can do this by peeking at the unread input without consuming it. In our example, if the parser knows that the next unread symbol is '''(''', the only rule that can apply is rule 2, because rule 3 produces a string that begins with '''i'''.
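As a hedged illustration, a recursive-descent parser for ''G'' can choose between rules 2 and 3 by peeking at a single unread symbol. The class <code>Parser</code>, its methods and the helper <code>parse</code> are hypothetical names, and tokens are assumed to be single characters. Because the parser expands non-terminals from left to right, the rule numbers it records come out in leftmost-derivation order.

```python
class Parser:
    """Predictive parser for G that records which rule it applies at each step."""

    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.rules_used = []

    def peek(self):
        # Look at the next unread symbol without consuming it ("" at end of input).
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def expect(self, symbol):
        # Consume one terminal, failing if it is not the expected one.
        if self.peek() != symbol:
            raise SyntaxError(f"expected {symbol!r} at position {self.pos}, got {self.peek()!r}")
        self.pos += 1

    def parse_S(self):
        self.rules_used.append(1)  # S -> E is the only rule for S
        self.parse_E()

    def parse_E(self):
        lookahead = self.peek()
        if lookahead == "(":       # only rule 2 can start with '('
            self.rules_used.append(2)
            self.expect("(")
            self.parse_E()
            self.expect("+")
            self.parse_E()
            self.expect(")")
        elif lookahead == "i":     # only rule 3 can start with 'i'
            self.rules_used.append(3)
            self.expect("i")
        else:
            raise SyntaxError(f"no rule for E starts with {lookahead!r}")


def parse(text):
    """Parse text as a word of G and return the rules of its leftmost derivation."""
    parser = Parser(text)
    parser.parse_S()
    if parser.pos != len(text):
        raise SyntaxError(f"unexpected trailing input at position {parser.pos}")
    return parser.rules_used


print(parse("((i+i)+i)"))  # [1, 2, 2, 3, 3, 3]
```

For the input ((i+i)+i) it returns the same rule sequence as the leftmost derivation of <math>w</math>, and it never backtracks. An input such as (i+) raises <code>SyntaxError</code>, because no rule for ''E'' starts with ''')'''.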

Generally, an LL(''k'') parser can look ahead at ''k'' symbols. However, given a grammar, the problem of determining whether there exists an LL(''k'') parser for some ''k'' that recognizes it is undecidable. For each ''k'', there is a language that cannot be recognized by an LL(''k'') parser, but can be recognized by an {{nowrap|LL(''k'' + 1)}} parser.
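To make the role of ''k'' concrete, the sketch below uses a hypothetical grammar that is not from the text, <math>S \to ab \mid ac</math>, and lists which alternatives of ''S'' are still consistent with the first ''k'' unread symbols. The function name <code>matching_alternatives</code> is illustrative. Comparing <code>alt[:k]</code> directly only works here because both alternatives consist of terminals alone.

```python
# Hypothetical grammar (not from the text): S -> a b | a c.
# Both alternatives are terminal-only, so their first k symbols are just alt[:k].
ALTERNATIVES = ["ab", "ac"]


def matching_alternatives(unread, k):
    """Return the alternatives of S whose first k symbols match the lookahead window."""
    window = unread[:k]
    return [alt for alt in ALTERNATIVES if alt[:k] == window]


print(matching_alternatives("ab", 1))  # ['ab', 'ac']: one symbol cannot decide
print(matching_alternatives("ab", 2))  # ['ab']: two symbols decide
```

This grammar is LL(2) but not LL(1). Its language {''ab'', ''ac''}, however, is also generated by the LL(1) grammar <math>S \to aT,\ T \to b \mid c</math>, so the example illustrates lookahead for a grammar rather than the language-level separation stated above.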

We can use the above analysis to give the following formal definition:

Let ''G'' be a context-free grammar and {{nowrap|''k'' ≥ 1}}. We say that ''G'' is LL(''k'') if and only if, for any two leftmost derivations:
# <math>S\ \Rightarrow\ \cdots\ \Rightarrow\ wA\alpha\ \Rightarrow\ w\beta\alpha\ \Rightarrow\ \cdots\ \Rightarrow\ wu</math>
# <math>S\ \Rightarrow\ \cdots\ \Rightarrow\ wA\alpha\ \Rightarrow\ w\gamma\alpha\ \Rightarrow\ \cdots\ \Rightarrow\ wv</math>
the following condition holds: if the prefix of length ''k'' of <math>u</math> equals the prefix of length ''k'' of <math>v</math>, then <math>\beta = \gamma</math>.

In this definition, <math>S</math> is the start symbol and <math>A</math> is any non-terminal. The already derived input <math>w</math> and the yet unread strings <math>u</math> and <math>v</math> are strings of terminals. The Greek letters <math>\alpha</math>, <math>\beta</math> and <math>\gamma</math> represent any (possibly empty) string of terminals and non-terminals. The prefix length corresponds to the lookahead buffer size, and the definition says that this buffer is enough to decide how to expand <math>A</math>: whenever two derivations agree on the next ''k'' unread symbols, they must replace <math>A</math> by the same string.
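For ''k'' = 1 and a grammar without ε-rules, the definition can be checked mechanically. In such a grammar every alternative derives only non-empty strings, so the first unread symbol always comes from the alternative itself. If the FIRST sets (sets of possible first terminals) of the alternatives of each non-terminal are pairwise disjoint, the grammar is LL(1), and for a reduced grammar (every non-terminal reachable and productive) the converse also holds. The sketch below computes FIRST sets by fixpoint iteration under these assumptions. The function names and the upper-case convention for non-terminals are illustrative, and the second grammar checked, <math>S \to ab \mid ac</math>, is a hypothetical example that is not from the text.

```python
# Grammar G from the text; assumption: upper-case letters are non-terminals.
GRAMMAR = {
    "S": ["E"],
    "E": ["(E+E)", "i"],
}


def first_of_alternative(alt, grammar, first):
    # Without epsilon-rules only the first symbol of an alternative matters.
    head = alt[0]
    return first[head] if head in grammar else {head}


def first_sets(grammar):
    """FIRST_1 of every non-terminal, by fixpoint iteration (assumes no epsilon-rules)."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                new = first_of_alternative(alt, grammar, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def ll1_conflicts(grammar):
    """Return (non-terminal, alt1, alt2, shared symbols) for each pair of overlapping alternatives."""
    first = first_sets(grammar)
    conflicts = []
    for nt, alternatives in grammar.items():
        for i in range(len(alternatives)):
            for j in range(i + 1, len(alternatives)):
                shared = (first_of_alternative(alternatives[i], grammar, first)
                          & first_of_alternative(alternatives[j], grammar, first))
                if shared:
                    conflicts.append((nt, alternatives[i], alternatives[j], shared))
    return conflicts


print(ll1_conflicts(GRAMMAR))               # []: G is LL(1)
print(ll1_conflicts({"S": ["ab", "ac"]}))   # [('S', 'ab', 'ac', {'a'})]
```

For ''G'', the alternatives of ''E'' start with '''(''' and '''i''' respectively, so no conflict is reported, which matches the choice made with one symbol of lookahead in the example above. Handling ε-rules or ''k'' > 1 would additionally require FOLLOW sets and FIRST sets of length-''k'' strings.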