Open main menu
Home
Random
Recent changes
Special pages
Community portal
Preferences
About Wikipedia
Disclaimers
Incubator escapee wiki
Search
User menu
Talk
Dark mode
Contributions
Create account
Log in
Editing
Parsing expression grammar
(section)
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
=== Compared to regular expressions === Compared to pure [[regular expressions]] (i.e., describing a language recognisable using a [[finite automaton]]), PEGs are vastly more powerful. In particular they can handle unbounded recursion, and so match parentheses down to an arbitrary nesting depth; regular expressions can at best keep track of nesting down to some fixed depth, because a finite automaton (having a finite set of internal states) can only distinguish finitely many different nesting depths. In more theoretical terms, <math> \{a^n b^n\}_{n \geqslant 0} </math> (the language of all strings of zero or more <math>a</math>'s, followed by an ''equal number'' of <math>b</math>s) is not a regular language, but it is easily seen to be a parsing expression language, matched by the grammar <syntaxhighlight lang="peg"> start ← AB !. AB ← ('a' AB 'b')? </syntaxhighlight> Here <code>AB !.</code> is the starting expression. The <code>!.</code> part enforces that the input ends after the <code>AB</code>, by saying “there is no next character”; unlike regular expressions, which have magic constraints <code>$</code> or <code>\Z</code> for this, parsing expressions can express the end of input using only the basic primitives. The <code>*</code>, <code>+</code>, and <code>?</code> of parsing expressions are similar to those in regular expressions, but a difference is that these operate strictly in a greedy mode. This is ultimately due to <code>/</code> being an ordered choice. A consequence is that something can match as a regular expression which does not match as parsing expression: : <code>[ab]?[bc][cd]</code> is both a valid regular expression and a valid parsing expression. As regular expression, it matches <code>bc</code>, but as parsing expression it does not match, because the <code>[ab]?</code> will match the <code>b</code>, then <code>[bc]</code> will match the <code>c</code>, leaving nothing for the <code>[cd]</code>, so at that point matching the sequence fails. "Trying again" with having <code>[ab]?</code> match the empty string is explicitly against the semantics of parsing expressions; this is not an edge case of a particular matching algorithm, instead it is the sought behaviour. Even regular expressions that depend on nondeterminism ''can'' be compiled into a parsing expression grammar, by having a separate nonterminal for every state of the corresponding [[deterministic finite automaton|DFA]] and encoding its transition function into the definitions of these nonterminals — <syntaxhighlight lang="peg"> A ← 'x' B / 'y' C </syntaxhighlight> is effectively saying "from state A transition to state B if the next character is x, but to state C if the next character is y" — but this works because nondeterminism can be eliminated within the realm of regular languages. It would not make use of the parsing expression variants of the repetition operations.
Edit summary
(Briefly describe your changes)
By publishing changes, you agree to the
Terms of Use
, and you irrevocably agree to release your contribution under the
CC BY-SA 4.0 License
and the
GFDL
. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel
Editing help
(opens in new window)