Packrat parser
== Syntax ==
{{See also|Parsing expression grammar#Syntax}}
The packrat parser accepts the same syntax as PEGs: a simple PEG is composed of terminal and nonterminal symbols, possibly interleaved with operators, which together form one or more derivation rules.<ref name=":1" />

=== Symbols ===
* Nonterminal symbols are indicated with capital letters (e.g., <math>\{S, E, F, D\}</math>)
* Terminal symbols are indicated with lowercase letters (e.g., <math>\{a,b,z,e,g \}</math>)
* Expressions are indicated with lowercase Greek letters (e.g., <math>\{\alpha,\beta,\gamma,\omega,\tau\}</math>)
** Expressions can be a mix of terminal symbols, nonterminal symbols and operators

=== Operators ===
{| class="wikitable"
|+Syntax Rules
!Operator
!Semantics
|-
|Sequence
<math>\alpha\beta</math>
|'''Success:''' If <math>\alpha</math> and <math>\beta</math> are recognized

'''Failure:''' If <math>\alpha</math> or <math>\beta</math> is not recognized

'''Consumed:''' <math>\alpha</math> and <math>\beta</math> in case of success
|-
|Ordered choice
<math>\alpha/\beta/\gamma</math>
|'''Success:''' If any of <math>\{\alpha,\beta,\gamma\}</math> is recognized, trying them in order from the left

'''Failure:''' If none of <math>\{\alpha,\beta,\gamma\}</math> is recognized

'''Consumed:''' The expression that produced the success; if several alternatives could succeed, the first one from the left is always chosen
|-
|And predicate
<math>\&\alpha</math>
|'''Success:''' If <math>\alpha</math> is recognized

'''Failure:''' If <math>\alpha</math> is not recognized

'''Consumed:''' No input is consumed
|-
|Not predicate
<math>!\alpha</math>
|'''Success:''' If <math>\alpha</math> is not recognized

'''Failure:''' If <math>\alpha</math> is recognized

'''Consumed:''' No input is consumed
|-
|One or more
<math>\alpha +</math>
|'''Success:''' If <math>\alpha</math> is recognized one or more times

'''Failure:''' If <math>\alpha</math> is not recognized at least once

'''Consumed:''' As many consecutive occurrences of <math>\alpha </math> as can be recognized
|-
|Zero or more
<math>\alpha *</math>
|'''Success:''' Try to recognize <math>\alpha</math> zero or more times

'''Failure:''' Cannot fail

'''Consumed:''' As many consecutive occurrences of <math>\alpha </math> as can be recognized
|-
|Zero or one
<math>\alpha ?</math>
|'''Success:''' Try to recognize <math>\alpha</math> zero times or once

'''Failure:''' Cannot fail

'''Consumed:''' <math>\alpha</math> if it is recognized
|-
|Terminal range
[<math>a-b</math>]
|'''Success:''' If the next terminal <math>c</math> lies inside the range <math>[a-b]</math>. In the case of <math> [\textbf{'} h \textbf{'} - \textbf{'} z \textbf{'}] </math>, <math>c</math> can be any letter from h to z

'''Failure:''' If no terminal inside <math>[a-b]</math> can be recognized

'''Consumed:''' <math>c</math> if it is recognized
|-
|Any character
<math> . </math>
|'''Success:''' If there is at least one character left in the input

'''Failure:''' If no character is left in the input

'''Consumed:''' The next character of the input
|}

=== Rules ===
A derivation rule is composed of a nonterminal symbol and an expression: <math>S \rightarrow \alpha</math>.

A special expression <math>\alpha_s</math> is the starting point of the grammar.<ref name=":1" /> If no <math>\alpha_s</math> is specified, the first expression of the first rule is used.

An input string is considered accepted by the parser if <math> \alpha_s </math> is recognized. As a side effect, a string <math> x </math> can be recognized by the parser even if it was not fully consumed.<ref name=":1" />

An extreme case of this rule is that the grammar <math> S \rightarrow x* </math> matches any string.

This can be avoided by rewriting the grammar as <math> S \rightarrow x*!. </math>

=== Example ===
<math>\begin{cases}
S \rightarrow A/B/D \\
A \rightarrow \texttt{'a'}\ S \ \texttt{'a'} \\
B \rightarrow \texttt{'b'}\ S \ \texttt{'b'} \\
D \rightarrow (\texttt{'0'}-\texttt{'9'})?
\end{cases}</math>

This grammar recognizes a [[palindrome]] over the alphabet <math> \{ a,b \} </math>, with an optional digit in the middle.

Example strings accepted by the grammar include: <math> \texttt{'aa'} </math> and <math> \texttt{'aba3aba'} </math>.

=== Left recursion ===
Left recursion happens when a grammar production refers to itself as its left-most element, either directly or indirectly. Since Packrat is a recursive descent parser, it cannot handle left recursion directly.<ref name=":2">{{Cite book |last1=Warth |first1=Alessandro |last2=Douglass |first2=James R. |last3=Millstein |first3=Todd |title=Proceedings of the 2008 ACM SIGPLAN symposium on Partial evaluation and semantics-based program manipulation |chapter=Packrat parsers can support left recursion |date=2008-01-07 |chapter-url=https://doi.org/10.1145/1328408.1328424 |series=PEPM '08 |location=New York, NY, USA |publisher=Association for Computing Machinery |pages=103–110 |doi=10.1145/1328408.1328424 |isbn=978-1-59593-977-7|s2cid=2168153 }}</ref> During the early stages of development, it was found that a left-recursive production can be transformed into a right-recursive production.<ref>{{Cite book |title=Compilers: principles, techniques, & tools |date=2007 |publisher=Pearson Addison-Wesley |isbn=978-0-321-48681-3 |editor-last=Aho |editor-first=Alfred V. |edition=2nd |location=Boston Munich |editor-last2=Lam |editor-first2=Monica S. |editor-last3=Sethi |editor-first3=Ravi |editor-last4=Ullman |editor-first4=Jeffrey D.}}</ref> This modification significantly simplifies the task of a Packrat parser. Nonetheless, if indirect left recursion is involved, the rewriting process can be quite complex and challenging.
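
As an illustration of the rewriting (the grammar is a hypothetical example, not taken from the cited sources), consider a directly left-recursive rule for sums, where <math>T</math> stands for some term that is not itself left-recursive:

<math>\begin{cases}
E \rightarrow E \ \texttt{'+'} \ T / T
\end{cases}</math>

A recursive descent parser evaluating <math>E</math> would call <math>E</math> again at the same input position before consuming anything, so it would never terminate. The usual rewriting introduces a helper nonterminal, here called <math>E'</math>:

<math>\begin{cases}
E \rightarrow T \ E' \\
E' \rightarrow \texttt{'+'} \ T \ E' / \epsilon
\end{cases}</math>

Read as context-free rules, both grammars describe the same strings, but the rewritten one always consumes a <math>T</math> before recursing. The resulting parse tree is right-nested, so left associativity, where it matters, has to be restored in a later step.
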
If the time complexity requirements are loosened from linear to [[Complexity class|superlinear]], it is possible to modify the memoization table of a Packrat parser to permit left recursion, without altering the input grammar.<ref name=":2" />

=== Iterative combinator ===
The iterative combinators <math>\alpha +</math> and <math>\alpha *</math> need special attention when used in a Packrat parser: these combinators introduce a ''secret'' recursion that does not record intermediate results in the outcome matrix, which can cause the parser to run in superlinear time. This problem can be resolved by applying the following transformation:<ref name=":3" />
{| class="wikitable"
|+
!Original
!Translated
|-
|<math>S \rightarrow \alpha +</math>
|<math>S \rightarrow \alpha S / \alpha </math>
|-
|<math>S \rightarrow \alpha *</math>
|<math>S \rightarrow \alpha S / \epsilon</math>
|}
With this transformation, the intermediate results can be properly memoized.
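
The effect of the transformation on the memoization table can be sketched in Python. This is only an illustrative sketch, not code from the cited sources: the rule <math>\alpha \rightarrow \texttt{'a'}</math>, the function names and the dictionary used as the memoization table are all assumptions made for the example.

```python
def make_parsers(text):
    """Build two recognizers for S -> alpha* over `text`, sharing one memo table.

    Hypothetical setup: alpha -> 'a'. Each recognizer returns the position
    reached after matching as many 'a' characters as possible.
    """
    memo = {}  # (rule name, start position) -> end position

    def alpha(pos):
        # alpha -> 'a': consume one 'a', or return None on failure
        return pos + 1 if text.startswith("a", pos) else None

    def star_loop(pos):
        # S -> alpha* as a built-in loop: only the entry position is recorded,
        # so a later call starting at pos + 1 has to repeat the scan.
        key = ("S_loop", pos)
        if key not in memo:
            end = pos
            nxt = alpha(end)
            while nxt is not None:
                end = nxt
                nxt = alpha(end)
            memo[key] = end
        return memo[key]

    def star_rec(pos):
        # S -> alpha S / epsilon: each recursive call records its own result,
        # so every intermediate position ends up in the memo table.
        # (Plain recursion; very long inputs would hit Python's recursion limit.)
        key = ("S_rec", pos)
        if key not in memo:
            nxt = alpha(pos)
            memo[key] = star_rec(nxt) if nxt is not None else pos
        return memo[key]

    return star_loop, star_rec, memo
```

On the input <code>"aaab"</code>, both recognizers called at position 0 stop at position 3, but the loop version leaves a single entry in the table while the recursive version leaves one entry for each of the positions 0 to 3, which later calls at those positions can reuse.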