Open main menu
Home
Random
Recent changes
Special pages
Community portal
Preferences
About Wikipedia
Disclaimers
Incubator escapee wiki
Search
User menu
Talk
Dark mode
Contributions
Create account
Log in
Editing
Packrat parser
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
{{Short description|Type of parser}} {{Infobox algorithm |name={{PAGENAMEBASE}} |class=[[Parsing]] grammars that are [[Parsing expression grammar|PEG]] |data=[[String (computer science)|String]] |time=<math>O(n)</math> or <math>O(n^2)</math> without special handling of iterative combinator |best-time={{plainlist| * <math>O(n)</math> }} |average-time=<math>O(n)</math> |space=<math>O(n)</math> }} The '''Packrat parser''' is a type of [[Parsing|parser]] that shares similarities with the [[recursive descent parser]] in its construction. However, it differs because it takes [[parsing expression grammar|parsing expression grammars (PEGs)]] as input rather than [[LL grammar|LL grammars]].<ref name=":3">{{Cite arXiv |last=Ford |first=Bryan |date=2006 |title=Packrat Parsing: Simple, Powerful, Lazy, Linear Time |eprint=cs/0603077 }}</ref> In 1970, Alexander Birman laid the groundwork for packrat parsing by introducing the "TMG recognition scheme" (TS), and "generalized TS" (gTS). TS was based upon Robert M. McClure's [[TMG (language)|TMG]] compiler-compiler, and gTS was based upon Dewey Val Schorre's [[META II|META]] compiler-compiler. Birman's work was later refined by Aho and Ullman; and renamed as Top-Down Parsing Language (TDPL), and Generalized TDPL (GTDPL), respectively. These algorithms were the first of their kind to employ deterministic top-down parsing with backtracking.<ref name=":1">{{Cite book |last=Ford |first=Bryan |title=Proceedings of the 31st ACM SIGPLAN-SIGACT symposium on Principles of programming languages |chapter=Parsing expression grammars |date=2004-01-01 |chapter-url=https://doi.org/10.1145/964001.964011 |series=POPL '04 |location=New York, NY, USA |publisher=Association for Computing Machinery |pages=111β122 |doi=10.1145/964001.964011 |isbn=978-1-58113-729-3|s2cid=7762102 }}</ref><ref name=":0">{{Cite web |last=Flodin |first=Daniel |title=A Comparison Between Packrat Parsing and Conventional Shift-Reduce Parsing on Real-World Grammars and Inputs |url=http://uu.diva-portal.org/smash/get/diva2:752340/FULLTEXT01.pdf }}</ref> Bryan Ford developed PEGs as an expansion of GTDPL and TS. Unlike [[Context-free grammar|CFGs]], PEGs are unambiguous and can match well with machine-oriented languages. PEGs, similar to GTDPL and TS, can also express all [[LL parser|LL(k)]] and [[LR parser|LR(k)]]. Bryan also introduced Packrat as a parser that uses [[memoization]] techniques on top of a simple PEG parser. This was done because PEGs have an unlimited [[Parsing#Lookahead|lookahead]] capability resulting in a parser with [[exponential time]] performance in the worst case.<ref name=":1" /><ref name=":0" /> Packrat keeps track of the intermediate results for all mutually recursive parsing functions. Each parsing function is only called once at a specific input position. In some instances of packrat implementation, if there is insufficient memory, certain parsing functions may need to be called multiple times at the same input position, causing the parser to take longer than linear time.<ref>{{Cite book |last1=Mizushima |first1=Kota |last2=Maeda |first2=Atusi |last3=Yamaguchi |first3=Yoshinori |title=Proceedings of the 9th ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering |chapter=Packrat parsers can handle practical grammars in mostly constant space |date=2010-05-06 |url=https://dl.acm.org/doi/10.1145/1806672.1806679 |language=en |publisher=ACM |pages=29β36 |doi=10.1145/1806672.1806679 |isbn=978-1-4503-0082-7|s2cid=14498865 }}</ref> == Syntax == {{See also|Parsing expression grammar#Syntax}} The packrat parser takes in input the same syntax as PEGs: a simple PEG is composed of terminal and nonterminal symbols, possibly interleaved with operators that compose one or several derivation rules.<ref name=":1" /> === Symbols === * Nonterminal symbols are indicated with capital letters (e.g., <math>\{S, E, F, D\}</math>) * Terminal symbols are indicated with lowercase (e.g., <math>\{a,b,z,e,g \}</math>) * Expressions are indicated with lower-case Greek letter (e.g., <math>\{\alpha,\beta,\gamma,\omega,\tau\}</math>) ** Expressions can be a mix of terminal symbols, nonterminal symbols and operators === Operators === {| class="wikitable" |+Syntax Rules !Operator !Semantics |- |Sequence <math>\alpha\beta</math> |'''Success:''' If <math>\alpha</math> and <math>\beta</math> are recognized '''Failure:''' If <math>\alpha</math> or <math>\beta</math> are not recognized '''Consumed:''' <math>\alpha</math> and <math>\beta</math> in case of success |- |Ordered choice <math>\alpha/\beta/\gamma</math> |'''Success:''' If any of <math>\{\alpha,\beta,\gamma\}</math> is recognized starting from the left '''Failure:''' All of <math>\{\alpha,\beta,\gamma\}</math> do not match '''Consumed:''' The atomic expression that has generated a success so if multiple succeed the first one is always returned |- |And predicate <math>\&\alpha</math> |'''Success:''' If <math>\alpha</math> is recognized '''Failure:''' If <math>\alpha</math> is not recognized '''Consumed:''' No input is consumed |- |Not predicate <math>!\alpha</math> |'''Success:''' If <math>\alpha</math> is not recognized '''Failure:''' If <math>\alpha</math> is recognized '''Consumed:''' No input is consumed |- |One or more <math>\alpha +</math> |'''Success:''' Try to recognize <math>\alpha</math> one or multiple time '''Failure:''' If <math>\alpha</math> is not recognized '''Consumed:''' The maximum number that <math>\alpha </math> is recognized |- |Zero or more <math>\alpha *</math> |'''Success:''' Try to recognize <math>\alpha</math> zero or multiple time '''Failure:''' Cannot fail '''Consumed:''' The maximum number that <math>\alpha </math> is recognized |- |Zero or one <math>\alpha ?</math> |'''Success:''' Try to recognize <math>\alpha</math> zero or once '''Failure:''' Cannot fail '''Consumed:''' <math>\alpha</math> if it is recognized |- |Terminal range [<math>a-b</math>] |'''Success:''' Recognize any terminal <math>c</math> that are inside the range <math>[a-b]</math>. In the case of <math> [\textbf{'} h \textbf{'} - \textbf{'} z \textbf{'}] </math>, <math>c</math> can be any letter from h to z '''Failure:''' If no terminal inside of <math>[a-b]</math> can be recognized '''Consumed:''' <math>c</math> if it is recognized |- |Any character <math> . </math> |'''Success:''' Recognize any character in the input '''Failure:''' If no character in the input '''Consumed:''' any character in the input |} === Rules === A derivation rule is composed by a nonterminal symbol and an expression <math>S \rightarrow \alpha</math>. A special expression <math>\alpha_s</math> is the starting point of the grammar.<ref name=":1" /> In case no <math>\alpha_s</math> is specified, the first expression of the first rule is used. An input string is considered accepted by the parser if the <math> \alpha_s </math> is recognized. As a side-effect, a string <math> x </math> can be recognized by the parser even if it was not fully consumed.<ref name=":1" /> An extreme case of this rule is that the grammar <math> S \rightarrow x* </math> matches any string. This can be avoided by rewriting the grammar as <math> S \rightarrow x*!. </math> === Example === <math>\begin{cases} S \rightarrow A/B/D \\ A \rightarrow \texttt{'a'}\ S \ \texttt{'a'} \\ B \rightarrow \texttt{'b'}\ S \ \texttt{'b'} \\ D \rightarrow (\texttt{'0'}-\texttt{'9'})? \end{cases}</math> This grammar recognizes a [[palindrome]] over the alphabet <math> \{ a,b \} </math>, with an optional digit in the middle. Example strings accepted by the grammar include: <math> \texttt{'aa'} </math> and <math> \texttt{'aba3aba'} </math>. === Left recursion === Left recursion happens when a grammar production refers to itself as its left-most element, either directly or indirectly. Since Packrat is a recursive descent parser, it cannot handle left recursion directly.<ref name=":2">{{Cite book |last1=Warth |first1=Alessandro |last2=Douglass |first2=James R. |last3=Millstein |first3=Todd |title=Proceedings of the 2008 ACM SIGPLAN symposium on Partial evaluation and semantics-based program manipulation |chapter=Packrat parsers can support left recursion |date=2008-01-07 |chapter-url=https://doi.org/10.1145/1328408.1328424 |series=PEPM '08 |location=New York, NY, USA |publisher=Association for Computing Machinery |pages=103β110 |doi=10.1145/1328408.1328424 |isbn=978-1-59593-977-7|s2cid=2168153 }}</ref> During the early stages of development, it was found that a production that is left-recursive can be transformed into a right-recursive production.<ref>{{Cite book |title=Compilers: principles, techniques, & tools |date=2007 |publisher=Pearson Addison-Wesley |isbn=978-0-321-48681-3 |editor-last=Aho |editor-first=Alfred V. |edition=2nd |location=Boston Munich |editor-last2=Lam |editor-first2=Monica S. |editor-last3=Sethi |editor-first3=Ravi |editor-last4=Ullman |editor-first4=Jeffrey D.}}</ref> This modification significantly simplifies the task of a Packrat parser. Nonetheless, if there is an indirect left recursion involved, the process of rewriting can be quite complex and challenging. If the time complexity requirements are loosened from linear to [[Complexity class|superlinear]], it is possible to modify the memoization table of a Packrat parser to permit left recursion, without altering the input grammar.<ref name=":2" /> === Iterative combinator === The iterative combinators <math>\alpha +</math> and <math>\alpha *</math> need special attention when used in a Packrat parser: these combinators introduce a ''secret'' recursion that does not record intermediate results in the outcome matrix, which can lead to the parser operating with a superlinear behaviour. This problem can be resolved by applying the following transformation:<ref name=":3" /> {| class="wikitable" |+ !Original !Translated |- |<math>S \rightarrow \alpha +</math> |<math>S \rightarrow \alpha S / \alpha </math> |- |<math>S \rightarrow \alpha *</math> |<math>S \rightarrow \alpha S / \epsilon</math> |} With this transformation, the intermediate results can be properly memoized. == Memoization technique == Memoization is an [[Program optimization|optimization]] technique in computing that aims to speed up programs by storing the results of expensive function calls. This technique essentially works by [[Cache (computing)|caching]] the results so that when the same inputs occur again, the cached result is simply returned, thus avoiding the time-consuming process of re-computing.<ref>{{Cite journal |last=Norvig |first=Peter |date=1991-03-01 |title=Techniques for automatic memoization with applications to context-free parsing |url=https://dl.acm.org/doi/abs/10.5555/971738.971743 |journal=Computational Linguistics |volume=17 |issue=1 |pages=91β98 |doi= |issn=0891-2017}}</ref> When using packrat parsing and memoization, it's noteworthy that the parsing function for each nonterminal is solely based on the input string. It does not depend on any information gathered during the parsing process. Essentially, memoization table entries do not affect or rely on the parser's specific state at any given time.<ref>{{Cite book |last1=Dubroy |first1=Patrick |last2=Warth |first2=Alessandro |title=Proceedings of the 10th ACM SIGPLAN International Conference on Software Language Engineering |chapter=Incremental packrat parsing |date=2017-10-23 |chapter-url=https://doi.org/10.1145/3136014.3136022 |series=SLE 2017 |location=New York, NY, USA |publisher=Association for Computing Machinery |pages=14β25 |doi=10.1145/3136014.3136022 |isbn=978-1-4503-5525-4|s2cid=13047585 }}</ref> Packrat parsing stores results in a matrix or similar data structure that allows for quick look-ups and insertions. When a production is encountered, the matrix is checked to see if it has already occurred. If it has, the result is retrieved from the matrix. If not, the production is evaluated, the result is inserted into the matrix, and then returned.<ref name=":4">{{Cite journal |last1=Science |first1=International Journal of Scientific Research in |last2=Ijsrset |first2=Engineering and Technology |title=A Survey of Packrat Parser |url=https://www.academia.edu/17779983 |journal=A Survey of Packrat Parser}}</ref> When evaluating the entire <math>m*n</math> matrix in a tabular approach, it would require <math>\Theta(mn)</math> space.<ref name=":4" /> Here, <math>m</math> represents the number of nonterminals, and <math>n</math> represents the input string size. In a naΓ―ve implementation, the entire table can be derived from the input string starting from the end of the string. The Packrat parser can be improved to update only the necessary cells in the matrix through a depth-first visit of each subexpression tree. Consequently, using a matrix with dimensions of <math>m*n</math> is often wasteful, as most entries would remain empty.<ref name=":2" /> These cells are linked to the input string, not to the nonterminals of the grammar. This means that increasing the input string size would always increase memory consumption, while the number of parsing rules changes only the worst space complexity.<ref name=":3" /> === Cut operator === Another operator called ''cut'' has been introduced to Packrat to reduce its average space complexity even further. This operator utilizes the formal structures of many programming languages to eliminate impossible derivations. For instance, control statements parsing in a standard programming language is mutually exclusive from the first recognized token, e.g.,<math>\{\mathtt{if, do, while, switch}\}</math>.<ref name=":5">{{Cite book |last1=Mizushima |first1=Kota |last2=Maeda |first2=Atusi |last3=Yamaguchi |first3=Yoshinori |title=Proceedings of the 9th ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering |chapter=Packrat parsers can handle practical grammars in mostly constant space |date=2010-05-06 |chapter-url=https://doi.org/10.1145/1806672.1806679 |series=PASTE '10 |location=New York, NY, USA |publisher=Association for Computing Machinery |pages=29β36 |doi=10.1145/1806672.1806679 |isbn=978-1-4503-0082-7|s2cid=14498865 }}</ref> {| class="wikitable" !Operator !Semantics |- |Cut <math>\begin{array}{l} \alpha \uparrow \beta / \gamma \\ (\alpha \uparrow \beta)* \end{array} </math> |if <math>\alpha </math> is recognized but <math>\beta </math> is not, skip the evaluation of the alternative. In the first case don't evaluate <math>\gamma </math> if <math>\alpha </math> was recognized The second rule is can be rewritten as <math>N \rightarrow \alpha \uparrow \beta N / \epsilon </math> and the same rules can be applied. |} When a Packrat parser uses cut operators, it effectively clears its backtracking stack. This is because a cut operator reduces the number of possible alternatives in an ordered choice. By adding cut operators in the right places in a grammar's definition, the resulting Packrat parser only needs a nearly constant amount of space for memoization.<ref name=":5" /> == The algorithm == Sketch of an implementation of a Packrat algorithm in a Lua-like pseudocode.<ref name=":2" /> <syntaxhighlight lang="lua" line highlight="1,3,11" style="font-size:95%"> INPUT(n) -- return the character at position n RULE(R : Rule, P : Position ) entry = GET_MEMO(R,P) -- return the number of elements previously matched in rule R at position P if entry == nil then return EVAL(R, P); end return entry; EVAL(R : Rule, P : Position ) start = P; for choice in R.choices -- Return a list of choice acc=0; for symbol in choice then -- Return each element of a rule, terminal and nonterminal if symbol.is_terminal then if INPUT(start+acc) == symbol.terminal then acc = acc + 1; --Found correct terminal skip pass it else break; end else res = RULE(symbol.nonterminal , start+acc ); -- try to recognize a nonterminal in position start+acc SET_MEMO(symbol.nonterminal , start+acc, res ); -- we memoize also the failure with special value fail if res == fail then break; end acc = acc + res; end if symbol == choice.last -- check if we have matched the last symbol in a choice if so return return acc; end end return fail; --if no choice match return fail </syntaxhighlight> == Example == Given the following context, a free grammar that recognizes simple arithmetic expressions composed of single digits interleaved by sum, multiplication, and parenthesis. <math>\begin{cases} S \rightarrow A \\ A \rightarrow M\ \texttt{'+'}\ A \ / \ M \\ M \rightarrow P\ \texttt{'*'}\ M \ / \ P \\ P \rightarrow \texttt{'('}\ A\ \texttt{')'}\ / \ D \\ D \rightarrow (\texttt{'0'}-\texttt{'9'}) \end{cases}</math> Denoted with ⊣ the line terminator we can apply the ''packrat algorithm'' {| class="wikitable" |+Derivation of {{kbd|2*(3+4)⊣}} !Syntax tree !Action !Packrat Table |- |[[File:Derivation of a context free grammar with packrat.svg|center|333x333px]] | {| class="wikitable" !Derivation Rules !Input shifted |- |<math> \begin{array}{l} S \rightarrow A \\ A \rightarrow M\ \texttt{'+'}\ A \\ M \rightarrow P\ \texttt{'*'}\ M \\ P \rightarrow \texttt{'('}\ A\ \texttt{')'} \end{array} </math> |Ι |- !Notes !Input left |- |Input doesn't match the first element in the derivation. Backtrack to the first grammar rule with unexplored alternative <math display="inline"> P \rightarrow \texttt{'('}\ A\ \texttt{')'}\ / \ \underline{D} </math> | {{nowrap|{{kbd|2*(3+4)⊣}}}} |} | {| class="wikitable" |+ ! ! colspan="7" |Index |- ! !1 !2 !3 !4 !5 !6 !7 |- !'''S''' ! ! ! ! ! ! ! |- |'''A''' | | | | | | | |- |'''M''' | | | | | | | |- |'''P''' | | | | | | | |- |'''D''' | | | | | | | |- | |2 |* |( |3 | + | 4 |) |} No update because no terminal was recognized |- |[[File:Second_step_in_Parsing_a_CFG_with_packrat.svg|frameless|325x325px]] | {| class="wikitable" !Derivation Rules !Input shifted |- |<math> P \rightarrow D </math><br><math> D \rightarrow 2 </math> | {{kbd|2}} |- !Notes !Input left |- |Shift input by one after deriving terminal {{mono|2}} | {{kbd|*(3+4)⊣}} |} | {| class="wikitable" ! ! colspan="7" |Index |- ! !1 !2 !3 !4 !5 !6 !7 |- !'''S''' ! ! ! ! ! ! ! |- |'''A''' | | | | | | | |- |'''M''' | | | | | | | |- |'''P''' |1 | | | | | | |- |'''D''' |1 | | | | | | |- | |2 |* |( |3 | + | 4 |) |} '''Update:''' D(1) = 1; P(1) = 1; |- |[[File:Third_step_of_recognizing_CFG_with_packrat.svg|frameless|325x325px]] | {| class="wikitable" !Derivation Rules !Input shifted |- |<math> M \rightarrow P\ \texttt{'*'}\ M </math><br><math>P \rightarrow \texttt{'('}\ A\ \texttt{')'}</math> | {{kbd|2*(}} |- !Notes !Input left |- |Shift input by two terminal <math> \{\texttt{*}, \texttt{(}\} </math> | {{kbd|3+4)⊣}} |} | {| class="wikitable" ! ! colspan="7" |Index |- ! !1 !2 !3 !4 !5 !6 !7 |- !'''S''' ! ! ! ! ! ! ! |- |'''A''' | | | | | | | |- |'''M''' | | | | | | | |- |'''P''' |1 | | | | | | |- |'''D''' |1 | | | | | | |- | |2 |* |( |3 | + | 4 |) |} No update because no nonterminal was fully recognized |- |[[File:Fourth_step_in_recognizing_CFG_grammar_with_Packrat.svg|frameless|325x325px]] | {| class="wikitable" !Derivation Rules !Input shifted |- |<math>A \rightarrow M\ \texttt{'+'}\ A</math><br><math>M \rightarrow P\ \texttt{'*'}\ M</math><br><math>P \rightarrow \texttt{'('}\ A\ \texttt{')'}</math> | {{kbd|2*(}} |- !Notes !Input left |- |Input doesn't match the first element in the derivation. Backtrack to the first grammar rule with unexplored alternative <math display="inline"> P \rightarrow \texttt{'('}\ A\ \texttt{')'}\ / \ \underline{D} </math> | {{kbd|3+4)⊣}} |} | {| class="wikitable" ! ! colspan="7" |Index |- ! !1 !2 !3 !4 !5 !6 !7 |- !'''S''' ! ! ! ! ! ! ! |- |'''A''' | | | | | | | |- |'''M''' | | | | | | | |- |'''P''' |1 | | | | | | |- |'''D''' |1 | | | | | | |- | |2 |* |( |3 | + | 4 |) |} No update because no terminal was recognized |- |[[File:5th step of recognizing CFG with packrat.svg|frameless|325x325px]] | {| class="wikitable" !Derivation Rules !Input shifted |- |<math>P \rightarrow D </math><br><math>D \rightarrow 3</math> | {{kbd|2*(}} |- !Notes !Input left |- |Shift input by one after deriving terminal {{mono|3}} but the new input will not match inside <math>M \rightarrow P\ \texttt{'*'}\ M</math> so an unroll is necessary to <math> M \rightarrow P\ \texttt{'*'}\ M \ / \ \underline P </math> | {{kbd|3+4)⊣}} |} | {| class="wikitable" ! ! colspan="7" |Index |- ! !1 !2 !3 !4 !5 !6 !7 |- !'''S''' ! ! ! ! ! ! ! |- |'''A''' | | | | | | | |- |'''M''' | | | | | | | |- |'''P''' |1 | | |1 | | | |- |'''D''' |1 | | |1 | | | |- | |2 |* |( |3 | + | 4 |) |} '''Update:''' D(4) = 1; P(4) = 1; |- |[[File:Sixth step of recognizing CFG with packrat.svg|frameless|325x325px]] | {| class="wikitable" !Derivation Rules !Input shifted |- |<math>M \rightarrow P </math> | {{nowrap|{{kbd|2*(3+}}}} |- !Notes !Input left |- |Roll Back to <math> M \rightarrow P\ \texttt{'*'}\ M \ / \ \underline P </math> And we don't expand it has we have an hit in the memoization table P(4) β 0 so shift the input by P(4). Shift also the <math>+</math> from <math>A \rightarrow M\ \texttt{'+'}\ A</math> | {{kbd|4)⊣}} |} | {| class="wikitable" ! ! colspan="7" |Index |- ! !1 !2 !3 !4 !5 !6 !7 |- !'''S''' ! ! ! ! ! ! ! |- |'''A''' | | | | | | | |- |'''M''' | | | |1 | | | |- |'''P''' |1 | | |1 | | | |- |'''D''' |1 | | |1 | | | |- | |2 |* |( |3 | + | 4 |) |} Hit on P(4) Update M(4) = 1 as M was recognized |- |[[File:Seventh step of recognizing CFG with packrat.svg|frameless|325x325px]] | {| class="wikitable" !Derivation Rules !Input shifted |- |<math>A \rightarrow M\ \texttt{'+'}\ A</math><br><math>M \rightarrow P\ \texttt{'*'}\ M</math><br><math>P \rightarrow \texttt{'('}\ A\ \texttt{')'}</math> | {{nowrap|{{kbd|2*(3+}}}} |- !Notes !Input left |- |Input doesn't match the first element in the derivation. Backtrack to the first grammar rule with unexplored alternative <math display="inline"> P \rightarrow \texttt{'('}\ A\ \texttt{')'}\ / \ \underline{D} </math> | {{kbd|4)⊣}} |} | {| class="wikitable" ! ! colspan="7" |Index |- ! !1 !2 !3 !4 !5 !6 !7 |- !'''S''' ! ! ! ! ! ! ! |- |'''A''' | | | | | | | |- |'''M''' | | | |1 | | | |- |'''P''' |1 | | |1 | | | |- |'''D''' |1 | | |1 | | | |- | |2 |* |( |3 | + | 4 |) |} No update because no terminal was recognized |- |[[File:Eighth step of recognizing CFG with packrat.svg|frameless|325x325px]] | {| class="wikitable" !Derivation Rules !Input shifted |- |<math>P \rightarrow D </math><br><math>D \rightarrow 4</math> | {{kbd|2*(3+}} |- !Notes !Input left |- |Shift input by one after deriving terminal {{mono|4}} but the new input will not match inside <math>M \rightarrow P\ \texttt{'*'}\ M</math> so an unroll is necessary | {{kbd|4)⊣}} |} | {| class="wikitable" ! ! colspan="7" |Index |- ! !1 !2 !3 !4 !5 !6 !7 |- !'''S''' ! ! ! ! ! ! ! |- |'''A''' | | | | | | | |- |'''M''' | | | |1 | | | |- |'''P''' |1 | | |1 | |1 | |- |'''D''' |1 | | |1 | |1 | |- | |2 |* |( |3 | + | 4 |) |} '''Update:''' D(6) = 1; P(6) = 1; |- |[[File:Ninth step of recognizing CFG with packrat.svg|frameless|325x325px]] | {| class="wikitable" !Derivation Rules !Input shifted |- |<math>M \rightarrow P </math> | {{nowrap|{{kbd|2*(3+}}}} |- !Notes !Input left |- |Roll Back to <math> M \rightarrow P\ \texttt{'*'}\ M \ / \ \underline P </math> And we don't expand it has we have an hit in the memoization table P(6) β 0 so shift the input by P(6). but the new input will not match <math>+</math> inside <math>A \rightarrow M\ \texttt{'+'}\ A </math> so an unroll is necessary | {{kbd|4)⊣}} |} | {| class="wikitable" ! ! colspan="7" |Index |- ! !1 !2 !3 !4 !5 !6 !7 |- !'''S''' ! ! ! ! ! ! ! |- |'''A''' | | | | | | | |- |'''M''' | | | |1 | |1 | |- |'''P''' |1 | | |1 | |1 | |- |'''D''' |1 | | |1 | |1 | |- | |2 |* |( |3 | + | 4 |) |} Hit on P(6) Update M(6) = 1 as M was recognized |- |[[File:Tenth step of recognizing CFG with packrat.svg|frameless|325x325px]] | {| class="wikitable" !Derivation Rules !Input shifted |- |<math>A \rightarrow M </math> | {{nowrap|{{kbd|2*(3+4)}}}} |- !Notes !Input left |- |Roll Back to <math>A \rightarrow M\ \texttt{'+'}\ A \ / \ \underline{M} </math> And we don't expand it has we have an hit in the memoization table M(6) β 0 so shift the input by M(6). Also shift <math>)</math> from <math>P \rightarrow \texttt{'('}\ A\ \texttt{')'} </math> | {{kbd|⊣}} |} | {| class="wikitable" ! ! colspan="7" |Index |- ! !1 !2 !3 !4 !5 !6 !7 |- !'''S''' ! ! ! ! ! ! ! |- |'''A''' | | | |3 | | | |- |'''M''' | | | |1 | |1 | |- |'''P''' |1 | |5 |1 | |1 | |- |'''D''' |1 | | |1 | |1 | |- | |2 |* |( |3 | + | 4 |) |} Hit on M(6) Update A(4) = 3 as A was recognized Update P(3)=5 as P was recognized |- |[[File:Eleventh step of recognizing CFG with packrat.svg|frameless|325x325px]] | {| class="wikitable" !Derivation Rules !Input shifted |- | | {{kbd|2*}} |- !Notes !Input left |- |Roll Back to <math> M \rightarrow P\ \texttt{'*'}\ M \ / \ \underline P </math> as terminal <math>* \neq \dashv</math> | {{nowrap|{{kbd|(3+4)⊣}}}} |} | {| class="wikitable" ! ! colspan="7" |Index |- ! !1 !2 !3 !4 !5 !6 !7 |- !'''S''' ! ! ! ! ! ! ! |- |'''A''' | | | |3 | | | |- |'''M''' | | | |1 | |1 | |- |'''P''' |1 | |5 |1 | |1 | |- |'''D''' |1 | | |1 | |1 | |- | |2 |* |( |3 | + | 4 |) |} No update because no terminal was recognized |- |[[File:Twelfth step of recognizing CFG with packrat.svg|frameless|325x325px]] | {| class="wikitable" !Derivation Rules !Input shifted |- |<math>M \rightarrow P </math> | {{nowrap|{{kbd|2*(3+4)}}}} |- !Notes !Input left |- |we don't expand it as we have a hit in the memoization table P(3) β 0, so shift the input by P(3). | {{kbd|⊣}} |} | {| class="wikitable" ! ! colspan="7" |Index |- ! !1 !2 !3 !4 !5 !6 !7 |- !'''S''' ! ! ! ! ! ! ! |- |'''A''' | | | |3 | | | |- |'''M''' |7 | | |1 | |1 | |- |'''P''' |1 | |5 |1 | |1 | |- |'''D''' |1 | | |1 | |1 | |- | |2 |* |( |3 | + | 4 |) |} Hit on P(3) Update M(1)=7 as M was recognized |- |[[File:13th step of recognizing CFG with packrat.svg|frameless|325x325px]] | {| class="wikitable" !Derivation Rules !Input shifted |- | | |- !Notes !Input left |- |Roll Back to <math>A \rightarrow M\ \texttt{'+'}\ A \ / \ \underline{M}</math> as terminal <math>+ \neq \dashv</math> | {{nowrap|{{kbd|2*(3+4)⊣}}}} |} | {| class="wikitable" ! ! colspan="7" |Index |- ! !1 !2 !3 !4 !5 !6 !7 |- !'''S''' ! ! ! ! ! ! ! |- |'''A''' | | | |3 | | | |- |'''M''' |7 | | |1 | |1 | |- |'''P''' |1 | |5 |1 | |1 | |- |'''D''' |1 | | |1 | |1 | |- | |2 |* |( |3 | + | 4 |) |} No update because no terminal was recognized |- |[[File:14th step of recognizing CFG with packrat.svg|frameless|325x325px]] | {| class="wikitable" !Derivation Rules !Input shifted |- |<math>A \rightarrow M</math> | {{nowrap|{{kbd|2*(3+4)⊣}}}} |- !Notes !Input left |- | We don't expand it as we have a hit in the memoization table M(1) β 0, so shift the input by M(1). S was totally reduced, so the input string is recognized. | |} | {| class="wikitable" ! ! colspan="7" |Index |- ! !1 !2 !3 !4 !5 !6 !7 |- !'''S''' !7 ! ! ! ! ! ! |- |'''A''' |7 | | |3 | | | |- |'''M''' |7 | | |1 | |1 | |- |'''P''' |1 | |5 |1 | |1 | |- |'''D''' |1 | | |1 | |1 | |- | |2 |* |( |3 | + | 4 |) |} Hit on M(1) Update A(1)=7 as A was recognized Update S(1)=7 as S was recognized |} == Implementation == {{See also|Comparison of parser generators#Parsing expression grammars, deterministic boolean grammars}} {| class="wikitable sortable" style="text-align: center; font-size: 85%; width: auto;" |- ! Name !! [[Parsing]] algorithm !! [[Programming language|Output languages]] !! Grammar, code !! Development platform !! [[Software license|License]] |- | AustenX || Packrat (modified) || [[Java (programming language)|Java]] || {{D-P|Separate}} || {{Yes|All}} || {{Free}}, [[BSD licenses|BSD]] |- | Aurochs || Packrat || [[C (programming language)|C]], [[OCaml]], [[Java (programming language)|Java]] || {{D-A|Mixed}} || {{Yes|All}} || {{Free}}, [[GNU General Public License|GNU GPL]] |- | Canopy || Packrat || [[Java (programming language)|Java]], [[JavaScript]], [[Python (programming language)|Python]], [[Ruby (programming language)|Ruby]] || {{D-P|Separate}} || {{Yes|All}} || {{Free}}, [[GNU General Public License|GNU GPL]] |- | CL-peg || Packrat || [[Common Lisp]] || {{D-A|Mixed}} || {{Yes|All}} || {{Free}}, [[MIT License|MIT]] |- | Drat! || Packrat || [[D (programming language)|D]] || {{D-A|Mixed}} || {{Yes|All}} || {{Free}}, [[GNU General Public License|GNU GPL]] |- | Frisby || Packrat || [[Haskell (programming language)|Haskell]] || {{D-A|Mixed}} || {{Yes|All}} || {{Free}}, [[BSD licenses|BSD]] |- | [[tcllib|grammar::peg]] || Packrat || [[Tcl]] || {{D-A|Mixed}} || {{Yes|All}} || {{Free}}, [[BSD licenses|BSD]] |- | IronMeta || Packrat || [[C Sharp (programming language)|C#]] || {{D-A|Mixed}} || {{Some|[[Microsoft Windows|Windows]]}} || {{Free}}, [[BSD licenses|BSD]] |- | [https://github.com/TheLartians/PEGParser PEGParser] || Packrat (supporting left-recursion and grammar ambiguity) || [[C++]] || Identical || {{Yes|All}} || {{Free}}, [[BSD licenses|BSD]] |- | Narwhal || Packrat || [[C (programming language)|C]] || {{D-A|Mixed}} || {{Some|[[POSIX]], [[Microsoft Windows|Windows]]}} || {{Free}}, [[BSD licenses|BSD]] |- | neotoma || Packrat || [[Erlang (programming language)|Erlang]] || {{D-P|Separate}} || {{Yes|All}} || {{Free}}, [[MIT License|MIT]] |- | [[OMeta]] || Packrat (modified, partial memoization) || [[JavaScript]], [[Squeak]], [[Python (programming language)|Python]] || {{D-A|Mixed}} || {{Yes|All}} || {{Free}}, [[MIT License|MIT]] |- | PackCC || Packrat (modified, left-recursion support) || [[C (programming language)|C]] || {{D-A|Mixed}} || {{Yes|All}} || {{Free}}, [[MIT License|MIT]] |- | Packrat || Packrat || [[Scheme (programming language)|Scheme]] || {{D-A|Mixed}} || {{Yes|All}} || {{Free}}, [[MIT License|MIT]] |- | [[Pappy]] || Packrat || [[Haskell (programming language)|Haskell]] || {{D-A|Mixed}} || {{Yes|All}} || {{Free}}, [[BSD licenses|BSD]] |- | Parsnip || Packrat || [[C++]] || {{D-A|Mixed}} || {{Some|[[Microsoft Windows|Windows]]}} || {{Free}}, [[GNU General Public License|GNU GPL]] |- | PEG.js || Packrat (partial memoization) || [[JavaScript]] || {{D-A|Mixed}} || {{Yes|All}} || {{Free}}, [[MIT License|MIT]] |- | Peggy<ref>Maintained fork of PEG.js</ref> || Packrat (partial memoization) || [[JavaScript]] || {{D-A|Mixed}} || {{Yes|All}} || {{Free}}, [[MIT License|MIT]] |- | Pegasus || Recursive descent, Packrat (selectively) || [[C Sharp (programming language)|C#]] || {{D-A|Mixed}} || {{Some|[[Microsoft Windows|Windows]]}} || {{Free}}, [[MIT License|MIT]] |- | PetitParser || Packrat || [[Smalltalk]], [[Java (programming language)|Java]], [[Dart (programming language)|Dart]] || {{D-A|Mixed}} || {{Yes|All}} || {{Free}}, [[MIT License|MIT]] |- | [[PyPy]] rlib || Packrat || [[Python (programming language)|Python]] || {{D-A|Mixed}} || {{Yes|All}} || {{Free}}, [[MIT License|MIT]] |- | Rats! || Packrat || [[Java (programming language)|Java]] || {{D-A|Mixed}} || {{Some|[[Java virtual machine]]}} || {{Free}}, [[GNU Lesser General Public License|GNU LGPL]] |- | go-packrat || Packrat || [[Go (programming language)|Go]] || Identical || All || {{Free}}, [[GPLv3]] |} == See also == * [[CYK algorithm]] * [[Context-free grammar]] * [[List of algorithms#Parsing|Parsing algorithms]] * [[Earley parser]] == References == {{Reflist}} == External links == * [https://bford.info/pub/lang/packrat-icfp02.pdf Packrat Parsing: Simple, Powerful, Lazy, Linear Time] * [https://bford.info/pub/lang/peg.pdf Parsing Expression Grammars: A Recognition-Based Syntactic Foundation] {{parsers}} [[Category:Parsing algorithms]] [[Category:Dynamic programming]]
Edit summary
(Briefly describe your changes)
By publishing changes, you agree to the
Terms of Use
, and you irrevocably agree to release your contribution under the
CC BY-SA 4.0 License
and the
GFDL
. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel
Editing help
(opens in new window)
Pages transcluded onto the current version of this page
(
help
)
:
Template:Cite arXiv
(
edit
)
Template:Cite book
(
edit
)
Template:Cite journal
(
edit
)
Template:Cite web
(
edit
)
Template:D-A
(
edit
)
Template:D-P
(
edit
)
Template:Free
(
edit
)
Template:Infobox algorithm
(
edit
)
Template:Kbd
(
edit
)
Template:Mono
(
edit
)
Template:Nowrap
(
edit
)
Template:Parsers
(
edit
)
Template:Reflist
(
edit
)
Template:See also
(
edit
)
Template:Short description
(
edit
)
Template:Some
(
edit
)
Template:Yes
(
edit
)