Open main menu
Home
Random
Recent changes
Special pages
Community portal
Preferences
About Wikipedia
Disclaimers
Incubator escapee wiki
Search
User menu
Talk
Dark mode
Contributions
Create account
Log in
Editing
Earley parser
(section)
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
== Constructing the parse forest == Earley's dissertation<ref name=Earley3>{{cite book | last=Earley | first=Jay | title=An Efficient Context-Free Parsing Algorithm | year=1968 | publisher=Carnegie-Mellon Dissertation | page=106 | url=http://reports-archive.adm.cs.cmu.edu/anon/anon/usr/ftp/scan/CMU-CS-68-earley.pdf | access-date=2012-09-12 | archive-date=2017-09-22 | archive-url=https://web.archive.org/web/20170922004954/http://reports-archive.adm.cs.cmu.edu/anon/anon/usr/ftp/scan/CMU-CS-68-earley.pdf | url-status=dead }}</ref> briefly describes an algorithm for constructing parse trees by adding a set of pointers from each non-terminal in an Earley item back to the items that caused it to be recognized. But [[Masaru Tomita|Tomita]] noticed<ref>{{cite book|last1=Tomita|first1=Masaru|title=Efficient Parsing for Natural Language: A Fast Algorithm for Practical Systems|date=April 17, 2013|publisher=Springer Science and Business Media|isbn=978-1475718850|page=74|url=https://books.google.com/books?id=DAjkBwAAQBAJ&q=Tomita%20Efficient%20Parsing%20for%20natural%20Language&pg=PA74|access-date=16 September 2015}}</ref> that this does not take into account the relations between symbols, so if we consider the grammar S β SS | b and the string bbb, it only notes that each S can match one or two b's, and thus produces spurious derivations for bb and bbbb as well as the two correct derivations for bbb. Another method<ref>{{cite journal|last1=Scott|first1=Elizabeth|title=SPPF-Style Parsing From Earley Recognizers|journal=Electronic Notes in Theoretical Computer Science|date=April 1, 2008|volume=203|issue=2|pages=53β67|doi=10.1016/j.entcs.2008.03.044|doi-access=free}}</ref> is to build the parse forest as you go, augmenting each Earley item with a pointer to a shared packed parse forest (SPPF) node labelled with a triple (s, i, j) where s is a symbol or an LR(0) item (production rule with dot), and i and j give the section of the input string derived by this node. A node's contents are either a pair of child pointers giving a single derivation, or a list of "packed" nodes each containing a pair of pointers and representing one derivation. SPPF nodes are unique (there is only one with a given label), but may contain more than one derivation for [[syntactic ambiguity|ambiguous]] parses. So even if an operation does not add an Earley item (because it already exists), it may still add a derivation to the item's parse forest. * Predicted items have a null SPPF pointer. * The scanner creates an SPPF node representing the non-terminal it is scanning. * Then when the scanner or completer advance an item, they add a derivation whose children are the node from the item whose dot was advanced, and the one for the new symbol that was advanced over (the non-terminal or completed item). SPPF nodes are never labeled with a completed LR(0) item: instead they are labelled with the symbol that is produced so that all derivations are combined under one node regardless of which alternative production they come from.
Edit summary
(Briefly describe your changes)
By publishing changes, you agree to the
Terms of Use
, and you irrevocably agree to release your contribution under the
CC BY-SA 4.0 License
and the
GFDL
. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel
Editing help
(opens in new window)