CYK algorithm
==Algorithm==

===As pseudocode===
The algorithm in [[pseudocode]] is as follows:

 '''let''' the input be a string ''I'' consisting of ''n'' characters: ''a''<sub>1</sub> ... ''a''<sub>''n''</sub>.
 '''let''' the grammar contain ''r'' nonterminal symbols ''R''<sub>1</sub> ... ''R''<sub>''r''</sub>, with start symbol ''R''<sub>1</sub>.
 '''let''' ''P''[''n'',''n'',''r''] be an array of booleans. Initialize all elements of ''P'' to false.
 '''let''' ''back''[''n'',''n'',''r''] be an array of lists of backpointing triples. Initialize all elements of ''back'' to the empty list.
 
 '''for each''' ''s'' = 1 to ''n''
     '''for each''' terminal production ''R''<sub>''v''</sub> → ''a''<sub>''s''</sub>
         '''set''' ''P''[1,''s'',''v''] = true
 
 '''for each''' ''l'' = 2 to ''n'' ''-- Length of span''
     '''for each''' ''s'' = 1 to ''n''-''l''+1 ''-- Start of span''
         '''for each''' ''p'' = 1 to ''l''-1 ''-- Partition of span''
             '''for each''' production ''R''<sub>''a''</sub> → ''R''<sub>''b''</sub> ''R''<sub>''c''</sub>
                 '''if''' ''P''[''p'',''s'',''b''] and ''P''[''l''-''p'',''s''+''p'',''c''] '''then'''
                     '''set''' ''P''[''l'',''s'',''a''] = true
                     append &lt;''p'',''b'',''c''&gt; to ''back''[''l'',''s'',''a'']
 
 '''if''' ''P''[''n'',1,1] is true '''then'''
     ''I'' is a member of the language
     '''return''' ''back'' ''-- by retracing the steps through back, one can construct all possible parse trees of the string''
 '''else'''
     '''return''' "not a member of language"

<div class="toccolours mw-collapsible mw-collapsed">

==== Probabilistic CYK (for finding the most probable parse) ====
This variant recovers the most probable parse, given the probabilities of all productions.
<div class="mw-collapsible-content">
 '''let''' the input be a string ''I'' consisting of ''n'' characters: ''a''<sub>1</sub> ... ''a''<sub>''n''</sub>.
 '''let''' the grammar contain ''r'' nonterminal symbols ''R''<sub>1</sub> ... ''R''<sub>''r''</sub>, with start symbol ''R''<sub>1</sub>.
 '''let''' ''P''[''n'',''n'',''r''] be an array of real numbers. Initialize all elements of ''P'' to zero.
 '''let''' ''back''[''n'',''n'',''r''] be an array of backpointing triples.
 
 '''for each''' ''s'' = 1 to ''n''
     '''for each''' terminal production ''R''<sub>''v''</sub> → ''a''<sub>''s''</sub>
         '''set''' ''P''[1,''s'',''v''] = Pr(''R''<sub>''v''</sub> → ''a''<sub>''s''</sub>)
 
 '''for each''' ''l'' = 2 to ''n'' ''-- Length of span''
     '''for each''' ''s'' = 1 to ''n''-''l''+1 ''-- Start of span''
         '''for each''' ''p'' = 1 to ''l''-1 ''-- Partition of span''
             '''for each''' production ''R''<sub>''a''</sub> → ''R''<sub>''b''</sub> ''R''<sub>''c''</sub>
                 prob_splitting = Pr(''R''<sub>''a''</sub> → ''R''<sub>''b''</sub> ''R''<sub>''c''</sub>) * ''P''[''p'',''s'',''b''] * ''P''[''l''-''p'',''s''+''p'',''c'']
                 '''if''' prob_splitting > ''P''[''l'',''s'',''a''] '''then'''
                     '''set''' ''P''[''l'',''s'',''a''] = prob_splitting
                     '''set''' ''back''[''l'',''s'',''a''] = &lt;''p'',''b'',''c''&gt;
 
 '''if''' ''P''[''n'',1,1] > 0 '''then'''
     find the parse tree by retracing through ''back''
     '''return''' the parse tree
 '''else'''
     '''return''' "not a member of language"
</div>
</div>

===As prose===
In informal terms, this algorithm considers every substring of the input string and sets <math>P[l,s,v]</math> to true if the substring of length <math>l</math> starting at position <math>s</math> can be generated from the nonterminal <math>R_v</math>. It first handles substrings of length 1, then substrings of length 2, and so on. For substrings of length 2 or greater, it considers every possible partition of the substring into two parts and checks whether there is some production <math>A \to B \; C</math> such that <math>B</math> matches the first part and <math>C</math> matches the second part. If so, it records <math>A</math> as matching the whole substring. Once this process is complete, the input string is generated by the grammar if and only if the substring containing the entire input string is matched by the start symbol.
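
The boolean recognizer described above can be sketched in Python as follows. This is an illustrative sketch only, assuming the grammar is already in [[Chomsky normal form]]. The function name <code>cyk_recognize</code>, the encoding of productions as tuples, and the example grammar are assumptions made for this illustration. The table is keyed by (length, start) with 1-based start positions so that the loops line up with the pseudocode.

```python
from collections import defaultdict

def cyk_recognize(tokens, terminal_rules, binary_rules, start):
    """Return (accepted, back) for a grammar in Chomsky normal form.

    terminal_rules: iterable of (A, a) pairs, one per production A -> a
    binary_rules:   iterable of (A, B, C) triples, one per production A -> B C
    start:          the start symbol
    """
    n = len(tokens)
    # P[(l, s)] holds the nonterminals that derive the span of length l
    # starting at 1-based position s (P[l, s, v] in the pseudocode).
    P = defaultdict(set)
    # back[(l, s, A)] lists the (p, B, C) triples that produced A there.
    back = defaultdict(list)

    # Spans of length 1: match each input symbol against terminal rules.
    for s in range(1, n + 1):
        for A, a in terminal_rules:
            if a == tokens[s - 1]:
                P[(1, s)].add(A)

    for l in range(2, n + 1):            # length of span
        for s in range(1, n - l + 2):    # start of span
            for p in range(1, l):        # partition of span
                for A, B, C in binary_rules:
                    if B in P[(p, s)] and C in P[(l - p, s + p)]:
                        P[(l, s)].add(A)
                        back[(l, s, A)].append((p, B, C))

    accepted = n > 0 and start in P[(n, 1)]
    return accepted, back

# Hypothetical example grammar (not from the article):
#   S -> A B | B C,  A -> B A | a,  B -> C C | b,  C -> A B | a
TERMINAL_RULES = [("A", "a"), ("B", "b"), ("C", "a")]
BINARY_RULES = [
    ("S", "A", "B"), ("S", "B", "C"),
    ("A", "B", "A"),
    ("B", "C", "C"),
    ("C", "A", "B"),
]
```

Calling <code>cyk_recognize(list("baaba"), TERMINAL_RULES, BINARY_RULES, "S")</code> accepts the string, and the entry <code>back[(5, 1, "S")]</code> records each way the whole input splits under the start symbol. Using sets per span rather than a dense ''n'' × ''n'' × ''r'' boolean array is a convenience of this sketch and does not change the cubic loop structure.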