==Converting a grammar to Chomsky normal form==

To convert a grammar to Chomsky normal form, a sequence of simple transformations is applied in a certain order; this is described in most textbooks on [[automata theory]].<ref name="Hopcroft.Ullman.1979"/>{{rp|87–94}}<ref>{{cite book |last1=Hopcroft |first1=John E. |last2=Motwani |first2=Rajeev |last3=Ullman |first3=Jeffrey D. |date=2006 |title=Introduction to Automata Theory, Languages, and Computation |edition=3rd |publisher=Addison-Wesley |isbn=978-0-321-45536-9 |url-access=registration |url=https://archive.org/details/introductiontoau0000hopc }} Section 7.1.5, p. 272</ref><ref>{{cite book |last=Rich |first=Elaine |author-link=Elaine Rich|date=2007 |title=Automata, Computability, and Complexity: Theory and Applications |publisher=Prentice-Hall |edition=1st |page=169 |section=11.8 Normal Forms |url=https://www.cs.utexas.edu/~ear/cs341/automatabook/AutomataTheoryBook.pdf |archive-url=https://archive.today/20230117061906/https://www.cs.utexas.edu/~ear/cs341/automatabook/AutomataTheoryBook.pdf |archive-date=2023-01-17 |isbn=978-0132288064}}</ref><ref>{{cite book |last=Wegener |first=Ingo |date=1993 |title=Theoretische Informatik - Eine algorithmenorientierte Einführung |language=de |series=Leitfäden und Monographien der Informatik |publisher=B. G. Teubner |location=Stuttgart |isbn=978-3-519-02123-0}} Section 6.2 "Die Chomsky-Normalform für kontextfreie Grammatiken", pp. 149–152</ref>
The presentation here follows Hopcroft, Ullman (1979), but is adapted to use the transformation names from Lange, Leiß (2009).<ref name="Lange.Leis.2009">{{cite journal |last1=Lange |first1=Martin |last2=Leiß |first2=Hans |date=2009 |title=To CNF or not to CNF? An Efficient Yet Presentable Version of the CYK Algorithm |journal=Informatica Didactica |volume=8 |url=http://ddi.cs.uni-potsdam.de/InformaticaDidactica/LangeLeiss2009.pdf |archive-url=https://web.archive.org/web/20110719111029/http://ddi.cs.uni-potsdam.de/InformaticaDidactica/LangeLeiss2009.pdf |archive-date=2011-07-19 |url-status=live }}</ref><ref group=note>For example, Hopcroft, Ullman (1979) merged '''TERM''' and '''BIN''' into a single transformation.</ref> Each of the following transformations establishes one of the properties required for Chomsky normal form.

===START: Eliminate the start symbol from right-hand sides===

Introduce a new start symbol ''S''<sub>0</sub>, and a new rule 
:''S''<sub>0</sub> → ''S'', 
where ''S'' is the previous start symbol.
This does not change the grammar's produced language, and ''S''<sub>0</sub> will not occur on any rule's right-hand side.
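
As a minimal sketch (not taken from the cited sources), '''START''' can be written in Python as follows. The encoding is an assumption made for illustration: a grammar is a dictionary mapping each nonterminal to a list of right-hand sides, each right-hand side is a tuple of symbols, and the empty tuple stands for ε. The function name <code>add_start_symbol</code> is hypothetical.

```python
def add_start_symbol(grammar, start_symbol, new_start="S0"):
    """Return (new_grammar, new_start) with a fresh start symbol added."""
    # Collect every symbol in use so the new name cannot clash with one.
    used = set(grammar)
    for rhss in grammar.values():
        for rhs in rhss:
            used.update(rhs)
    while new_start in used:
        new_start += "'"
    # The new start symbol gets the single rule S0 -> S.
    new_grammar = {new_start: [(start_symbol,)]}
    new_grammar.update(grammar)
    return new_grammar, new_start


# Hypothetical grammar S -> a S b | ε, where S occurs on a right-hand side.
g = {"S": [("a", "S", "b"), ()]}
g0, s0 = add_start_symbol(g, "S")
print(s0, g0[s0])
```

Because the new name is chosen to differ from every symbol already in use, it cannot occur on any right-hand side.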

===TERM: Eliminate rules with nonsolitary terminals===

To eliminate each rule 
:''A'' → ''X''<sub>1</sub> ... ''a'' ... ''X''<sub>''n''</sub>
in which a terminal symbol ''a'' is not the only symbol on the right-hand side, introduce, for every such terminal, a new nonterminal symbol ''N''<sub>''a''</sub> and a new rule 
:''N''<sub>''a''</sub> → ''a''. 
Change every rule 
:''A'' → ''X''<sub>1</sub> ... ''a'' ... ''X''<sub>''n''</sub> 
to 
:''A'' → ''X''<sub>1</sub> ... ''N''<sub>''a''</sub> ... ''X''<sub>''n''</sub>.
If several terminal symbols occur on the right-hand side, simultaneously replace each of them by its associated nonterminal symbol.
This does not change the grammar's produced language.<ref name="Hopcroft.Ullman.1979"/>{{rp|92}}
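
The following Python sketch illustrates '''TERM''' under an assumed encoding: a grammar is a dictionary from nonterminals to lists of right-hand sides, each a tuple of symbols, and any symbol that is not a dictionary key is treated as a terminal. The function name <code>term</code> and the naming scheme <code>N_a</code> are assumptions; the sketch further assumes that no existing nonterminal is already named <code>N_a</code>.

```python
def term(grammar):
    """Replace every nonsolitary terminal a by a new nonterminal N_a."""
    nonterminals = set(grammar)
    new_grammar = {}
    terminal_rules = {}  # the added rules N_a -> a
    for lhs, rhss in grammar.items():
        new_rhss = []
        for rhs in rhss:
            if len(rhs) >= 2:  # terminals here are not solitary
                replaced = []
                for sym in rhs:
                    if sym not in nonterminals:
                        name = "N_" + sym  # assumed to be unused so far
                        terminal_rules[name] = [(sym,)]
                        sym = name
                    replaced.append(sym)
                rhs = tuple(replaced)
            new_rhss.append(rhs)
        new_grammar[lhs] = new_rhss
    new_grammar.update(terminal_rules)
    return new_grammar


# Hypothetical grammar S -> a S b | a | ε
g = {"S": [("a", "S", "b"), ("a",), ()]}
t = term(g)
print(t)
```

The solitary terminal in ''S'' → ''a'' is left untouched, since that rule already has the required form.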

===BIN: Eliminate right-hand sides with more than 2 nonterminals===

Replace each rule 
:''A'' → ''X''<sub>1</sub> ''X''<sub>2</sub> ... ''X''<sub>''n''</sub> 
with more than 2 nonterminals ''X''<sub>1</sub>,...,''X''<sub>''n''</sub> by rules 
:''A'' → ''X''<sub>1</sub> ''A''<sub>1</sub>, 
:''A''<sub>1</sub> → ''X''<sub>2</sub> ''A''<sub>2</sub>, 
:... , 
:''A''<sub>''n''-2</sub> → ''X''<sub>''n''-1</sub> ''X''<sub>''n''</sub>, 
where ''A''<sub>''i''</sub> are new nonterminal symbols.
Again, this does not change the grammar's produced language.<ref name="Hopcroft.Ullman.1979"/>{{rp|93}}
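
A possible Python sketch of '''BIN''' is given below. It assumes a grammar encoded as a dictionary from nonterminals to lists of right-hand sides (tuples of symbols). The function name <code>binarize</code> and the naming scheme for the new nonterminals, which includes the index of the rule so that helpers of different rules stay distinct, are assumptions; the generated names are assumed not to clash with existing symbols.

```python
def binarize(grammar):
    """Split every right-hand side longer than 2 into a chain of binary rules."""
    new_grammar = {}
    for lhs, rhss in grammar.items():
        new_grammar.setdefault(lhs, [])
        for k, rhs in enumerate(rhss):
            head = lhs
            i = 1
            while len(rhs) > 2:
                # A -> X1 X2 ... Xn becomes A -> X1 A1 and A1 -> X2 ... Xn
                fresh = f"{lhs}_{k}_{i}"  # new nonterminal, assumed unused
                new_grammar.setdefault(head, []).append((rhs[0], fresh))
                head, rhs, i = fresh, rhs[1:], i + 1
            new_grammar.setdefault(head, []).append(rhs)
    return new_grammar


# Hypothetical rule A -> X1 X2 X3 X4
g = {"A": [("X1", "X2", "X3", "X4")]}
b = binarize(g)
print(b)
```

For a right-hand side of length ''n'', the loop introduces ''n''−2 new nonterminals, matching the chain of rules above.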

===DEL: Eliminate ε-rules===

An ε-rule is a rule of the form 
:''A'' → ε, 
where ''A'' is not ''S''<sub>0</sub>, the grammar's start symbol.

To eliminate all rules of this form, first determine the set of all nonterminals that derive ε.
Hopcroft and Ullman (1979) call such nonterminals ''nullable'', and compute them as follows:
* If a rule ''A'' → ε exists, then ''A'' is nullable.
* If a rule ''A'' → ''X''<sub>1</sub> ... ''X''<sub>''n''</sub> exists, and every single ''X''<sub>''i''</sub> is nullable, then ''A'' is nullable, too.
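
The two rules above amount to a fixed-point computation, which can be sketched in Python as follows. The encoding is an assumption for illustration: a grammar is a dictionary from nonterminals to lists of right-hand sides, each a tuple of symbols, with the empty tuple standing for ε. The function name <code>nullable_symbols</code> is hypothetical.

```python
def nullable_symbols(grammar):
    """Return the set of nonterminals that derive ε."""
    nullable = set()
    changed = True
    while changed:  # repeat until no further nonterminal can be added
        changed = False
        for lhs, rhss in grammar.items():
            if lhs in nullable:
                continue
            # all(...) is True for the empty tuple, which covers A -> ε.
            if any(all(sym in nullable for sym in rhs) for rhs in rhss):
                nullable.add(lhs)
                changed = True
    return nullable


# Hypothetical grammar: S -> A B | c,  A -> a | ε,  B -> A A
g = {
    "S": [("A", "B"), ("c",)],
    "A": [("a",), ()],
    "B": [("A", "A")],
}
print(nullable_symbols(g))
```

Terminals never enter the set, so a right-hand side containing a terminal can never make its left-hand side nullable.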

Obtain an intermediate grammar by replacing each rule 
:''A'' → ''X''<sub>1</sub> ... ''X''<sub>''n''</sub> 
by all versions with some nullable ''X''<sub>''i''</sub> omitted.
The transformed grammar is obtained from this intermediate grammar by deleting each ε-rule, unless its left-hand side is the start symbol.<ref name="Hopcroft.Ullman.1979"/>{{rp|90}}

For example, in the following grammar, with start symbol ''S''<sub>0</sub>,
: ''S''<sub>0</sub> → ''AbB'' | ''C''
: ''B'' → ''AA'' | ''AC''
: ''C'' → ''b'' | ''c''
: ''A'' → ''a'' | ε
the nonterminal ''A'', and hence also ''B'', is nullable, while neither ''C'' nor ''S''<sub>0</sub> is.
Hence the following intermediate grammar is obtained:<ref group=note>indicating a kept and omitted nonterminal ''N'' by ''{{color|#006000|N}}'' and ''{{color|#ffc0c0|<s>N</s>}}'', respectively</ref>
: ''S''<sub>0</sub> → ''{{color|#006000|A}}b{{color|#006000|B}}'' | ''{{color|#006000|A}}b{{color|#ffc0c0|<s>B</s>}}'' | ''{{color|#ffc0c0|<s>A</s>}}b{{color|#006000|B}}'' | ''{{color|#ffc0c0|<s>A</s>}}b{{color|#ffc0c0|<s>B</s>}}'' &nbsp; | &nbsp; ''C''
: ''B'' → ''{{color|#006000|AA}}'' | ''{{color|#ffc0c0|<s>A</s>}}{{color|#006000|A}}'' | ''{{color|#006000|A}}{{color|#ffc0c0|<s>A</s>}}'' | ''{{color|#ffc0c0|<s>A</s>}}ε{{color|#ffc0c0|<s>A</s>}}'' &nbsp; | &nbsp; ''{{color|#006000|A}}C'' | ''{{color|#ffc0c0|<s>A</s>}}C''
: ''C'' → ''b'' | ''c''
: ''A'' → ''a'' | ε
In this grammar, all ε-rules have been "[[inlining|inlined]] at the call site".<ref group=note>If the grammar had a rule ''S''<sub>0</sub> → ε, it could not be "inlined", since it had no "call sites". Therefore it could not be deleted in the next step.</ref>
In the next step, they can hence be deleted, yielding the grammar:
: ''S''<sub>0</sub> → ''AbB'' | ''Ab'' | ''bB'' | ''b'' &nbsp; | &nbsp; ''C''
: ''B'' → ''AA'' | ''A'' &nbsp; | &nbsp; ''AC'' | ''C''
: ''C'' → ''b'' | ''c''
: ''A'' → ''a''
This grammar produces the same language as the original example grammar, viz. {''ab'',''aba'',''abaa'',''abab'',''abac'',''abb'',''abc'',''b'',''ba'',''baa'',''bab'',''bac'',''bb'',''bc'',''c''}, but has no ε-rules.
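
The example can be reproduced with the following Python sketch, which combines the omission step and the deletion step. It assumes that the set of nullable nonterminals has already been determined and is passed in; the dictionary encoding of the grammar (nonterminal to list of tuples, the empty tuple standing for ε) and the function name <code>delete_epsilon_rules</code> are assumptions made for illustration.

```python
from itertools import product


def delete_epsilon_rules(grammar, nullable, start_symbol):
    """Expand each rule by omitting nullable symbols, then drop ε-rules."""
    new_grammar = {}
    for lhs, rhss in grammar.items():
        variants = []
        for rhs in rhss:
            # Each symbol is kept; a nullable symbol may also be omitted.
            choices = [((s,), ()) if s in nullable else ((s,),) for s in rhs]
            for combo in product(*choices):
                new_rhs = tuple(s for part in combo for s in part)
                if new_rhs == () and lhs != start_symbol:
                    continue  # ε-rule of a non-start symbol: delete it
                if new_rhs not in variants:
                    variants.append(new_rhs)
        new_grammar[lhs] = variants
    return new_grammar


# The example grammar from the text; A and B are its nullable nonterminals.
g = {
    "S0": [("A", "b", "B"), ("C",)],
    "B": [("A", "A"), ("A", "C")],
    "C": [("b",), ("c",)],
    "A": [("a",), ()],
}
result = delete_epsilon_rules(g, {"A", "B"}, "S0")
for lhs, rhss in result.items():
    print(lhs, "->", " | ".join("".join(rhs) for rhs in rhss))
```

Duplicate right-hand sides, such as the two ways of obtaining ''B'' → ''A'' from ''B'' → ''AA'', are recorded only once.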

===UNIT: Eliminate unit rules===

A unit rule is a rule of the form 
:''A'' → ''B'', 
where ''A'', ''B'' are nonterminal symbols.
To remove it, for each rule 
:''B'' →  ''X''<sub>1</sub> ... ''X''<sub>''n''</sub>, 
where  ''X''<sub>1</sub> ... ''X''<sub>''n''</sub> is a string of nonterminals and terminals, add rule 
:''A'' →  ''X''<sub>1</sub> ... ''X''<sub>''n''</sub> 
unless this is a unit rule which has already been (or is being) removed. Skipping the nonterminal symbol ''B'' in the resulting grammar is possible because ''B'' is a member of the unit closure of ''A'', that is, the set of nonterminals reachable from ''A'' using unit rules only.<ref>{{Cite book |last=Allison |first=Charles D. |title=Foundations of Computing: An Accessible Introduction to Automata and Formal Languages |publisher=Fresh Sources, Inc. |year=2022 |isbn=9780578944173 |pages=176 |language=en}}</ref>
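
One way to sketch '''UNIT''' in Python is to compute the unit closure of every nonterminal and to copy the non-unit rules of all its members, rather than removing unit rules one at a time as described above. The dictionary encoding of the grammar (nonterminal to list of tuples) and the function name <code>eliminate_unit_rules</code> are assumptions made for illustration.

```python
def eliminate_unit_rules(grammar):
    """Replace unit rules A -> B by the non-unit rules of A's unit closure."""
    nonterminals = set(grammar)

    def is_unit(rhs):
        return len(rhs) == 1 and rhs[0] in nonterminals

    new_grammar = {}
    for a in grammar:
        # Unit closure of a: nonterminals reachable through unit rules only.
        closure, todo = {a}, [a]
        while todo:
            b = todo.pop()
            for rhs in grammar[b]:
                if is_unit(rhs) and rhs[0] not in closure:
                    closure.add(rhs[0])
                    todo.append(rhs[0])
        # Copy each non-unit rule of each member of the closure to a.
        rules = []
        for b in grammar:
            if b in closure:
                for rhs in grammar[b]:
                    if not is_unit(rhs) and rhs not in rules:
                        rules.append(rhs)
        new_grammar[a] = rules
    return new_grammar


# The grammar obtained from the DEL example, used here as input.
g = {
    "S0": [("A", "b", "B"), ("A", "b"), ("b", "B"), ("b",), ("C",)],
    "B": [("A", "A"), ("A",), ("A", "C"), ("C",)],
    "C": [("b",), ("c",)],
    "A": [("a",)],
}
u = eliminate_unit_rules(g)
print(u)
```

Tracking the closure as a set also ensures termination on cyclic unit rules such as ''A'' → ''B'', ''B'' → ''A''.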

===Order of transformations===

{| class="wikitable collapsible" style="float:right"
|-
|+ Mutual preservation<BR>of transformation results
|-
| colspan=6 | Transformation ''X'' {{color|#004000|always preserves}} ({{Aye}})<BR>resp. {{color|#400000|may destroy}} ({{Nay}}) the result of ''Y'':
|-
! {{diagonal split header|''X''|''Y''}}
! START  ||TERM||BIN||DEL||UNIT
|-
! START 
|        || {{Ya}}   || {{Ya}}  || {{Na}}  || {{Na}} 
|-
! TERM 
| {{Ya}} ||          || {{Na}}  || {{Ya}}  || {{Ya}} 
|-
! BIN 
| {{Ya}} || {{Ya}}   ||         || {{Ya}}  || {{Ya}} 
|-
! DEL 
| {{Ya}} || {{Ya}}   || {{Ya}}  ||         || {{Na}} 
|-
! UNIT 
| {{Ya}} || {{Ya}}   || {{Ya}}  ||{{Ya|text=({{Aye}})<sup>*</sup>}}||  
|-
| colspan=6 | <sup>*</sup>'''UNIT''' preserves the result of '''DEL'''<BR>&nbsp; if '''START''' had been called before.
|}

When choosing the order in which the above transformations are to be applied, it has to be considered that some transformations may destroy the result achieved by other ones. For example, '''START''' will re-introduce a unit rule if it is applied after '''UNIT'''. The table shows which orderings are admitted.

Moreover, the worst-case bloat in grammar size<ref group=note>i.e. written length, measured in symbols</ref> depends on the transformation order. Using |''G''| to denote the size of the original grammar ''G'', the size blow-up in the worst case may range from |''G''|<sup>2</sup> to 2<sup>2 |''G''|</sup>, depending on the transformation algorithm used.<ref name="Lange.Leis.2009"/>{{rp|7}} In particular, the blow-up depends on the order of '''DEL''' and '''BIN''': it may be exponential when '''DEL''' is done first, but is linear otherwise. '''UNIT''' can incur a quadratic blow-up in the size of the grammar.<ref name="Lange.Leis.2009"/>{{rp|5}} The orderings '''START''', '''TERM''', '''BIN''', '''DEL''', '''UNIT''' and '''START''', '''BIN''', '''DEL''', '''UNIT''', '''TERM''' lead to the least (i.e. quadratic) blow-up.
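
The effect of the order of '''DEL''' and '''BIN''' can be illustrated with a small Python sketch. It considers a hypothetical rule ''S'' → ''X''<sub>1</sub> ... ''X''<sub>''n''</sub> in which every ''X''<sub>''i''</sub> is assumed to be nullable, and counts the rules that '''DEL''' produces from it, either directly or after '''BIN''' has been applied. The number of rules is used as a simple proxy for grammar size, and the function names are assumptions.

```python
from itertools import product


def variants(rhs, nullable):
    """Distinct nonempty versions of rhs with some nullable symbols omitted."""
    choices = [((s,), ()) if s in nullable else ((s,),) for s in rhs]
    result = {tuple(x for part in combo for x in part)
              for combo in product(*choices)}
    result.discard(())  # the ε-rule is deleted
    return result


def rules_after_del(n, binarize_first):
    """Count rules derived from S -> X1 ... Xn (n >= 2, all Xi nullable)."""
    xs = [f"X{i}" for i in range(1, n + 1)]
    if binarize_first:
        # BIN: S -> X1 A1, A1 -> X2 A2, ..., A(n-2) -> X(n-1) Xn
        helpers = [f"A{i}" for i in range(1, n - 1)]
        heads = ["S"] + helpers
        rules = [(heads[i], (xs[i], heads[i + 1])) for i in range(n - 2)]
        rules.append((heads[-1], (xs[-2], xs[-1])))
        # The helpers derive only nullable symbols, so they are nullable too.
        nullable = set(xs) | set(helpers)
    else:
        rules = [("S", tuple(xs))]
        nullable = set(xs)
    return sum(len(variants(rhs, nullable)) for _, rhs in rules)


for n in (4, 10):
    print(n, rules_after_del(n, False), rules_after_del(n, True))
```

Applied directly, '''DEL''' yields 2<sup>''n''</sup>−1 rules for this right-hand side, whereas after '''BIN''' each of the ''n''−1 binary rules contributes at most 3.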