Open main menu
Home
Random
Recent changes
Special pages
Community portal
Preferences
About Wikipedia
Disclaimers
Incubator escapee wiki
Search
User menu
Talk
Dark mode
Contributions
Create account
Log in
Editing
Probabilistic context-free grammar
(section)
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
==Grammar construction== Context-free grammars are represented as a set of rules inspired from attempts to model natural languages.<ref name="Chomsky_1956" /><ref name="Chomsky_1959" /><ref name="Chomsky_1957" /> The rules are absolute and have a typical syntax representation known as [[Backus–Naur form]]. The production rules consist of terminal '''<math>\left \{a, b \right\}</math>''' and non-terminal '''{{mvar|S}}''' symbols and a blank '''<math>\epsilon</math>''' may also be used as an end point. In the production rules of CFG and PCFG the left side has only one nonterminal whereas the right side can be any string of terminal or nonterminals. In PCFG nulls are excluded.<ref name="Durbin 1998" /> An example of a grammar: : <math> S \to aS, S \to bS, S \to \epsilon </math> This grammar can be shortened using the '|' ('or') character into: : <math> S\to aS | bS |\epsilon </math> Terminals in a grammar are words and through the grammar rules a non-terminal symbol is transformed into a string of either terminals and/or non-terminals. The above grammar is read as "beginning from a non-terminal {{mvar|S}} the emission can generate either {{mvar|a}} or {{mvar|b}} or <math>\epsilon</math>". Its derivation is: : <math>S \Rightarrow aS\Rightarrow abS\Rightarrow abbS \Rightarrow abb</math> [[Ambiguous grammar]] may result in ambiguous parsing if applied on [[homograph]]s since the same word sequence can have more than one interpretation. [[pun|Pun sentences]] such as the newspaper headline "Iraqi Head Seeks Arms" are an example of ambiguous parses. One strategy of dealing with ambiguous parses (originating with grammarians as early as [[Pāṇini]]) is to add yet more rules, or prioritize them so that one rule takes precedence over others. This, however, has the drawback of proliferating the rules, often to the point where they become difficult to manage. Another difficulty is overgeneration, where unlicensed structures are also generated. Probabilistic grammars circumvent these problems by ranking various productions on frequency weights, resulting in a "most likely" (winner-take-all) interpretation. As usage patterns are altered in [[Historical linguistics|diachronic]] shifts, these probabilistic rules can be re-learned, thus updating the grammar. Assigning probability to production rules makes a PCFG. These probabilities are informed by observing distributions on a training set of similar composition to the language to be modeled. On most samples of broad language, probabilistic grammars where probabilities are estimated from data typically outperform hand-crafted grammars. CFGs when contrasted with PCFGs are not applicable to RNA structure prediction because while they incorporate sequence-structure relationship they lack the scoring metrics that reveal a sequence structural potential <ref name="Dowell 2004" />
Edit summary
(Briefly describe your changes)
By publishing changes, you agree to the
Terms of Use
, and you irrevocably agree to release your contribution under the
CC BY-SA 4.0 License
and the
GFDL
. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel
Editing help
(opens in new window)