===Protein sequence analysis===
Whereas PCFGs have proved powerful tools for predicting RNA secondary structure, their use in protein sequence analysis has been limited. Indeed, the size of the [[amino acid]] alphabet and the variety of interactions seen in proteins make grammar inference much more challenging.<ref name="Searls D 2013">{{cite journal | last1 = Searls | first1 = D | year = 2013 | title = Review: A primer in macromolecular linguistics | journal = Biopolymers | volume = 99 | issue = 3| pages = 203–217 | doi=10.1002/bip.22101| pmid = 23034580 | s2cid = 12676925 }}</ref> As a consequence, most applications of [[formal language theory]] to protein analysis have been restricted to grammars of lower expressive power that model simple functional patterns based on local interactions.<ref>{{cite journal | last1 = Krogh | first1 = A | last2 = Brown | first2 = M | last3 = Mian | first3 = I | last4 = Sjolander | first4 = K | last5 = Haussler | first5 = D | year = 1994 | title = Hidden Markov models in computational biology: Applications to protein modeling | journal = J Mol Biol | volume = 235 | issue = 5| pages = 1501–1531 |doi=10.1006/jmbi.1994.1104 | pmid=8107089| s2cid = 2160404 }}</ref><ref>{{cite journal | last1 = Sigrist | first1 = C | last2 = Cerutti | first2 = L | last3 = Hulo | first3 = N | last4 = Gattiker | first4 = A | last5 = Falquet | first5 = L | last6 = Pagni | first6 = M | last7 = Bairoch | first7 = A | last8 = Bucher | first8 = P | year = 2002 | title = PROSITE: a documented database using patterns and profiles as motif descriptors | journal = Brief Bioinform | volume = 3 | issue = 3| pages = 265–274 | doi=10.1093/bib/3.3.265 | pmid=12230035| doi-access = free }}</ref> Since protein structures commonly display higher-order dependencies, including nested and crossing relationships, they clearly exceed the capabilities of any CFG.<ref name="Searls D 2013"/> Still, the development of PCFGs allows some of those dependencies to be expressed and makes it possible to model a wider range of protein patterns.