Probabilistic context-free grammar
====PCFG in homology search====
Covariance models (CMs) are a special type of PCFG with applications in database searches for homologs, annotation and RNA classification. Through CMs it is possible to build PCFG-based RNA profiles in which related RNAs are represented by a consensus secondary structure.<ref name ="Eddy 1994" /><ref name="Sakakibara 1994" /> The RNA analysis package Infernal uses such profiles in inference of RNA alignments.<ref name="Nawrocki 2013" /> The Rfam database also uses CMs in classifying RNAs into families based on their structure and sequence information.<ref name="Gardner 2010" />

CMs are designed from a consensus RNA structure. A CM allows [[indel]]s of unlimited length in the alignment. Terminals constitute states in the CM, and the transition probabilities between the states are 1 if no indels are considered.<ref name="Durbin 1998" /> The grammar rules in a CM are as follows:

:; <math>P \to aWb</math>:probabilities of pairwise interactions between 16 possible pairs
:; <math>L \to aW</math>:probabilities of generating 4 possible single bases on the left
:; <math>R \to Wa</math>:probabilities of generating 4 possible single bases on the right
:; <math>B \to SS</math>:bifurcation with a probability of 1
:; <math>S \to W</math>:start with a probability of 1
:; <math>E \to \epsilon</math>:end with a probability of 1

The model has 6 possible states, and the grammar rule of each state carries a different type of secondary-structure probability for the non-terminals. The states are connected by transitions. Ideally, current node states connect to all insert states, and subsequent node states connect to non-insert states. To allow insertion of more than one base, insert states connect to themselves.<ref name="Durbin 1998" />

To score a CM, the inside-outside algorithms are used. CMs use a slightly different implementation of CYK.
Log-odds emission scores for the optimum parse tree, <math>\log \hat{e}</math>, are calculated from the emitting states <math>P,~L,~R</math>. Since these scores are a function of sequence length, a more discriminative measure for recovering an optimum parse tree probability score, <math>\log\text{P}(x, \hat{\pi}|\theta)</math>, is reached by limiting the maximum length of the sequence to be aligned and calculating the log-odds relative to a null. The computation time of this step is linear in the database size, and the algorithm has a memory complexity of <math>O(M_aD+M_bD^2)</math>.<ref name="Durbin 1998" />
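
As an illustration of the log-odds emission score described above, the following minimal sketch sums the log-odds of the emitting states <math>P,~L,~R</math> along a fixed parse of a toy hairpin. It makes several simplifying assumptions: there are no indels (so all transition probabilities are 1), there is no bifurcation state, the null model is uniform over the four bases, and the emission probabilities are invented for the example. The names <code>pair_emissions</code>, <code>LOOP</code>, <code>HAIRPIN</code> and <code>log_odds_score</code> are hypothetical and do not correspond to Infernal or any other package.

```python
import math

BASES = "ACGU"
# Assumption: a uniform null (background) model over the four bases.
NULL = {b: 0.25 for b in BASES}
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def pair_emissions(p_wc=0.22):
    """Hypothetical P-state table over the 16 possible base pairs.

    Each Watson-Crick pair gets probability p_wc; the other 12 pairs
    share the remaining probability mass equally.
    """
    rest = (1.0 - 4 * p_wc) / 12
    return {(a, b): (p_wc if (a, b) in WATSON_CRICK else rest)
            for a in BASES for b in BASES}


PAIR = pair_emissions()
# Hypothetical single-base emission table for L (or R) states.
LOOP = {"A": 0.4, "C": 0.2, "G": 0.2, "U": 0.2}

# Toy model: start, two base pairs, a three-base loop emitted on the left, end.
HAIRPIN = [("S", None), ("P", PAIR), ("P", PAIR),
           ("L", LOOP), ("L", LOOP), ("L", LOOP), ("E", None)]


def log_odds_score(states, seq):
    """Sum log2 odds of emissions along the single indel-free parse."""
    i, j = 0, len(seq) - 1  # leftmost and rightmost unconsumed positions
    score = 0.0
    for kind, emit in states:
        if kind == "P":  # P -> a W b: emit a pair from both ends
            if i >= j:
                raise ValueError("sequence too short for pair state")
            a, b = seq[i], seq[j]
            score += math.log2(emit[(a, b)] / (NULL[a] * NULL[b]))
            i, j = i + 1, j - 1
        elif kind == "L":  # L -> a W: emit one base on the left
            if i > j:
                raise ValueError("sequence too short for left state")
            score += math.log2(emit[seq[i]] / NULL[seq[i]])
            i += 1
        elif kind == "R":  # R -> W a: emit one base on the right
            if i > j:
                raise ValueError("sequence too short for right state")
            score += math.log2(emit[seq[j]] / NULL[seq[j]])
            j -= 1
        elif kind in ("S", "E"):  # non-emitting, probability 1
            continue
        else:
            raise ValueError("state type not supported in this sketch: " + kind)
    if i != j + 1:
        raise ValueError("sequence longer than the model can emit")
    return score


if __name__ == "__main__":
    print(log_odds_score(HAIRPIN, "GCAAAGC"))  # stem positions pair up
    print(log_odds_score(HAIRPIN, "GAAAAGA"))  # stem positions mismatch
```

Under these assumed tables, a sequence whose stem positions form Watson-Crick pairs scores higher than one whose stem positions do not. A full CM would additionally search over parses with insert and delete states instead of following a single fixed parse.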