===== Calculate mutation rates for paired and unpaired bases =====
Overall mutation rates are estimated by comparing the sequences in all possible pairs. To recover plausible mutations, a sequence identity threshold is applied so that only similar sequences are compared; this approach uses an 85% identity threshold between paired sequences.
First, single-base differences between each pair of sequences are counted, excluding gapped columns: if the same position in the two sequences holds different bases {{mvar|X, Y}}, the difference count is incremented once for each sequence.

 {{nowrap|for each ungapped position where <math>X\ne Y</math>}}
                {{nowrap|<math>C_{\text{XY}} \leftarrow C_{\text{XY}} + 1</math> (first sequence of the pair)}}
                {{nowrap|<math>C_{\text{YX}} \leftarrow C_{\text{YX}} + 1</math> (second sequence of the pair)}}
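
The counting step above can be sketched in Python as follows. This is a minimal illustration, not the published implementation: the function names, the <code>GAP</code> symbol, and the choice to measure identity over ungapped columns only are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations

GAP = "-"  # assumed gap symbol in the alignment


def identity(a, b):
    """Fraction of ungapped aligned columns in which the two sequences agree."""
    cols = [(x, y) for x, y in zip(a, b) if x != GAP and y != GAP]
    if not cols:
        return 0.0
    return sum(x == y for x, y in cols) / len(cols)


def count_differences(alignment, threshold=0.85):
    """Count differences C[(X, Y)] over all sequence pairs above the identity threshold."""
    counts = Counter()
    for a, b in combinations(alignment, 2):
        if identity(a, b) < threshold:
            continue  # skip pairs that are too dissimilar
        for x, y in zip(a, b):
            if x == GAP or y == GAP or x == y:
                continue  # ignore gapped columns and identical bases
            counts[(x, y)] += 1  # difference seen from the first sequence
            counts[(y, x)] += 1  # difference seen from the second sequence
    return counts
```

Because both counts are incremented together, the resulting table is symmetric in this sketch, that is, the count for (X, Y) always equals the count for (Y, X).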

 {{nowrap|Calculate mutation rates.}}
                {{nowrap|Let  <math>r_{\text{XY}}= </math> rate of mutation of base X to base Y <math>= \frac {K~C_{\text{XY}}} {P_{\text{X}}P_{s}}</math>}}
                {{nowrap|Let  <math>r_{\text{XX}}= </math> the negative of the total rate of mutation of X to other bases <math>= - \sum_{\text{Y} \ne \text{X}} r_{\text{XY}}</math>}}
                {{nowrap|<math>P_{s} =</math> the probability that the base is not paired.}}
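
As a rough sketch of the rate calculation above, the following Python function fills a 4 × 4 table from a dictionary of difference counts. The names <code>rate_matrix</code>, <code>freqs</code>, <code>p_s</code> and <code>k</code> are hypothetical, and treating the per-base factor in the denominator as the frequency of base X is an assumption of this example; the constant <code>k</code> is simply passed in.

```python
BASES = "ACGU"


def rate_matrix(counts, freqs, p_s, k=1.0):
    """Build r[X][Y] = k * C_XY / (P_X * P_s) with r[X][X] = -sum of the other rates.

    counts: dict mapping (X, Y) to the number of observed differences
    freqs:  dict mapping each base X to P_X (assumed here to be a base frequency)
    p_s:    probability that a base is not paired
    k:      scaling constant K, supplied by the caller
    """
    r = {x: {y: 0.0 for y in BASES} for x in BASES}
    for x in BASES:
        for y in BASES:
            if x != y:
                r[x][y] = k * counts.get((x, y), 0) / (freqs[x] * p_s)
        # Diagonal entry: negative of the total rate of leaving X,
        # so that every row of the matrix sums to zero.
        r[x][x] = -sum(r[x][y] for y in BASES if y != x)
    return r
```

For example, with uniform frequencies of 0.25, <code>p_s = 0.5</code>, <code>k = 1</code> and two observed A/G differences in each direction, the A-to-G rate is 2 / (0.25 × 0.5) = 16 and the diagonal entry for A is −16.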

For unpaired bases, a 4 × 4 mutation rate matrix is used that satisfies the condition that the mutation flow from X to Y is reversible:<ref name="Tavaré 1986" />
:                <math>P_{\text{X}}\, r_{\text{XY}} = P_{\text{Y}}\, r_{\text{YX}}</math>
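
The reversibility condition can be checked numerically. The helper below is an illustrative sketch with hypothetical names; it tests whether the flow from X to Y equals the flow from Y to X for every pair of states. Note that if the rates are defined as <math>K C_{\text{XY}} / (P_{\text{X}} P_{s})</math> and the counts are symmetric, then <math>P_{\text{X}} r_{\text{XY}} = K C_{\text{XY}} / P_{s}</math>, which is the same in both directions, so the condition holds by construction.

```python
import math


def is_reversible(r, freqs, tol=1e-9):
    """Return True if P_X * r_XY equals P_Y * r_YX for every pair of distinct states."""
    states = list(freqs)
    return all(
        math.isclose(freqs[x] * r[x][y], freqs[y] * r[y][x], rel_tol=tol, abs_tol=tol)
        for x in states
        for y in states
        if x != y
    )


# Two-state toy example (values chosen for illustration only).
toy_freqs = {"A": 0.75, "G": 0.25}
toy_rates = {"A": {"A": -1.0, "G": 1.0}, "G": {"A": 3.0, "G": -3.0}}
balanced = is_reversible(toy_rates, toy_freqs)
```

The same function applies unchanged to a 16-state base-pair matrix, since it only iterates over whatever states appear in <code>freqs</code>.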
For base pairs, a 16 × 16 rate distribution matrix is generated in a similar way.<ref name="Muse 1995" /><ref name="Schöniger  1994" />
The PCFG is used to predict the prior probability distribution of the structure, whereas posterior probabilities are estimated by the inside-outside algorithm and the most likely structure is found by the CYK algorithm.<ref name="Knudsen 2003" />