== Research ==
The second parity rule was discovered in 1968.<ref name=Rudner1968 /> It states that, in single-stranded DNA, the number of adenine units is ''approximately'' equal to that of thymine (%A <span>≈</span> %T), and the number of cytosine units is ''approximately'' equal to that of guanine (%C <span>≈</span> %G).
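
As an illustration only, the rule can be checked on any single strand by counting bases. The sketch below is not taken from the cited sources; the function names <code>base_percentages</code> and <code>parity_gaps</code> and the toy sequence are hypothetical, and a real test would use a whole chromosome rather than a short string.

```python
from collections import Counter


def base_percentages(strand: str) -> dict[str, float]:
    """Return the percentage of A, C, G and T in one DNA strand.

    Characters other than A, C, G, T (for example N) are ignored.
    """
    counts = Counter(base for base in strand.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no A/C/G/T bases found")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


def parity_gaps(strand: str) -> tuple[float, float]:
    """Return (|%A - %T|, |%C - %G|) measured within a single strand."""
    pct = base_percentages(strand)
    return abs(pct["A"] - pct["T"]), abs(pct["C"] - pct["G"])


# Hypothetical toy strand; far too short to say anything about a genome.
toy_strand = "ATGCGCATTAGCGCTAAT"
print(base_percentages(toy_strand))
print(parity_gaps(toy_strand))
```

Small gaps on a long strand are consistent with the rule; the rule is approximate, so the gaps are not expected to be exactly zero.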

In 2006, it was shown that this rule applies to four<ref name="Chargaff1952" /> of the five types of double-stranded genomes; specifically, it applies to [[eukaryote|eukaryotic]] [[chromosomes]], [[bacteria]]l chromosomes, double-stranded [[DNA]] viral genomes, and [[archaea]]l chromosomes.<ref>{{cite journal |doi=10.1016/j.bbrc.2005.11.160 |vauthors=Mitchell D, Bridge R |year=2006 |title=A test of Chargaff's second rule |journal=Biochem Biophys Res Commun |volume=340 |issue=1 |pages=90–94 |pmid=16364245}}</ref> It does not apply to [[Organellar DNA|organellar genomes]] ([[mitochondria]] and [[plastid]]s) smaller than ~20–30 [[basepair|kbp]], nor does it apply to single-stranded DNA (viral) genomes or to any type of [[RNA]] genome. The basis for this rule is still under investigation, although genome size may play a role.
[[File:Chargaff-2nd-histogram.png|thumb|right|Histogram showing how 20309 chromosomes adhere to Chargaff's second parity rule]]
The rule itself has consequences. In most bacterial genomes (which are generally 80–90% [[Coding region|coding]]), genes are arranged so that approximately 50% of the coding sequence lies on each strand. In the 1960s, [[Wacław Szybalski]] showed that in [[bacteriophage]] coding sequences [[purines]] (A and G) exceed [[pyrimidines]] (C and T).<ref name=Szybalski1966>{{cite journal |vauthors=Szybalski W, Kubinski H, Sheldrick O |year=1966 |title=Pyrimidine clusters on the transcribing strand of DNA and their possible role in the initiation of RNA synthesis |journal=Cold Spring Harb Symp Quant Biol |volume=31 |pages=123–127 |pmid=4966069 |doi=10.1101/SQB.1966.031.01.019}}</ref> This rule has since been confirmed in other organisms and should probably now be termed "[[Szybalski's rule]]". While Szybalski's rule generally holds, exceptions are known to exist.<ref name=Cristillo1998>{{cite book |author=Cristillo AD |year=1998 |title=Characterization of G0/G1 switch genes in cultured T lymphocytes |publisher=Queen's University |location=Kingston, Ontario, Canada}}</ref><ref name=Bell1999>{{cite journal |doi=10.1006/jtbi.1998.0858 |vauthors=Bell SJ, Forsdyke DR |year=1999 |title=Deviations from Chargaff's second parity rule correlate with direction of transcription |journal=J Theor Biol |volume=197 |issue=1 |pages=63–76 |pmid=10036208|bibcode=1999JThBi.197...63B }}</ref><ref name=Lao2000>{{cite journal |doi=10.1101/gr.10.2.228 |vauthors=Lao PJ, Forsdyke DR |year=2000 |title=Thermophilic Bacteria Strictly Obey Szybalski's Transcription Direction Rule and Politely Purine-Load RNAs with Both Adenine and Guanine |journal= Genome Research|volume=10 |issue=2 |pages=228–236 |pmid=10673280 |pmc=310832}}</ref> The biological basis for Szybalski's rule is not yet known.
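
A minimal sketch of how the purine excess described by Szybalski's rule could be measured for one coding sequence is given below. The function name <code>purine_excess</code> and the normalisation by sequence length are assumptions made for illustration; the cited studies may define purine loading differently.

```python
def purine_excess(coding_strand: str) -> float:
    """Return (purines - pyrimidines) / (purines + pyrimidines).

    The input is assumed to be the coding (mRNA-synonymous) strand.
    A positive value means purines (A, G) outnumber pyrimidines (C, T).
    """
    seq = coding_strand.upper()
    purines = seq.count("A") + seq.count("G")
    pyrimidines = seq.count("C") + seq.count("T")
    total = purines + pyrimidines
    if total == 0:
        raise ValueError("no A/C/G/T bases found")
    return (purines - pyrimidines) / total


# Hypothetical toy sequence: 4 purines and 2 pyrimidines.
print(purine_excess("AAGGCT"))
```

Under this convention, Szybalski's rule predicts a positive value for most coding sequences, with the exceptions noted above giving zero or negative values.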

The combined effect of Chargaff's second rule and Szybalski's rule can be seen in bacterial genomes where the coding sequences are not equally distributed between the two strands. The [[genetic code]] has 64 [[codons]], of which 3 function as termination codons, while only 20 [[amino acid]]s are normally present in proteins. (Two uncommon amino acids, [[selenocysteine]] and [[pyrrolysine]], are found in a limited number of proteins and are encoded by the [[stop codon]]s TGA and TAG respectively.) Because the 61 remaining codons outnumber the 20 amino acids, several codons can code for a single amino acid; such codons normally differ only at the third codon base position.
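
The codon arithmetic above can be reproduced from the standard genetic code. The sketch below is illustrative only; the variable names are hypothetical, and the table is the standard nuclear code written in the conventional T, C, A, G order with <code>*</code> marking a stop codon.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons are ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Termination codons and the number of codons per amino acid.
stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
degeneracy = Counter(aa for aa in CODON_TABLE.values() if aa != "*")


def third_position_synonyms(codon: str) -> list[str]:
    """Return the codons sharing the first two bases and the same meaning."""
    codon = codon.upper()
    return [
        codon[:2] + base
        for base in BASES
        if CODON_TABLE[codon[:2] + base] == CODON_TABLE[codon]
    ]


print(len(CODON_TABLE), "codons,", len(stop_codons), "stops,",
      len(degeneracy), "amino acids")
print(third_position_synonyms("GGT"))
```

Most of the redundancy sits in the third position, which is why that position has the most freedom to vary without changing the encoded protein.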

Multivariate statistical analysis of codon use within genomes with unequal quantities of coding sequences on the two strands has shown that codon use in the third position depends on the strand on which the gene is located. This seems likely to be the result of Szybalski's and Chargaff's rules. Because of the asymmetry in pyrimidine and purine use in coding sequences, the strand with the greater coding content will tend to have the greater number of purine bases (Szybalski's rule). Because the number of purine bases will, to a very good approximation, equal the number of their complementary pyrimidines within the same strand, and because the coding sequences occupy 80–90% of the strand, there appears to be a selective pressure on the third base to minimize the number of purine bases in the strand with the greater coding content. This pressure appears to be proportional to the mismatch in the length of the coding sequences between the two strands.
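
The strand dependence described above concerns the third codon position. As a rough sketch (not the multivariate analysis used in the studies, and with a hypothetical function name), the purine content of third positions in a coding sequence could be measured as follows and then compared between genes on the two strands.

```python
def third_position_purine_fraction(cds: str) -> float:
    """Return the fraction of third codon positions occupied by A or G.

    The coding sequence is assumed to start in frame; a trailing partial
    codon contributes nothing because it has no third base.
    """
    thirds = [base for base in cds.upper()[2::3] if base in "ACGT"]
    if not thirds:
        raise ValueError("sequence contains no complete codon")
    return sum(base in "AG" for base in thirds) / len(thirds)


# Hypothetical toy coding sequences.
print(third_position_purine_fraction("ATGGCATAA"))
print(third_position_purine_fraction("ATGGCCTAT"))
```

If the proposed pressure exists, genes on the strand with more coding content would be expected to show a lower value on average than genes on the other strand.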
[[File:Chargraff-2nd-6-mers.png|thumb|left|Chargaff's 2nd parity rule for prokaryotic 6-mers]]
The origin of the deviation from Chargaff's rule in the organelles has been suggested to be a consequence of the mechanism of replication.<ref name=Nikolaou2006>{{cite journal |doi=10.1016/j.gene.2006.06.010 |vauthors=Nikolaou C, Almirantis Y |year=2006 |title=Deviations from Chargaff's second parity rule in organellar DNA. Insights into the evolution of organellar genomes |journal=Gene |volume=381 |pages=34–41 |pmid=16893615}}</ref> During replication the DNA strands separate. In single-stranded DNA, [[cytosine]] slowly and spontaneously deaminates to [[uracil]], which is read as thymine (a C to T [[Transition (genetics)|transition]]). The longer the strands are separated, the greater the amount of deamination. For reasons that are not yet clear, the strands tend to remain single-stranded for longer in mitochondria than in chromosomal DNA. This process tends to yield one strand that is enriched in [[guanine]] (G) and [[thymine]] (T), with its complement enriched in cytosine (C) and [[adenine]] (A), and it may have given rise to the deviations found in the mitochondria. {{Citation needed|reason=reliable source needed for the whole sentence as there are faults in the judgement|date=January 2013}}{{Dubious|date=January 2013}}

Chargaff's second rule appears to be the consequence of a more complex parity rule: within a single strand of DNA, any oligonucleotide ([[k-mer]] or [[n-gram]]; length ≤ 10) is present in approximately equal numbers to its reverse-complementary oligonucleotide. Because of the computational requirements, this has not been verified in all genomes for all oligonucleotides. It has been verified for triplet oligonucleotides for a large data set.<ref name="Albrecht-Buehler2006">{{cite journal |doi=10.1073/pnas.0605553103 |author=Albrecht-Buehler G |year=2006 |title=Asymptotically increasing compliance of genomes with Chargaff's second parity rules through inversions and inverted transpositions |journal=Proc Natl Acad Sci USA |volume=103 |issue=47 |pages=17828–17833 |pmid=17093051 |pmc=1635160|bibcode=2006PNAS..10317828A |doi-access=free }}</ref> Albrecht-Buehler has suggested that this rule is the consequence of genomes evolving by a process of [[Chromosomal inversion|inversion]] and [[Transposon|transposition]].<ref name="Albrecht-Buehler2006"/> This process does not appear to have acted on the mitochondrial genomes. Chargaff's second parity rule appears to extend from the nucleotide level to populations of codon triplets in the case of whole single-stranded human genome DNA.<ref>{{cite journal |author= Perez, J.-C. |title= Codon populations in single-stranded whole human genome DNA are fractal and fine-tuned by the Golden Ratio 1.618 |journal= Interdisciplinary Sciences: Computational Life Sciences |date=September 2010 |volume= 2 |issue= 3 |pages= 228–240 |pmid= 20658335 |doi= 10.1007/s12539-010-0022-0 |s2cid= 54565279 }}</ref>
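
The oligonucleotide form of the rule can be sketched as a k-mer count compared against the count of each k-mer's reverse complement on the same strand. The code below is an illustration under simple assumptions (overlapping windows, in-memory counting), not the method of the cited studies, and the function names are hypothetical.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def kmer_counts(strand: str, k: int) -> Counter:
    """Count overlapping k-mers on one strand, skipping windows with non-ACGT."""
    strand = strand.upper()
    windows = (strand[i:i + k] for i in range(len(strand) - k + 1))
    return Counter(w for w in windows if set(w) <= set("ACGT"))


def kmer_parity(strand: str, k: int) -> dict[str, tuple[int, int]]:
    """Map each k-mer pair to (count of k-mer, count of its reverse complement).

    Each pair is keyed by the alphabetically smaller member. A k-mer that
    is its own reverse complement (such as AT) matches itself trivially.
    """
    counts = kmer_counts(strand, k)
    pairs = {}
    for kmer in counts:
        key = min(kmer, reverse_complement(kmer))
        pairs[key] = (counts[key], counts[reverse_complement(key)])
    return pairs


# Hypothetical toy strand; real tests use whole chromosomes.
print(kmer_parity("ACGTAC", 2))
```

Holding every k-mer in memory becomes costly as k grows, since there are 4<sup>k</sup> possible k-mers, which is one practical reason such checks are limited to short oligonucleotides.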
A kind of "codon-level second Chargaff's parity rule" is proposed as follows:

{{alternating rows table|class=wikitable sortable}}
|+ Intra-strand relation among percentages of codon populations
! scope=col | First codon !! scope=col | Second codon !! scope=col | Relation proposed !! scope=col | Details
|-
| <code>Twx</code> (1st base position is T) || <code>yzA</code> (3rd base position is A) || % <code>Twx</code> <math> \simeq </math> % <code>yzA</code> || <code>Twx</code> and <code>yzA</code> are mirror codons, e.g. <code>TCG</code> and <code>CGA</code>
|-
| <code>Cwx</code> (1st base position is C) || <code>yzG</code> (3rd base position is G) || % <code>Cwx</code> <math> \simeq </math> % <code>yzG</code> || <code>Cwx</code> and <code>yzG</code> are mirror codons, e.g. <code>CTA</code> and <code>TAG</code>
|-
| <code>wTx</code> (2nd base position is T) || <code>yAz</code> (2nd base position is A) || % <code>wTx</code> <math> \simeq </math> % <code>yAz</code> || <code>wTx</code> and <code>yAz</code> are mirror codons, e.g. <code>CTG</code> and <code>CAG</code>
|-
| <code>wCx</code> (2nd base position is C) || <code>yGz</code> (2nd base position is G) || % <code>wCx</code> <math> \simeq </math> % <code>yGz</code> || <code>wCx</code> and <code>yGz</code> are mirror codons, e.g. <code>TCT</code> and <code>AGA</code>
|-
| <code>wxT</code> (3rd base position is T) || <code>Ayz</code> (1st base position is A) || % <code>wxT</code> <math> \simeq </math> % <code>Ayz</code> || <code>wxT</code> and <code>Ayz</code> are mirror codons, e.g. <code>CTT</code> and <code>AAG</code>
|-
| <code>wxC</code> (3rd base position is C) || <code>Gyz</code> (1st base position is G) || % <code>wxC</code> <math> \simeq </math> % <code>Gyz</code> || <code>wxC</code> and <code>Gyz</code> are mirror codons, e.g. <code>GGC</code> and <code>GCC</code>
|-
|}
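
The six relations in the table can be evaluated for any strand by reading codons in a fixed frame and comparing the percentage of codons in each paired class. The sketch below is illustrative; the names <code>RELATIONS</code>, <code>codons_in_frame</code> and <code>class_percentages</code> are hypothetical, and positions are 0-based in the code (0 is the first base of the codon).

```python
# Each entry pairs (position, base) of the first codon class with
# (position, base) of the second, in the same order as the table rows.
RELATIONS = [
    ((0, "T"), (2, "A")),
    ((0, "C"), (2, "G")),
    ((1, "T"), (1, "A")),
    ((1, "C"), (1, "G")),
    ((2, "T"), (0, "A")),
    ((2, "C"), (0, "G")),
]


def codons_in_frame(strand: str, frame: int = 0) -> list[str]:
    """Split a strand into complete, non-overlapping codons from `frame`."""
    strand = strand.upper()
    return [strand[i:i + 3] for i in range(frame, len(strand) - 2, 3)]


def class_percentages(strand: str, frame: int = 0) -> list[tuple[float, float]]:
    """Return (% first class, % second class) for each table row."""
    codons = [c for c in codons_in_frame(strand, frame) if set(c) <= set("ACGT")]
    if not codons:
        raise ValueError("no complete codons found")
    n = len(codons)
    rows = []
    for (pos1, base1), (pos2, base2) in RELATIONS:
        first = 100.0 * sum(c[pos1] == base1 for c in codons) / n
        second = 100.0 * sum(c[pos2] == base2 for c in codons) / n
        rows.append((first, second))
    return rows


# Hypothetical toy strand made of the mirror codons TCG and CGA.
for row in class_percentages("TCGCGA"):
    print(row)
```

The proposed relation holds for a row when its two percentages are approximately equal.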

For example, counting codons in the first reading frame across the whole human genome gives:
 36530115 TTT and 36381293 AAA (ratio = 1.00409); 2087242 TCG and 2085226 CGA (ratio = 1.00096).
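
The ratios quoted above are simply the count of a codon divided by the count of its mirror (reverse-complement) codon. A minimal sketch follows; the function names are hypothetical, and the counts in <code>QUOTED_COUNTS</code> are copied from the text rather than recomputed from a genome.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def codons_in_frame(strand: str, frame: int = 0) -> list[str]:
    """Split a strand into complete, non-overlapping codons from `frame`."""
    strand = strand.upper()
    return [strand[i:i + 3] for i in range(frame, len(strand) - 2, 3)]


def mirror_codon_ratio(strand: str, codon: str, frame: int = 0) -> float:
    """Return count(codon) / count(reverse complement of codon) in one frame."""
    codons = codons_in_frame(strand, frame)
    n_codon = codons.count(codon.upper())
    n_mirror = codons.count(reverse_complement(codon))
    if n_mirror == 0:
        raise ValueError("mirror codon does not occur in this frame")
    return n_codon / n_mirror


# Counts quoted in the text for the human genome (not recomputed here).
QUOTED_COUNTS = {
    ("TTT", "AAA"): (36530115, 36381293),
    ("TCG", "CGA"): (2087242, 2085226),
}
for (codon, mirror), (n_codon, n_mirror) in QUOTED_COUNTS.items():
    print(codon, mirror, n_codon / n_mirror)
```

A ratio close to 1 means the codon and its mirror codon are almost equally frequent on the strand.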

In 2020, it was suggested that the physical properties of dsDNA (double-stranded DNA) and the tendency of physical systems towards maximum entropy are the cause of Chargaff's second parity rule.<ref>{{cite journal |author= Piero Fariselli, Cristian Taccioli, Luca Pagani & Amos Maritan |title= DNA sequence symmetries from randomness: the origin of the Chargaff's second parity rule |journal= Briefings in Bioinformatics |date=April 2020 |volume= 22 |issue= bbaa04 |pages= 2172–2181 |pmid= 32266404 |doi= 10.1093/bib/bbaa041 |pmc= 7986665 |doi-access= free }}</ref> On this view, the symmetries and patterns present in dsDNA sequences can emerge from the physical peculiarities of the dsDNA molecule and the maximum entropy principle alone, rather than from biological or environmental evolutionary pressure.