{{short description|Two rules about the percentage of A, C, G, and T in DNA strands}}
[[File:DNA Diagram.png|thumb|right|upright=1.55|A diagram of DNA base pairing, demonstrating the basis for Chargaff's rules]]
'''Chargaff's rules''' state that in the [[DNA]] of any species and any organism, the amount of [[guanine]] should equal the amount of [[cytosine]], and the amount of [[adenine]] should equal the amount of [[thymine]]. Further, a 1:1 [[Stoichiometry|stoichiometric]] ratio of [[purine]] and [[pyrimidine]] bases (i.e., <code>A+G=T+C</code>) should exist. This pattern is found in both strands of the DNA. The rules were discovered by the Austrian-born chemist [[Erwin Chargaff]]<ref name="Elson1952">{{cite journal |doi=10.1007/BF02170221 |vauthors=Elson D, Chargaff E |year=1952 |title=On the deoxyribonucleic acid content of sea urchin gametes |journal=Experientia |volume=8 |issue=4 |pages=143–145 |pmid=14945441|s2cid=36803326 }}</ref><ref name="Chargaff1952">{{cite journal |vauthors=Chargaff E, Lipshitz R, Green C |s2cid=11358561 |year=1952 |title=Composition of the deoxypentose nucleic acids of four genera of sea-urchin |journal=J Biol Chem |volume=195 |issue=1 |pages=155–160 |doi=10.1016/S0021-9258(19)50884-5 |pmid=14938364|doi-access=free }}</ref> in the late 1940s.

== Definitions ==

=== First parity rule ===
The first rule holds that, taken ''globally'', a double-stranded [[DNA]] molecule has equal percentages of paired bases: A% = T% and G% = C%. The rigorous validation of the rule constitutes the basis of [[Watson–Crick base pair]]s in the DNA double helix model.
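
As a minimal sketch of why base pairing forces this equality, the following Python snippet builds the complementary strand of an arbitrary sequence and counts bases over both strands of the duplex. The function names and the example sequence are illustrative assumptions and are not taken from any cited source.

```python
from collections import Counter

# Watson-Crick pairing: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Return the complementary strand, read in its own 5'->3' direction."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def duplex_percentages(strand: str) -> dict:
    """Base percentages counted over both strands of the duplex."""
    counts = Counter(strand) + Counter(reverse_complement(strand))
    total = sum(counts.values())
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


# Hypothetical, deliberately skewed single strand (A-rich, T-poor).
example = "AAAAGGCAAT"
pct = duplex_percentages(example)
print(pct)
```

However skewed the input strand is, every A on one strand contributes a T on the other, so the duplex totals satisfy A% = T% and G% = C% exactly.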

=== Second parity rule ===
The second rule holds that both A% ≈ T% and G% ≈ C% are valid for each of the two DNA strands.<ref name=Rudner1968>{{cite journal|pmid=4970114|year=1968|last1=Rudner|first1=R|last2=Karkas|first2=JD|last3=Chargaff|first3=E|title=Separation of ''B. Subtilis'' DNA into complementary strands. 3. Direct analysis|volume=60|issue=3|pages=921–2|pmc=225140|journal=Proceedings of the National Academy of Sciences of the United States of America |doi=10.1073/pnas.60.3.921|bibcode=1968PNAS...60..921R |doi-access=free}}</ref> This describes only a global feature of the base composition in a single DNA strand.<ref name="Zhang2003_externallinks">{{cite journal |vauthors=Zhang CT, Zhang R, Ou HY |year=2003 |title=The Z curve database: a graphic representation of genome sequences |journal=Bioinformatics |volume=19 |issue=5 |pages=593–599 |doi=10.1093/bioinformatics/btg041 |pmid=12651717|doi-access=free }}</ref>

== Research ==
The second parity rule was discovered in 1968.<ref name=Rudner1968 /> It states that, within a single strand of DNA, the number of adenine units is ''approximately'' equal to that of thymine (%A ≈ %T), and the number of cytosine units is ''approximately'' equal to that of guanine (%C ≈ %G).
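
One way to quantify how closely a single strand follows this rule is a normalized skew, sketched below in Python. The function name, the choice of (A − T)/(A + T) and (G − C)/(G + C) as measures, and the example sequences are assumptions made for illustration; a value near zero indicates compliance.

```python
from collections import Counter


def intrastrand_parity(strand: str) -> dict:
    """Normalized A/T and G/C skews for ONE strand (0.0 = perfect parity)."""
    counts = Counter(strand.upper())
    a, c, g, t = (counts[base] for base in "ACGT")
    return {
        # Guard against division by zero when a base pair type is absent.
        "AT_skew": (a - t) / (a + t) if a + t else 0.0,
        "GC_skew": (g - c) / (g + c) if g + c else 0.0,
    }


# Hypothetical strands: the first is balanced, the second is A-rich.
balanced = intrastrand_parity("ATGCGCATTAGC")
skewed = intrastrand_parity("AAAAGGCAAT")
print(balanced, skewed)
```

Unlike the first rule, nothing in base pairing forces these skews to vanish, which is why the second rule is only approximate and fails for some genome types.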

In 2006, it was shown that this rule applies to four<ref name="Chargaff1952" /> of the five types of double stranded genomes; specifically it applies to the [[eukaryote|eukaryotic]] [[chromosomes]], the [[bacteria]]l chromosomes, the double stranded [[DNA]] viral genomes, and the [[archaea]]l chromosomes.<ref>{{cite journal |doi=10.1016/j.bbrc.2005.11.160 |vauthors=Mitchell D, Bridge R |year=2006 |title=A test of Chargaff's second rule |journal=Biochem Biophys Res Commun |volume=340 |issue=1 |pages=90–94 |pmid=16364245}}</ref> It does not apply to [[Organellar DNA|organellar genomes]] ([[mitochondria]] and [[plastid]]s) smaller than ~20–30 [[basepair|kbp]], nor does it apply to single stranded DNA (viral) genomes or any type of [[RNA]] genome. The basis for this rule is still under investigation, although genome size may play a role.
[[File:Chargaff-2nd-histogram.png|thumb|right|Histogram showing how 20309 chromosomes adhere to Chargaff's second parity rule]]
The rule itself has consequences. In most bacterial genomes (which are generally 80–90% [[Coding region|coding]]) genes are arranged in such a fashion that approximately 50% of the coding sequence lies on either strand. [[Wacław Szybalski]], in the 1960s, showed that in [[bacteriophage]] coding sequences [[purines]] (A and G) exceed [[pyrimidines]] (C and T).<ref name=Szybalski1966>{{cite journal |vauthors=Szybalski W, Kubinski H, Sheldrick P |year=1966 |title=Pyrimidine clusters on the transcribing strand of DNA and their possible role in the initiation of RNA synthesis |journal=Cold Spring Harb Symp Quant Biol |volume=31 |pages=123–127 |pmid=4966069 |doi=10.1101/SQB.1966.031.01.019}}</ref> This observation has since been confirmed in other organisms and is now sometimes termed "[[Szybalski's rule]]". While Szybalski's rule generally holds, exceptions are known to exist.<ref name=Cristillo1998>{{cite book |author=Cristillo AD |year=1998 |title=Characterization of G0/G1 switch genes in cultured T lymphocytes |publisher=Queen's University |location=Kingston, Ontario, Canada}}</ref><ref name=Bell1999>{{cite journal |doi=10.1006/jtbi.1998.0858 |vauthors=Bell SJ, Forsdyke DR |year=1999 |title=Deviations from Chargaff's second parity rule correlate with direction of transcription |journal=J Theor Biol |volume=197 |issue=1 |pages=63–76 |pmid=10036208|bibcode=1999JThBi.197...63B }}</ref><ref name=Lao2000>{{cite journal |doi=10.1101/gr.10.2.228 |vauthors=Lao PJ, Forsdyke DR |year=2000 |title=Thermophilic Bacteria Strictly Obey Szybalski's Transcription Direction Rule and Politely Purine-Load RNAs with Both Adenine and Guanine |journal= Genome Research|volume=10 |issue=2 |pages=228–236 |pmid=10673280 |pmc=310832}}</ref> The biological basis for Szybalski's rule is not yet known.
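
The purine excess that Szybalski's rule describes can be measured with a simple count, sketched here in Python. The function name, the normalization by sequence length, and the example sequence are illustrative assumptions, not a method taken from the cited papers.

```python
def purine_excess(coding_strand: str) -> float:
    """(purines - pyrimidines) / length; a positive value means purine-loaded."""
    seq = coding_strand.upper()
    purines = sum(seq.count(base) for base in "AG")
    pyrimidines = sum(seq.count(base) for base in "CT")
    total = purines + pyrimidines
    # An empty or non-ACGT sequence carries no information; report 0.0.
    return (purines - pyrimidines) / total if total else 0.0


# Hypothetical coding fragment, chosen to be purine-rich.
fragment = "ATGGAAGAGCGA"
print(purine_excess(fragment))
```

A strand that obeys Chargaff's second rule exactly would score 0.0 overall, so a consistently positive score across coding regions is what creates the tension with that rule discussed in the following paragraphs.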

The combined effect of Chargaff's second rule and Szybalski's rule can be seen in bacterial genomes where the coding sequences are not equally distributed. The [[genetic code]] has 64 [[codons]] of which 3 function as termination codons: there are only 20 [[amino acid]]s normally present in proteins. (There are two uncommon amino acids—[[selenocysteine]] and [[pyrrolysine]]—found in a limited number of proteins and encoded by the [[stop codon]]s—TGA and TAG respectively.) The mismatch between the number of codons and amino acids allows several codons to code for a single amino acid—such codons normally differ only at the third codon base position.

Multivariate statistical analysis of codon use within genomes with unequal quantities of coding sequences on the two strands has shown that codon use in the third position depends on the strand on which the gene is located. This seems likely to be the result of Szybalski's and Chargaff's rules. Because of the asymmetry in pyrimidine and purine use in coding sequences, the strand with the greater coding content will tend to have the greater number of purine bases (Szybalski's rule). Because the number of purine bases will, to a very good approximation, equal the number of their complementary pyrimidines within the same strand and, because the coding sequences occupy 80–90% of the strand, there appears to be (1) a selective pressure on the third base to minimize the number of purine bases in the strand with the greater coding content; and (2) that this pressure is proportional to the mismatch in the length of the coding sequences between the two strands.
[[File:Chargraff-2nd-6-mers.png|thumb|left|Chargaff's 2nd parity rule for prokaryotic 6-mers]]
The origin of the deviation from Chargaff's rule in the organelles has been suggested to be a consequence of the mechanism of replication.<ref name=Nikolaou2006>{{cite journal |doi=10.1016/j.gene.2006.06.010 |vauthors=Nikolaou C, Almirantis Y |year=2006 |title=Deviations from Chargaff's second parity rule in organellar DNA. Insights into the evolution of organellar genomes |journal=Gene |volume=381 |pages=34–41 |pmid=16893615}}</ref> During replication the DNA strands separate. In single-stranded DNA, [[cytosine]] slowly and spontaneously deaminates to [[uracil]], which is read as thymine (a C to T [[Transition (genetics)|transition]]). The longer the strands are separated, the greater the quantity of deamination. For reasons that are not yet clear, the strands tend to exist longer in single form in mitochondria than in chromosomal DNA. This process tends to yield one strand that is enriched in [[guanine]] (G) and [[thymine]] (T) with its complement enriched in cytosine (C) and adenine (A), and this process may have given rise to the deviations found in the mitochondria. {{Citation needed|reason=reliable source needed for the whole sentence as there are faults in the judgement|date=January 2013}}{{Dubious|date=January 2013}}

Chargaff's second rule appears to be the consequence of a more complex parity rule: within a single strand of DNA any oligonucleotide ([[k-mer]] or [[n-gram]]; length ≤ 10) is present in equal numbers to its reverse-complementary oligonucleotide. Because of the computational requirements this has not been verified in all genomes for all oligonucleotides. It has been verified for triplet oligonucleotides for a large data set.<ref name="Albrecht-Buehler2006">{{cite journal |doi=10.1073/pnas.0605553103 |author=Albrecht-Buehler G |year=2006 |title=Asymptotically increasing compliance of genomes with Chargaff's second parity rules through inversions and inverted transpositions |journal=Proc Natl Acad Sci USA |volume=103 |issue=47 |pages=17828–17833 |pmid=17093051 |pmc=1635160|bibcode=2006PNAS..10317828A |doi-access=free }}</ref> Albrecht-Buehler has suggested that this rule is the consequence of genomes evolving by a process of [[Chromosomal inversion|inversion]] and [[Transposon|transposition]].<ref name="Albrecht-Buehler2006"/> This process does not appear to have acted on the mitochondrial genomes. Chargaff's second parity rule also appears to extend from the nucleotide level to populations of codon triplets in the single-stranded DNA of the whole human genome.<ref>{{cite journal |author= Perez, J.-C. |title= Codon populations in single-stranded whole human genome DNA are fractal and fine-tuned by the Golden Ratio 1.618 |journal= Interdisciplinary Sciences: Computational Life Sciences |date=September 2010 |volume= 2 |issue= 3 |pages= 228–240 |pmid= 20658335 |doi= 10.1007/s12539-010-0022-0 |s2cid= 54565279 }}</ref>
A kind of "codon-level second Chargaff's parity rule" is proposed as follows:

{{alternating rows table|class=wikitable sortable}}
|+ Intra-strand relation among percentages of codon populations
! scope=col | First codon !! scope=col | Second codon !! scope=col | Relation proposed !! scope=col | Details
|-
| <code>Twx</code> (1st base position is T) || <code>yzA</code> (3rd base position is A) || % <code>Twx</code> <math> \simeq </math> % <code>yzA</code> || <code>Twx</code> and <code>yzA</code> are mirror codons, e.g. <code>TCG</code> and <code>CGA</code>
|-
| <code>Cwx</code> (1st base position is C) || <code>yzG</code> (3rd base position is G) || % <code>Cwx</code> <math> \simeq </math> % <code>yzG</code> || <code>Cwx</code> and <code>yzG</code> are mirror codons, e.g. <code>CTA</code> and <code>TAG</code>
|-
| <code>wTx</code> (2nd base position is T) || <code>yAz</code> (2nd base position is A) || % <code>wTx</code> <math> \simeq </math> % <code>yAz</code> || <code>wTx</code> and <code>yAz</code> are mirror codons, e.g. <code>CTG</code> and <code>CAG</code>
|-
| <code>wCx</code> (2nd base position is C) || <code>yGz</code> (2nd base position is G) || % <code>wCx</code> <math> \simeq </math> % <code>yGz</code> || <code>wCx</code> and <code>yGz</code> are mirror codons, e.g. <code>TCT</code> and <code>AGA</code>
|-
| <code>wxT</code> (3rd base position is T) || <code>Ayz</code> (1st base position is A) || % <code>wxT</code> <math> \simeq </math> % <code>Ayz</code> || <code>wxT</code> and <code>Ayz</code> are mirror codons, e.g. <code>CTT</code> and <code>AAG</code>
|-
| <code>wxC</code> (3rd base position is C) || <code>Gyz</code> (1st base position is G) || % <code>wxC</code> <math> \simeq </math> % <code>Gyz</code> || <code>wxC</code> and <code>Gyz</code> are mirror codons, e.g. <code>GGC</code> and <code>GCC</code>
|-
|}

For example, counting codons across the whole human genome in the first reading frame gives:
 36530115 TTT and 36381293 AAA (ratio = 1.00409); 2087242 TCG and 2085226 CGA (ratio = 1.00096); and so on.
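
The comparison behind these figures can be sketched in Python as follows: split one strand into non-overlapping codons in a chosen reading frame, then compare each codon's count with that of its mirror (reverse-complement) codon. The function names and the toy sequence are illustrative assumptions; the two genome-wide counts are the ones quoted above.

```python
from collections import Counter

# Translation table implementing Watson-Crick complementation.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def mirror(codon: str) -> str:
    """Reverse complement of a codon, e.g. TCG -> CGA."""
    return codon.translate(COMPLEMENT)[::-1]


def codon_counts(strand: str, frame: int = 0) -> Counter:
    """Count non-overlapping codons on one strand, starting at offset `frame`."""
    seq = strand.upper()
    return Counter(seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))


def mirror_ratio(counts: Counter, codon: str) -> float:
    """count(codon) / count(mirror codon); NaN if the mirror codon is absent."""
    partner = counts[mirror(codon)]
    return counts[codon] / partner if partner else float("nan")


# Toy strand containing TTT, AAA, TCG and CGA once each in frame 0.
toy = codon_counts("TTTAAATCGCGA")
print(mirror_ratio(toy, "TTT"), mirror_ratio(toy, "TCG"))

# Ratio recomputed from the genome-wide TTT and AAA counts quoted above.
print(round(36530115 / 36381293, 5))
```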

In 2020, it was suggested that the physical properties of dsDNA (double-stranded DNA) and the tendency of all physical systems toward maximum entropy are the cause of Chargaff's second parity rule.<ref>{{cite journal |author= Piero Fariselli, Cristian Taccioli, Luca Pagani & Amos Maritan |title= DNA sequence symmetries from randomness: the origin of the Chargaff's second parity rule |journal= Briefings in Bioinformatics |date=April 2020 |volume= 22 |issue= bbaa04 |pages= 2172–2181 |pmid= 32266404 |doi= 10.1093/bib/bbaa041 |pmc= 7986665 |doi-access= free }}</ref> On this view, the symmetries and patterns present in dsDNA sequences can emerge from the physical peculiarities of the dsDNA molecule and the maximum entropy principle alone, rather than from biological or environmental evolutionary pressure.

== Percentages of bases in DNA ==
The following table is a representative sample of Erwin Chargaff's 1952 data, listing the base composition of DNA from various organisms and supporting both of Chargaff's rules.<ref name="Bansal2003">{{cite journal |author=Bansal M |title=DNA structure: Revisiting the Watson-Crick double helix |journal=Current Science |volume=85 |issue=11 |pages=1556–1563 |year=2003 |url=http://eprints.iisc.ernet.in/7173/1/dna.pdf |access-date=2013-07-26 |archive-url=https://web.archive.org/web/20140726025011/http://eprints.iisc.ernet.in/7173/1/dna.pdf |archive-date=2014-07-26 |url-status=dead }}</ref> A significant departure of the A/T and G/C ratios from one, as seen for φX174, indicates single-stranded DNA.

{{alternating rows table|class=wikitable sortable}}
! scope=col|Organism!!scope=col|Taxon!!scope=col|%A !!scope=col|%G !!scope=col|%C !!scope=col|%T !!scope=col|A / T !!scope=col|G / C !!scope=col|%GC !!scope=col|%AT
|-
| [[Maize]] || ''[[Zea (plant)|Zea]]'' || 26.8 || 22.8 || 23.2 || 27.2 || 0.99 || 0.98 || 46.1 || 54.0
|-
| [[Octopus]] || ''[[Octopus]]'' || 33.2 || 17.6 || 17.6 || 31.6 || 1.05 || 1.00 || 35.2 || 64.8
|-
| [[Chicken]] || ''[[Gallus (genus)|Gallus]]'' || 28.0 || 22.0 || 21.6 || 28.4 || 0.99 || 1.02 || 43.7 || 56.4
|-
| [[Rat]] || ''[[Rattus]]'' || 28.6 || 21.4 || 20.5 || 28.4 || 1.01 || 1.00 || 42.9 || 57.0
|-
| [[Human]] || ''[[Homo]]'' || 29.3 || 20.7 || 20.0 || 30.0 || 0.98 || 1.04 || 40.7 || 59.3
|-
| [[Grasshopper]] || [[Orthoptera]] || 29.3 || 20.5 || 20.7 || 29.3 || 1.00 || 0.99 || 41.2 || 58.6
|-
| [[Sea urchin]] || [[Echinoidea]] || 32.8 || 17.7 || 17.3 || 32.1 || 1.02 || 1.02 || 35.0 || 64.9
|-
| [[Wheat]] || ''[[Triticum]]'' || 27.3 || 22.7 || 22.8 || 27.1 || 1.01 || 1.00 || 45.5 || 54.4
|-
| [[Yeast]] || ''[[Saccharomyces]]'' || 31.3 || 18.7 || 17.1 || 32.9 || 0.95 || 1.09 || 35.8 || 64.4
|-
| ''[[Escherichia coli|E. coli]]'' || ''[[Escherichia]]'' || 24.7 || 26.0 || 25.7 || 23.6 || 1.05 || 1.01 || 51.7 || 48.3
|-
| [[φX174]] || ''[[PhiX174]]'' || 24.0 || 23.3 || 21.5 || 31.2 || 0.77 || 1.08 || 44.8 || 55.2
|-
{{end}}
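
The derived columns of the table follow directly from the four base percentages. The Python sketch below recomputes them for three rows copied from the table (chosen arbitrarily); the dictionary and function names are illustrative assumptions.

```python
# (%A, %G, %C, %T) copied from three rows of the table above.
COMPOSITION = {
    "Octopus": (33.2, 17.6, 17.6, 31.6),
    "E. coli": (24.7, 26.0, 25.7, 23.6),
    "phiX174": (24.0, 23.3, 21.5, 31.2),
}


def derived_columns(a: float, g: float, c: float, t: float) -> dict:
    """Recompute the A/T, G/C, %GC and %AT columns from base percentages."""
    return {
        "A/T": round(a / t, 2),
        "G/C": round(g / c, 2),
        "%GC": round(g + c, 1),
        "%AT": round(a + t, 1),
    }


results = {name: derived_columns(*row) for name, row in COMPOSITION.items()}
for name, columns in results.items():
    print(name, columns)
```

For the double-stranded genomes both ratios come out close to one, while the φX174 row gives an A/T ratio of about 0.77, the kind of departure expected for a single-stranded genome.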

== See also ==
* [[Genetic codes]]

== References ==
{{reflist}}

== Further reading ==
* {{cite journal |vauthors=Szybalski W, Kubinski H, Sheldrick P |year=1966 |title=Pyrimidine clusters on the transcribing strand of DNA and their possible role in the initiation of RNA synthesis |journal= Cold Spring Harbor Symposia on Quantitative Biology|volume=31 |pages=123–127 |doi=10.1101/SQB.1966.031.01.019 |pmid=4966069 |ref=none }}
* {{cite journal |author=Lobry JR |year=1996 |title=Asymmetric substitution patterns in the two DNA strands of bacteria |journal=Mol. Biol. Evol. |volume=13 |issue=5 |pages=660–665 |pmid=8676740 |doi=10.1093/oxfordjournals.molbev.a025626|doi-access=free  |ref=none }}
* {{cite journal |doi=10.1093/nar/27.7.1642 |vauthors=Lafay B, Lloyd AT, McLean MJ, Devine KM, Sharp PM, Wolfe KH |year=1999 |title=Proteome composition and codon usage in spirochaetes: species-specific and DNA strand-specific mutational biases |journal=Nucleic Acids Res |volume=27 |issue=7 |pages=1642–1649 |pmid=10075995 |pmc=148367 |ref=none }}
* {{cite journal |doi=10.1007/PL00006428 |vauthors=McLean MJ, Wolfe KH, Devine KM |year=1998 |title=Base composition skews, replication orientation, and gene orientation in 12 prokaryote genomes |journal=J Mol Evol |volume=47 |pages=691–696 |pmid=9847411 |issue=6|citeseerx=10.1.1.28.9035 |bibcode=1998JMolE..47..691M |s2cid=12917481  |ref=none }}
* {{cite journal |doi=10.1073/pnas.95.18.10698 |author=McInerney JO |year=1998 |title=Replicational and transcriptional selection on codon usage in Borrelia burgdorferi |journal=Proc Natl Acad Sci USA |volume=95 |issue=18 |pages=10698–10703 |pmid=9724767 |pmc=27958|bibcode=1998PNAS...9510698M |doi-access=free  |ref=none }}

== External links ==
* [http://www.cbs.dtu.dk/services/GenomeAtlas/ CBS Genome Atlas Database] {{Webarchive|url=http://arquivo.pt/wayback/20160516135600/http://www.cbs.dtu.dk/services/GenomeAtlas |date=2016-05-16 }} — contains hundreds of examples of base skews and had problems.<ref name="Hallin2004">{{cite journal |doi=10.1093/bioinformatics/bth423 |vauthors=Hallin PF, Ussery D |year=2004 |title=CBS Genome Atlas Database: A dynamic storage for bioinformatic results and sequence data |journal=Bioinformatics |volume=20 |issue=18 |pages=3682–3686 |pmid=15256401|doi-access=free }}</ref>
* [https://archive.today/20121129003619/http://tubic.tju.edu.cn/zcurve/ The Z curve database of genomes] — a 3-dimensional visualization and analysis tool of genomes.<ref name="Zhang2003_externallinks">{{cite journal |vauthors=Zhang CT, Zhang R, Ou HY |year=2003 |title=The Z curve database: a graphic representation of genome sequences |journal=Bioinformatics |volume=19 |issue=5 |pages=593–599 |doi=10.1093/bioinformatics/btg041 |pmid=12651717|doi-access=free }}</ref>

[[Category:DNA]]
[[Category:Genetics techniques]]
[[Category:History of genetics]]
[[Category:Biotechnology]]
[[Category:Medical research]]
[[Category:Biology experiments]]
[[Category:Laboratory techniques]]
[[Category:Biological engineering| ]]
[[Category:Molecular biology]]