GC-content
{{Short description|Percentage of guanine and cytosine in DNA or RNA molecules}}
[[Image:AT-GC.jpg|thumb|400px|Nucleotide bonds showing AT and GC pairs. Arrows point to the [[hydrogen bond]]s.]]
In [[molecular biology]] and [[genetics]], '''GC-content''' (or '''guanine-cytosine content''') is the percentage of [[nitrogenous bases]] in a [[DNA]] or [[RNA]] molecule that are either [[guanine]] (G) or [[cytosine]] (C).<ref>[http://cancerweb.ncl.ac.uk/cgi-bin/omd?GC+content Definition of GC] – content on CancerWeb of [[Newcastle University]], UK</ref> This measure gives the proportion of G and C bases among all four bases of the molecule, the other two being [[adenine]] and [[thymine]] in DNA, or adenine and [[uracil]] in RNA.

GC-content may be given for a certain fragment of DNA or RNA or for an entire [[genome]]. When it refers to a fragment, it may denote the GC-content of an individual [[gene]] or section of a gene (domain), a group of genes or gene clusters, a [[non-coding DNA|non-coding region]], or a synthetic [[oligonucleotide]] such as a [[primer (molecular biology)|primer]].

==Structure==
Qualitatively, guanine (G) and cytosine (C) pair specifically with each other through [[hydrogen bonding]], whereas adenine (A) pairs specifically with thymine (T) in DNA and with uracil (U) in RNA. Quantitatively, each GC [[base pair]] is held together by three hydrogen bonds, while AT and AU base pairs are held together by two hydrogen bonds. To emphasize this difference, the base pairings are often represented as "G≡C" versus "A=T" or "A=U".
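The definition of GC-content and the hydrogen-bond counts given above can be sketched in a few lines of Python. This is only an illustration: the function names <code>gc_content</code> and <code>hydrogen_bonds</code> are hypothetical, the sketch assumes that characters other than A, C, G, T and U (for example the ambiguity code N) are simply ignored, and the bond count assumes an idealised, perfectly paired duplex.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Return the GC-content of a DNA or RNA sequence as a percentage.

    Assumption: characters other than A, C, G, T and U are ignored.
    """
    counts = Counter(seq.upper())
    gc = counts["G"] + counts["C"]
    total = gc + counts["A"] + counts["T"] + counts["U"]
    if total == 0:
        raise ValueError("sequence contains no A, C, G, T or U bases")
    return 100.0 * gc / total


def hydrogen_bonds(seq: str) -> int:
    """Count the hydrogen bonds in a duplex of seq and its complement.

    Assumption: every base is perfectly paired, with three bonds per
    G/C pair and two bonds per A/T or A/U pair.
    """
    counts = Counter(seq.upper())
    strong = counts["G"] + counts["C"]
    weak = counts["A"] + counts["T"] + counts["U"]
    return 3 * strong + 2 * weak
```

For example, <code>gc_content("ATGC")</code> returns 50.0 and <code>hydrogen_bonds("ATGC")</code> returns 10 (two GC pairs with three bonds each plus two AT pairs with two bonds each).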
DNA with low GC-content is less stable than DNA with high GC-content; however, the hydrogen bonds themselves contribute relatively little to this stability, which arises mainly from base-stacking interactions.<ref name="Yakovchuk2006">{{cite journal |vauthors=Yakovchuk P, Protozanova E, Frank-Kamenetskii MD |title=Base-stacking and base-pairing contributions into thermal stability of the DNA double helix |journal=Nucleic Acids Res. |volume=34 |issue=2 |pages=564–74 |year=2006 |pmid=16449200 |pmc=1360284 |doi=10.1093/nar/gkj454 }}</ref> In spite of the higher [[thermostability]] that high GC-content confers on a nucleic acid, at least some species of [[bacteria]] with DNA of high GC-content have been observed to undergo [[Autolysis (biology)|autolysis]] more readily, thereby reducing the longevity of the cell ''per se''.<ref>{{cite journal |vauthors=Levin RE, Van Sickle C |title=Autolysis of high-GC isolates of Pseudomonas putrefaciens |journal=Antonie van Leeuwenhoek |volume=42 |issue=1–2 |pages=145–55 |year=1976 |pmid=7999 |doi=10.1007/BF00399459 |s2cid=9960732 }}</ref> Because of the thermostability of GC pairs, it was once presumed that high GC-content was a necessary [[adaptation]] to high temperatures, but this hypothesis was refuted in 2001.<ref name="Hurst2001">{{cite journal |vauthors=Hurst LD, Merchant AR |title = High guanine-cytosine content is not an adaptation to high temperature: a comparative analysis amongst prokaryotes|journal = Proc. Biol. Sci.|volume = 268|issue = 1466|pages = 493–7|date = March 2001|pmid = 11296861|pmc = 1088632|doi = 10.1098/rspb.2000.1397}}</ref> Even so, the optimal growth temperature of [[prokaryote]]s has been shown to correlate strongly with the GC-content of structural RNAs such as [[ribosomal RNA]], [[transfer RNA]], and many other [[non-coding RNA]]s.<ref name="Hurst2001" /><ref>{{cite journal | last1=Galtier | first1=N.
| last2=Lobry | first2=J.R. | title=Relationships between genomic G+C content, RNA secondary structures, and optimal growth temperature in Prokaryotes | journal=Journal of Molecular Evolution| volume=44 | pages=632–636 | year=1997 | pmid=9169555 | doi=10.1007/PL00006186 | issue=6 | bibcode=1997JMolE..44..632G| s2cid=19054315 }}</ref> AU base pairs are less stable than GC base pairs, making high-GC-content RNA structures more resistant to the effects of high temperatures.

More recently, it has been demonstrated that the most important contributor to the thermal stability of double-stranded nucleic acids is the stacking of adjacent bases rather than the number of hydrogen bonds between the bases. The stacking energy is more favorable for GC pairs than for AT or AU pairs because of the relative positions of exocyclic groups. Additionally, there is a correlation between the order in which the bases stack and the thermal stability of the molecule as a whole.<ref name="Yakovchuk2006" />

==Determination==
[[File:Human karyotype with bands and sub-bands.png|thumb|Schematic [[karyotype|karyogram]] of a human, showing an overview of the [[human genome]] on [[G banding]] (which includes [[Giemsa-stain]]ing), wherein GC-rich regions are lighter and GC-poor regions are darker.<br />{{further|Karyotype}}]]
GC-content is usually expressed as a percentage value, but sometimes as a ratio (called '''G+C ratio''' or '''GC-ratio'''). GC-content percentage is calculated as<ref>{{cite book | author =Madigan, MT. and Martinko JM.
| title = Brock biology of microorganisms| edition = 10th | publisher =Pearson-Prentice Hall | year = 2003| isbn = 978-84-205-3679-8}}</ref>
:<math>\cfrac{G+C}{A+T+G+C}\times 100\%</math>
whereas the AT/GC ratio is calculated as<ref>{{Cite web |url=http://www.biochem.northwestern.edu/holmgren/Glossary/Definitions/Def-A/A+T_G+C_ratio.html |title=Definition of GC-ratio on Northwestern University, IL, USA |access-date=11 June 2007 |archive-date=20 June 2010 |archive-url=https://web.archive.org/web/20100620045958/http://www.biochem.northwestern.edu/holmgren/Glossary/Definitions/Def-A/A+T_G+C_ratio.html }}</ref>
:<math>\cfrac{A+T}{G+C}.</math>

Both the GC-content percentage and the GC-ratio can be measured by several means; one of the simplest is to measure the [[DNA melting|melting temperature]] of the DNA [[double helix]] using [[spectrophotometry]]. The [[absorbance]] of DNA at a [[wavelength]] of 260 [[Nanometer|nm]] increases fairly sharply when the double-stranded DNA molecule is heated enough to separate into two single strands.<ref>{{cite journal |vauthors=Wilhelm J, Pingoud A, Hahn M |title=Real-time PCR-based method for the estimation of genome sizes |journal=Nucleic Acids Res.
|volume=31 |issue=10 |page=e56 |date=May 2003 |pmid=12736322 |pmc=156059 |doi=10.1093/nar/gng056}}</ref> For large numbers of samples, the most commonly used protocol for determining GC-ratios is [[flow cytometry]].<ref>{{cite journal |author=Vinogradov AE |title=Measurement by flow cytometry of genomic AT/GC ratio and genome size |journal=Cytometry |volume=16 |issue=1 |pages=34–40 |date=May 1994 |pmid=7518377 |doi=10.1002/cyto.990160106 |doi-access=free }}</ref>

Alternatively, if the DNA or RNA molecule under investigation has been reliably [[sequenced]], then GC-content can be accurately calculated by simple arithmetic or by using a variety of publicly available software tools, such as the [http://www.basic.northwestern.edu/biotools/oligocalc.html free online GC calculator].

==Genomic content==
=== Within-genome variation ===
The GC-ratio varies markedly within a genome. In the genomes of more complex organisms, these variations result in a mosaic-like formation with islet regions called [[Isochore (genetics)|isochores]].<ref>{{cite journal |author=Bernardi G |title=Isochores and the evolutionary genomics of vertebrates |journal=Gene |volume=241 |issue=1 |pages=3–17 |date=January 2000 |pmid=10607893 |doi=10.1016/S0378-1119(99)00485-0}}</ref> This mosaic produces the variations in staining intensity seen in [[chromosomes]].<ref>{{cite journal |vauthors=Furey TS, Haussler D |title=Integration of the cytogenetic map with the draft human genome sequence |journal=Hum. Mol. Genet.
|volume=12 |issue=9 |pages=1037–44 |date=May 2003 |pmid=12700172 |url=http://hmg.oxfordjournals.org/cgi/pmidlookup?view=long&pmid=12700172 |doi=10.1093/hmg/ddg113|doi-access=free }}</ref> GC-rich isochores typically include many protein-coding genes, and thus determining the GC-ratios of these specific regions contributes to [[gene mapping|mapping]] gene-rich regions of the genome.<ref>{{cite journal |vauthors=Sumner AT, de la Torre J, Stuppia L |title=The distribution of genes on chromosomes: a cytological approach |journal=J. Mol. Evol. |volume=37 |issue=2 |pages=117–22 |date=August 1993 |pmid=8411200 |doi=10.1007/BF02407346 |bibcode=1993JMolE..37..117S |s2cid=24677431 }}</ref><ref>{{cite journal |vauthors=Aïssani B, Bernardi G |title=CpG islands, genes and isochores in the genomes of vertebrates |journal=Gene |volume=106 |issue=2 |pages=185–95 |date=October 1991 |pmid=1937049 |doi=10.1016/0378-1119(91)90198-K }}</ref>

=== Coding sequences ===
Within a long region of genomic sequence, genes often have a higher GC-content than the background GC-content of the entire genome.<ref name="pmid28261263">{{cite journal| author=Romiguier J, Roux C| title=Analytical Biases Associated with GC-Content in Molecular Evolution. | journal=Front Genet | year= 2017 | volume= 8 | issue= | page= 16 | pmid=28261263 | doi=10.3389/fgene.2017.00016 | pmc=5309256 | doi-access=free }}</ref> There is evidence that the length of the [[coding region]] of a [[gene]] is positively correlated with its G+C content.<ref>{{cite journal |vauthors=Pozzoli U, Menozzi G, Fumagalli M |title=Both selective and neutral processes drive GC content evolution in the human genome |journal=BMC Evol. Biol.
|volume=8 |page=99 |year=2008 |issue=1 |pmid=18371205 |pmc=2292697 |doi=10.1186/1471-2148-8-99 |bibcode=2008BMCEE...8...99P |display-authors=etal |doi-access=free }}</ref> This has been attributed to the fact that the [[stop codon]] has a bias towards A and T nucleotides, so the shorter the sequence, the higher the AT bias.<ref>{{cite journal |vauthors=Wuitschick JD, Karrer KM |title=Analysis of genomic G + C content, codon usage, initiator codon context and translation termination sites in ''Tetrahymena thermophila'' |journal=J. Eukaryot. Microbiol. |volume=46 |issue=3 |pages=239–47 |year=1999 |pmid=10377985 |doi=10.1111/j.1550-7408.1999.tb05120.x |s2cid=28836138 }}</ref>

Comparison of more than 1,000 [[orthologous]] genes in mammals showed marked within-genome variations of the [[codon|third-codon position]] GC content, with a range from less than 30% to more than 80%.<ref name="Romiguier2010"/>

=== Among-genome variation ===
GC-content varies among organisms; this variation is thought to arise from differences in [[Gene-centered view of evolution|selection]], mutational bias, and biased recombination-associated [[DNA repair]].<ref>{{cite journal |author=Birdsell JA |title=Integrating genomics, bioinformatics, and classical genetics to study the effects of recombination on genome evolution |journal=Mol. Biol. Evol.
|volume=19 |issue=7 |pages=1181–97 |date=1 July 2002|pmid=12082137 |url=http://mbe.oxfordjournals.org/cgi/pmidlookup?view=long&pmid=12082137 |doi=10.1093/oxfordjournals.molbev.a004176|citeseerx=10.1.1.337.1535 }}</ref>

In the human genome, GC-content ranges from 35% to 60% across 100-kb fragments, with a mean of 41%.<ref name="IHSGC2001">{{cite journal |author=International Human Genome Sequencing Consortium | title = Initial sequencing and analysis of the human genome | journal = Nature | volume = 409 | issue = 6822 | pages = 860–921 | date = Feb 2001 | pmid = 11237011 | doi = 10.1038/35057062 | bibcode = 2001Natur.409..860L | doi-access = free | hdl = 2027.42/62798 | hdl-access = free }} (page 876)</ref><!--The "Romiguier2010" paper provides a mean GC level at the third codon position in genes. Since protein coding regions are massively overrepresented in GC-rich DNA, their number (46%) is much higher than the genomic mean.--> The GC-content of [[yeast]] (''[[Saccharomyces cerevisiae]]'') is 38%,<ref>[https://www.ncbi.nlm.nih.gov/sites/entrez?db=genomeprj&cmd=Retrieve&dopt=Overview&list_uids=128 Whole genome data of ''Saccharomyces cerevisiae'' on NCBI]</ref> and that of another common [[model organism]], thale cress (''[[Arabidopsis thaliana]]''), is 36%.<ref>[https://www.ncbi.nlm.nih.gov/sites/entrez?db=genomeprj&cmd=Retrieve&dopt=Overview&list_uids=116 Whole genome data of ''Arabidopsis thaliana'' on NCBI]</ref> Because of the nature of the [[genetic code]], it is virtually impossible for an organism to have a genome with a GC-content approaching either 0% or 100%.
Nevertheless, ''[[Plasmodium falciparum]]'' has an extremely low GC-content (about 20%),<ref>[https://www.ncbi.nlm.nih.gov/sites/entrez?db=genomeprj&cmd=Retrieve&dopt=Overview&list_uids=148 Whole genome data of ''Plasmodium falciparum'' on NCBI]</ref> and such genomes are commonly described as AT-rich rather than GC-poor.<ref>{{cite journal |vauthors=Musto H, Cacciò S, Rodríguez-Maseda H, Bernardi G |title=Compositional constraints in the extremely GC-poor genome of ''Plasmodium falciparum'' |journal=Mem. Inst. Oswaldo Cruz |volume=92 |issue=6 |pages=835–41 |year=1997 |pmid=9566216 |url=http://www.scielo.br/pdf/mioc/v92n6/3431.pdf |doi=10.1590/S0074-02761997000600020|doi-access=free }}</ref>

Several mammalian species (e.g., [[shrew]], [[microbat]], [[tenrec]], [[rabbit]]) have independently undergone a marked increase in the GC-content of their genes. These GC-content changes are correlated with species [[Phenotypic trait|life-history traits]] (e.g., body mass or longevity) and [[genome size]],<ref name="Romiguier2010">{{Cite journal|last1=Romiguier|first1=Jonathan|last2=Ranwez|first2=Vincent|last3=Douzery|first3=Emmanuel J.
P.|last4=Galtier|first4=Nicolas|date=2010-08-01|title=Contrasting GC-content dynamics across 33 mammalian genomes: Relationship with life-history traits and chromosome sizes|journal=Genome Research|language=en|volume=20|issue=8|pages=1001–1009|doi=10.1101/gr.104372.109|issn=1088-9051|pmc=2909565|pmid=20530252}}</ref> and might be linked to a molecular phenomenon called GC-biased [[gene conversion]].<ref name="Duret2009">{{cite journal |vauthors=Duret L, Galtier N |s2cid=9126286 |title=Biased gene conversion and the evolution of mammalian genomic landscapes |journal=Annu Rev Genom Hum Genet |volume=10 |pages=285–311 |year=2009 |pmid=19630562 |doi=10.1146/annurev-genom-082908-150001 }}</ref>

==Applications==
=== Molecular biology ===
In [[polymerase chain reaction]] (PCR) experiments, the GC-content of short oligonucleotides known as [[primer (molecular biology)|primers]] is often used to predict their [[polymerase chain reaction|annealing temperature]] to the template DNA. A higher GC-content indicates a relatively higher melting temperature. Many sequencing technologies, such as [[Illumina sequencing]], have trouble reading high-GC-content sequences.
[[Bird]] genomes are known to contain many such regions, which caused the problem of "missing genes": genes expected to be present from evolution and phenotype but never sequenced until improved methods were used.<ref>{{cite journal|vauthors=Huttener R, Thorrez L, Veld TI |display-authors=etal|title=Sequencing refractory regions in bird genomes are hotspots for accelerated protein evolution|journal=BMC Ecol Evol|volume=21|issue=176|year=2021|page=176 |doi=10.1186/s12862-021-01905-7|pmid=34537008 |pmc=8449477 |doi-access=free}}</ref>

=== Systematics ===
The [[species problem]] in non-eukaryotic taxonomy has led to various suggestions for classifying bacteria, and in 1987 the ''ad hoc committee on reconciliation of approaches to bacterial systematics'' recommended the use of GC-ratios in higher-level hierarchical classification.<ref>{{cite journal |doi=10.1099/00207713-37-4-463 |author=Wayne LG |title=Report of the ad hoc committee on reconciliation of approaches to bacterial systematics |journal=International Journal of Systematic Bacteriology |volume=37 |issue=4 |pages=463–4 |year=1987|display-authors=etal|doi-access=free }}</ref> For example, the [[Actinomycetota]] are characterised as "high GC-content [[bacterium|bacteria]]".<ref>[https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Tree&id=1760&lvl=3&lin=f&keep=1&srchmode=1&unlock Taxonomy browser on NCBI]</ref> In ''[[Streptomyces coelicolor]]'' A3(2), GC-content is 72%.<ref>[https://www.ncbi.nlm.nih.gov/sites/entrez?db=genomeprj&cmd=Retrieve&dopt=Overview&list_uids=242 Whole genome data of ''Streptomyces coelicolor'' A3(2) on NCBI]</ref> With the use of more reliable, modern methods of molecular systematics, the GC-content definition of Actinomycetota has been abandoned, and low-GC bacteria of this [[clade]] have been found.<ref name="lowGCActinoacteria">{{cite journal |vauthors=Ghai R, McMahon KD, Rodriguez-Valera F |title=Breaking a paradigm: Cosmopolitan and abundant freshwater actinobacteria are low GC
|journal=Environmental Microbiology Reports |volume=4 |issue=1 |pages=29–35 |year=2012 |doi=10.1111/j.1758-2229.2011.00274.x |pmid=23757226|bibcode=2012EnvMR...4...29G }}</ref>

==Software tools==
GCSpeciesSorter<ref>{{cite journal |vauthors=Karimi K, Wuitchik D, Oldach M, Vize P |title=Distinguishing Species Using GC Contents in Mixed DNA or RNA Sequences |journal=Evol Bioinform Online |volume=14 |issue=January 1, 2018 |page=1176934318788866 |year=2018 |pmid=30038485 |pmc=6052495 |doi=10.1177/1176934318788866}}</ref> and TopSort<ref>{{cite journal |vauthors=Lehnert E, Mouchka M, Burriesci M, Gallo N, Schwarz J, Pringle J |title=Extensive differences in gene expression between symbiotic and aposymbiotic cnidarians |journal=G3 (Bethesda) |volume=4 |issue=2 |pages=277–95 |year=2014 |pmid=24368779 |pmc=3931562 |doi=10.1534/g3.113.009084}}</ref> are software tools for classifying species based on their GC-content.

==See also==
* [[Codon usage bias]]

== References ==
{{Reflist|2}}

== External links ==
# [http://insilico.ehu.es/oligoweb/index2.php?m=all Table with GC-content of all sequenced prokaryotes]
# [https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Tree&id=2&lvl=3&srchmode=1&keep=1&unlock Taxonomic browser of bacteria based on GC ratio on NCBI website].
# [https://web.archive.org/web/20110809040326/http://esper.lab.nig.ac.jp/study/genome/?page=genome_composition_database_species_list GC ratio in diverse species].

{{Use dmy dates|date=August 2016}}

{{DEFAULTSORT:Gc-Content}}
[[Category:DNA]]
[[Category:Molecular biology]]
[[Category:Biological classification]]