Nucleic acid sequence
{{Short description|Succession of nucleotides in a nucleic acid}}
{{more citations needed|date=March 2014}}
{{DNA RNA structure}}
A '''nucleic acid sequence''' is a succession of [[Nucleobase|bases]] within the [[nucleotides]] forming [[allele]]s within a [[DNA]] (using GACT) or [[RNA]] (GACU) molecule. This succession is denoted by a series of letters, drawn from a set of five, that indicate the order of the nucleotides. By convention, sequences are usually presented from the [[Directionality (molecular biology)|5' end to the 3' end]]. For DNA, with its double helix, there are two possible directions for the notated sequence; of these two, the [[Sense (molecular biology)|sense strand]] is used. Because nucleic acids are normally linear (unbranched) [[polymers]], specifying the sequence is equivalent to defining the [[covalent]] structure of the entire molecule. For this reason, the nucleic acid sequence is also termed the [[Biomolecular structure#Primary structure|primary structure]].

The sequence represents '''genetic information'''. Biological [[deoxyribonucleic acid]] represents the [[information]] which directs the functions of an [[organism]].

Nucleic acids also have a [[Nucleic acid secondary structure|secondary structure]] and [[Nucleic acid tertiary structure|tertiary structure]]. Primary structure is sometimes mistakenly referred to as "primary sequence". However, there is no parallel concept of secondary or tertiary sequence.

== Nucleotides ==
[[File:RNA chemical structure.GIF|thumb|Chemical structure of RNA]]
[[Image:RNA-codons.svg|thumb|A series of codons in part of a [[mRNA]] molecule. Each codon consists of three [[nucleotide]]s, usually representing a single [[amino acid]].]]
{{Main|Nucleotide}}
Nucleic acids consist of a chain of linked units called nucleotides.
Each nucleotide consists of three subunits: a [[phosphate]] group and a [[sugar]] ([[ribose]] in the case of [[RNA]], [[deoxyribose]] in [[DNA]]) make up the backbone of the nucleic acid strand, and attached to the sugar is one of a set of [[nucleobase]]s. The nucleobases are important in [[base pair]]ing of strands to form higher-level [[Nucleic acid secondary structure|secondary]] and [[Nucleic acid tertiary structure|tertiary structures]] such as the famed [[Nucleic acid double helix|double helix]].

The possible letters are ''A'', ''C'', ''G'', and ''T'', representing the four [[nucleotide]] [[nucleobase|bases]] of a DNA strand – [[adenine]], [[cytosine]], [[guanine]], [[thymine]] – [[covalent]]ly linked to a [[phosphodiester bond|phosphodiester]] backbone. In the typical case, the sequences are printed abutting one another without gaps, as in the sequence AAAGTCTGAC, read left to right in the [[Directionality (molecular biology)|5' to 3']] direction. With regard to [[transcription (biology)|transcription]], a sequence is on the coding strand if it has the same order as the transcribed RNA.

One sequence can be [[Complementarity (molecular biology)|complementary]] to another sequence, meaning that each base is replaced by its complementary base (i.e., A with T, C with G) and the order is reversed. For example, the complementary sequence to TTAC is GTAA. If one strand of the double-stranded DNA is considered the sense strand, then the other strand, considered the antisense strand, will have the complementary sequence to the sense strand.

=== Notation ===
{{Main|Nucleic acid notation}}
While A, T, C, and G represent a particular nucleotide at a position, there are also letters that represent ambiguity, which are used when more than one kind of nucleotide could occur at that position.
The rules of the International Union of Pure and Applied Chemistry ([[IUPAC]]) are listed in the table below.<ref name=":0">{{Cite journal |date=1986 |title=Nomenclature for incompletely specified bases in nucleic acid sequences. Recommendations 1984. Nomenclature Committee of the International Union of Biochemistry (NC-IUB). |journal=Proceedings of the National Academy of Sciences |language=en |volume=83 |issue=1 |pages=4–8 |doi=10.1073/pnas.83.1.4 |issn=0027-8424 |pmc=322779 |pmid=2417239 |doi-access=free}}</ref> For example, '''W''' means that either an adenine or a thymine could occur in that position without impairing the sequence's functionality.

{| class="wikitable" style="margin-left:25px; margin-top:0px; text-align:center;"
|+List of symbols
! Symbol<ref name="iupac">{{Cite web |last=Nomenclature Committee of the International Union of Biochemistry (NC-IUB) |year=1984 |title=Nomenclature for Incompletely Specified Bases in Nucleic Acid Sequences |url=http://www.chem.qmul.ac.uk/iubmb/misc/naseq.html |access-date=2008-02-04}}</ref> !! Meaning/derivation !!colspan=5| Possible bases || Complement
|-
| '''A''' ||align=left| [[Adenine|'''A'''denine]] || A || || || ||rowspan=5| 1 || T (or U)
|-
| '''C''' ||align=left| [[Cytosine|'''C'''ytosine]] || || C || || || G
|-
| '''G''' ||align=left| [[Guanine|'''G'''uanine]] || || || G || || C
|-
| '''T''' ||align=left| [[Thymine|'''T'''hymine]] || || || || T || A
|-
| '''U''' ||align=left| [[Uracil|'''U'''racil]] || || || || U || A
|- bgcolor=#e8e8e8
| '''W''' ||align=left| '''W'''eak || A || || || T ||rowspan=6| 2 || W
|- bgcolor=#e8e8e8
| '''S''' ||align=left| '''S'''trong || || C || G || || S
|- bgcolor=#e8e8e8
| '''M''' ||align=left| [[Amine|a'''M'''ino]] || A || C || || || K
|- bgcolor=#e8e8e8
| '''K''' ||align=left| [[Ketone|'''K'''eto]] || || || G || T || M
|- bgcolor=#e8e8e8
| '''R''' ||align=left| [[Purine|pu'''R'''ine]] || A || || G || || Y
|- bgcolor=#e8e8e8
| '''Y''' ||align=left| [[Pyrimidine|p'''Y'''rimidine]] || || C || || T || R
|-
| '''B''' ||align=left| not A ('''B''' comes after A) || || C || G || T ||rowspan=4| 3 || V
|-
| '''D''' ||align=left| not C ('''D''' comes after C) || A || || G || T || H
|-
| '''H''' ||align=left| not G ('''H''' comes after G)|| A || C || || T || D
|-
| '''V''' ||align=left| not T ('''V''' comes after T and U) || A || C || G || || B
|- bgcolor=#e8e8e8
| '''N''' ||align=left| any '''N'''ucleotide (not a gap) || A || C || G || T || 4 || N
|-
|'''Z''' ||align=left| [[0|'''Z'''ero]] || || || || || 0 || Z
|}

These symbols are also valid for RNA, except with U (uracil) replacing T (thymine).<ref name=":0" />

Apart from adenine (A), cytosine (C), guanine (G), thymine (T) and uracil (U), DNA and RNA also contain bases that have been modified after the nucleic acid chain has been formed. In DNA, the most common modified base is [[5-Methylcytidine|5-methylcytidine]] (m5C).
In RNA, there are many modified bases, including [[pseudouridine]] (Ψ), [[dihydrouridine]] (D), [[inosine]] (I), [[ribothymidine]] (rT) and [[7-methylguanosine]] (m7G).<ref>{{Cite web |title=BIOL2060: Translation |url=https://www.mun.ca/biology/desmid/brian/BIOL2060/BIOL2060-22/CB22.html |website=mun.ca}}</ref><ref>{{Cite web |title=Research |url=http://www.biogeo.uw.edu.pl/research/grupaC_en.html |website=uw.edu.pl}}</ref>

[[Hypoxanthine]] and [[xanthine]] are two of the many bases created in the presence of [[mutagen]]s, both of them through deamination (replacement of the amine group with a carbonyl group). Hypoxanthine is produced from [[adenine]], and xanthine is produced from [[guanine]].<ref>{{Cite journal |last=Nguyen |first=T |last2=Brunson |first2=D |last3=Crespi |first3=C L |last4=Penman |first4=B W |last5=Wishnok |first5=J S |last6=Tannenbaum |first6=S R |date=April 1992 |title=DNA damage and mutation in human cells exposed to nitric oxide in vitro |journal=Proc Natl Acad Sci USA |volume=89 |issue=7 |pages=3030–3034 |bibcode=1992PNAS...89.3030N |doi=10.1073/pnas.89.7.3030 |pmc=48797 |pmid=1557408 |doi-access=free}}</ref> Similarly, deamination of [[cytosine]] results in [[uracil]].

;Example of comparing and determining the % difference between two nucleotide sequences
* AA'''T'''CC'''GC'''TAG
* AA'''A'''CC'''CT'''TAG
Given the two 10-nucleotide sequences, line them up and compare the differences between them. Calculate the percent difference by dividing the number of positions at which the bases differ by the total number of nucleotides. In this case there are three differences in the 10-nucleotide sequence, so the difference is 3/10 = 30%.
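
The complement rule and the percent-difference calculation described in this section can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the names <code>COMPLEMENT</code>, <code>reverse_complement</code> and <code>percent_difference</code> are hypothetical, and the complement map assumes the DNA alphabet (with U treated like T when looking up its partner). W and S map to themselves because A pairs with T and C pairs with G.

```python
# Complements of the single-letter symbols (DNA alphabet assumed).
# W (A or T) and S (C or G) are self-complementary.
COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A", "U": "A",
    "W": "W", "S": "S", "M": "K", "K": "M", "R": "Y", "Y": "R",
    "B": "V", "D": "H", "H": "D", "V": "B", "N": "N",
}


def reverse_complement(seq: str) -> str:
    """Return the complementary sequence, written 5' to 3'."""
    # Walk the input backwards and swap each symbol for its complement.
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def percent_difference(a: str, b: str) -> float:
    """Percentage of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    mismatches = sum(x != y for x, y in zip(a, b))
    return 100.0 * mismatches / len(a)


if __name__ == "__main__":
    print(reverse_complement("TTAC"))
    print(percent_difference("AATCCGCTAG", "AAACCCTTAG"))
```

Run on the examples from the text, the first call reproduces the complement of TTAC and the second reproduces the three-in-ten difference. The comparison is purely positional and does not attempt to align sequences of different lengths.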
== Biological significance ==
[[File:Kooditabel.png|200px|thumb|A depiction of the [[genetic code]], by which the information contained in [[nucleic acid]]s is [[Translation (genetics)|translated]] into [[amino acid]] sequences in [[protein]]s.]]
{{Further|Genetic code|Central dogma of molecular biology}}
In biological systems, nucleic acids contain information which is used by a living [[Cell (biology)|cell]] to construct specific [[protein]]s. The sequence of [[nucleobase]]s on a nucleic acid strand is [[Translation (genetics)|translated]] by cell machinery into a sequence of [[amino acid]]s making up a protein strand. Each group of three bases, called a [[codon]], corresponds to a single amino acid, and there is a specific [[genetic code]] by which each possible combination of three bases corresponds to a specific amino acid or to a stop signal.

The [[central dogma of molecular biology]] outlines the mechanism by which proteins are constructed using information contained in nucleic acids. [[DNA]] is [[Transcription (genetics)|transcribed]] into [[mRNA]] molecules, which travel to the [[ribosome]] where the mRNA is used as a template for the construction of the protein strand.

Since nucleic acids can bind to molecules with [[Complementarity (molecular biology)|complementary]] sequences, there is a distinction between "[[Sense (molecular biology)|sense]]" sequences, which code for proteins, and the complementary "antisense" sequence, which is by itself nonfunctional but can bind to the sense strand.

== Sequence determination ==
[[Image:DNA sequence.svg|right|thumb|268px|[[Electropherogram]] printout from automated sequencer for determining part of a DNA sequence]]
{{Main|DNA sequencing}}
DNA sequencing is the process of determining the [[nucleotide]] sequence of a given [[DNA]] fragment. The sequence of the DNA of a living thing encodes the necessary information for that living thing to survive and reproduce.
Therefore, determining the sequence is useful in fundamental research into why and how organisms live, as well as in applied subjects. Because of the importance of DNA to living things, knowledge of a DNA sequence may be useful in practically any biological [[research]]. For example, in [[medicine]] it can be used to identify, [[diagnosis|diagnose]] and potentially develop [[therapy|treatments]] for [[genetic disease]]s. Similarly, research into [[pathogens]] may lead to treatments for contagious diseases. [[Biotechnology]] is a burgeoning discipline, with the potential for many useful products and services.

RNA is usually not sequenced directly. Instead, it is copied into DNA by [[reverse transcriptase]], and this DNA is then sequenced.

Most current sequencing methods rely on the discriminatory ability of DNA polymerases, and therefore can only distinguish four bases. An inosine (created from adenosine during [[RNA editing]]) is read as a G, and 5-methylcytosine (created from cytosine by [[DNA methylation]]) is read as a C. With current technology, it is difficult to sequence small amounts of DNA, as the signal is too weak to measure. This is overcome by [[polymerase chain reaction]] (PCR) amplification.

=== Digital representation ===
[[File:AMY1gene.png|thumb|370px|Genetic sequence in digital format.]]
Once a nucleic acid sequence has been obtained from an organism, it is stored ''[[in silico]]'' in digital format. Digital genetic sequences may be stored in [[sequence database]]s, be analyzed (see ''Sequence analysis'' below), be digitally altered and be used as templates for creating new actual DNA using [[artificial gene synthesis]].

== Sequence analysis ==
{{Main|Sequence analysis}}
Digital genetic sequences may be analyzed using the tools of [[bioinformatics]] to attempt to determine their function.
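
As a small illustration of analyzing a stored sequence, the sketch below translates a coding sequence codon by codon, as outlined under ''Biological significance''. It assumes the standard genetic code and reading from the first base. The names <code>CODON_TABLE</code> and <code>translate</code>, and the use of <code>X</code> for codons that are not in the table, are choices made for this example rather than features of any particular tool.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G order.
# "*" marks the three stop codons.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(sequence: str) -> str:
    """Translate a coding sequence into one-letter amino acid codes.

    Reading starts at the first base and stops at the first stop codon
    or when fewer than three bases remain. RNA input is accepted by
    treating U as T. Codons not in the table become "X".
    """
    dna = sequence.upper().replace("U", "T")
    protein = []
    for start in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[start:start + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGGCCTAA"))
```

A real analysis would also need to choose the correct reading frame and strand, which this sketch leaves to the caller.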
=== Genetic testing ===
{{Main|Genetic testing}}
The DNA in an organism's [[genome]] can be analyzed to [[medical diagnosis|diagnose]] vulnerabilities to inherited [[disease]]s, and can also be used to determine a child's paternity (genetic father) or a person's [[ancestry]]. Normally, every person carries two variations of every [[gene]], one inherited from their mother, the other inherited from their father. The [[human genome]] is believed to contain around 20,000–25,000 genes. In addition to studying [[chromosome]]s to the level of individual genes, genetic testing in a broader sense includes [[biochemical]] tests for the possible presence of [[genetic disease]]s, or mutant forms of genes associated with increased risk of developing genetic disorders.

Genetic testing identifies changes in chromosomes, genes, or proteins.<ref>{{Cite web |date=16 March 2015 |title=What is genetic testing? |url=http://www.ghr.nlm.nih.gov/handbook/testing/genetictesting |url-status=dead |archive-url=https://web.archive.org/web/20060529002711/http://ghr.nlm.nih.gov/handbook/testing/genetictesting |archive-date=29 May 2006 |access-date=19 May 2010 |website=Genetics Home Reference}}</ref> Usually, testing is used to find changes that are associated with inherited disorders. The results of a genetic test can confirm or rule out a suspected genetic condition or help determine a person's chance of developing or passing on a genetic disorder.
Several hundred genetic tests are currently in use, and more are being developed.<ref>{{Cite web |title=Genetic Testing |url=https://www.nlm.nih.gov/medlineplus/genetictesting.html |website=nih.gov}}</ref><ref>{{Cite web |date=2008-09-11 |title=Definitions of Genetic Testing |url=http://www.eurogentest.org/patient/public_health/info/public/unit3/DefinitionsGeneticTesting-3rdDraf18Jan07.xhtml |url-status=dead |archive-url=https://web.archive.org/web/20090204181251/http://eurogentest.org/patient/public_health/info/public/unit3/DefinitionsGeneticTesting-3rdDraf18Jan07.xhtml |archive-date=February 4, 2009 |access-date=2008-08-10 |website=Definitions of Genetic Testing (Jorge Sequeiros and Bárbara Guimarães) |publisher=EuroGentest Network of Excellence Project}}</ref>

=== Sequence alignment ===
{{Main|Sequence alignment}}
In bioinformatics, a sequence alignment is a way of arranging the sequences of [[DNA]], [[RNA]], or [[protein]] to identify regions of similarity that may be due to functional, [[structural biology|structural]], or [[evolution]]ary relationships between the sequences.<ref name="mount">{{Cite book |last=Mount DM. |title=Bioinformatics: Sequence and Genome Analysis |publisher=Cold Spring Harbor Laboratory Press: Cold Spring Harbor, NY |year=2004 |isbn=0-87969-608-7 |edition=2nd}}</ref> If two sequences in an alignment share a common ancestor, mismatches can be interpreted as [[point mutation]]s and gaps as [[Insertion (genetics)|insertion]] or [[Deletion (genetics)|deletion mutations]] ([[indel]]s) introduced in one or both lineages in the time since they diverged from one another. In sequence alignments of proteins, the degree of similarity between [[amino acid]]s occupying a particular position in the sequence can be interpreted as a rough measure of how [[conservation (genetics)|conserved]] a particular region or [[sequence motif]] is among lineages.
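
How mismatches and gaps arise in an alignment can be illustrated with a small global alignment. The sketch below uses the Needleman-Wunsch dynamic-programming method, which this article does not name. The function name <code>global_align</code> and the scoring values (+1 for a match, −1 for a mismatch, −1 per gap position) are assumptions chosen only for illustration.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Globally align two sequences; return (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)

    def pair_score(i: int, j: int) -> int:
        # Score for aligning a[i-1] with b[j-1] (1-based table indices).
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(i, j),  # match or mismatch
                score[i - 1][j] + gap,                   # gap in b
                score[i][j - 1] + gap,                   # gap in a
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


if __name__ == "__main__":
    top, bottom, total = global_align("ACGT", "AGT")
    print(top)
    print(bottom)
    print(total)
```

In the aligned output, a column with two different letters is a mismatch (a candidate point mutation) and a column containing <code>-</code> is a gap (a candidate indel). Practical tools use more elaborate scoring, such as separate penalties for opening and extending a gap.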
The absence of substitutions, or the presence of only very conservative substitutions (that is, the substitution of amino acids whose [[side chain]]s have similar biochemical properties) in a particular region of the sequence, suggests<ref name="predict">{{Cite journal |last=Ng |first=P. C. |last2=Henikoff |first2=S. |year=2001 |title=Predicting Deleterious Amino Acid Substitutions |journal=Genome Research |volume=11 |issue=5 |pages=863–74 |doi=10.1101/gr.176601 |pmc=311071 |pmid=11337480}}</ref> that this region has structural or functional importance. Although DNA and RNA [[nucleotide]] bases are more similar to each other than are amino acids, the conservation of base pairs can indicate a similar functional or structural role.<ref>{{Cite journal |last=Witzany |first=G |year=2016 |title=Crucial steps to life: From chemical reactions to code using agents |url=https://philpapers.org/rec/GUECST-2 |journal=Biosystems |volume=140 |pages=49–57 |bibcode=2016BiSys.140...49W |doi=10.1016/j.biosystems.2015.12.007 |pmid=26723230 |s2cid=30962295}}</ref>

[[Computational phylogenetics]] makes extensive use of sequence alignments in the construction and interpretation of [[phylogenetic tree]]s, which are used to classify the evolutionary relationships between homologous genes represented in the genomes of divergent species. The degree to which sequences in a query set differ is qualitatively related to the sequences' evolutionary distance from one another. Roughly speaking, high sequence identity suggests that the sequences in question have a comparatively young [[most recent common ancestor]], while low identity suggests that the divergence is more ancient.
This approximation, which reflects the "[[molecular clock]]" hypothesis that a roughly constant [[rate of evolution|rate of evolutionary change]] can be used to extrapolate the elapsed time since two genes first diverged (that is, the [[coalescence (genetics)|coalescence]] time), assumes that the effects of mutation and [[natural selection|selection]] are constant across sequence lineages. Therefore, it does not account for possible differences among organisms or species in the rates of [[DNA repair]] or the possible functional conservation of specific regions in a sequence. (In the case of nucleotide sequences, the molecular clock hypothesis in its most basic form also discounts the difference in acceptance rates between [[silent mutation]]s that do not alter the meaning of a given [[codon]] and other mutations that result in a different [[amino acid]] being incorporated into the protein.) More statistically accurate methods allow the evolutionary rate on each branch of the phylogenetic tree to vary, thus producing better estimates of coalescence times for genes.

=== Sequence motifs ===
{{Main|Sequence motif}}
Frequently, the primary structure encodes motifs that are of functional importance.
Some examples of sequence motifs are: the C/D<!-- --><ref>{{Cite journal |last=Samarsky |first=DA |last2=Fournier MJ |last3=Singer RH |last4=Bertrand E |year=1998 |title=The snoRNA box C/D motif directs nucleolar targeting and also couples snoRNA synthesis and localization |journal=The EMBO Journal |volume=17 |issue=13 |pages=3747–57 |doi=10.1093/emboj/17.13.3747 |pmc=1170710 |pmid=9649444}}</ref> and H/ACA boxes<!-- --><ref>{{Cite journal |last=Ganot |first=Philippe |last2=Caizergues-Ferrer |first2=Michèle |last3=Kiss |first3=Tamás |date=1 April 1997 |title=The family of box ACA small nucleolar RNAs is defined by an evolutionarily conserved secondary structure and ubiquitous sequence elements essential for RNA accumulation |journal=[[Genes & Development]] |volume=11 |issue=7 |pages=941–56 |doi=10.1101/gad.11.7.941 |pmid=9106664 |doi-access=free}}</ref> of [[snoRNA]]s, the [[LSm|Sm binding site]] found in spliceosomal RNAs such as [[U1 spliceosomal RNA|U1]], [[U2 spliceosomal RNA|U2]], [[U4 spliceosomal RNA|U4]], [[U5 spliceosomal RNA|U5]], [[U6 spliceosomal RNA|U6]], [[U12 minor spliceosomal RNA|U12]] and [[Small nucleolar RNA U3|U3]], the [[Shine-Dalgarno sequence]],<!-- --><ref>{{Cite journal |last=Shine J, Dalgarno L |author-link=John Shine |author-link2=Lynn Dalgarno |year=1975 |title=Determinant of cistron specificity in bacterial ribosomes |journal=Nature |volume=254 |issue=5495 |pages=34–38 |bibcode=1975Natur.254...34S |doi=10.1038/254034a0 |pmid=803646 |s2cid=4162567}}</ref> the [[Kozak consensus sequence]]<!-- --><ref name="Kozak1987">{{Cite journal |last=Kozak M |date=October 1987 |title=An analysis of 5'-noncoding sequences from 699 vertebrate messenger RNAs |journal=Nucleic Acids Res.
|volume=15 |issue=20 |pages=8125–48 |doi=10.1093/nar/15.20.8125 |pmc=306349 |pmid=3313277}}</ref> and the [[RNA polymerase III|RNA polymerase III terminator]]<!-- -->.<ref name="pmid6263489">{{Cite journal |last=Bogenhagen DF, Brown DD |year=1981 |title=Nucleotide sequences in Xenopus 5S DNA required for transcription termination. |journal=Cell |volume=24 |issue=1 |pages=261–70 |doi=10.1016/0092-8674(81)90522-5 |pmid=6263489 |s2cid=9982829}}</ref>

=== Sequence entropy ===
In [[bioinformatics]], a sequence entropy, also known as sequence complexity or information profile,<ref name="glance">{{Cite journal |last=Pinho |first=A |last2=Garcia, S |last3=Pratas, D |last4=Ferreira, P |date=Nov 21, 2013 |title=DNA Sequences at a Glance. |journal=PLOS ONE |volume=8 |issue=11 |pages=e79922 |bibcode=2013PLoSO...879922P |doi=10.1371/journal.pone.0079922 |pmc=3836782 |pmid=24278218 |doi-access=free}}</ref> is a numerical sequence providing a quantitative measure of the local complexity of a DNA sequence, independently of the direction of processing. Manipulating the information profiles enables the analysis of sequences using alignment-free techniques, for example in motif and rearrangement detection.<ref name="glance" /><ref name="rearrang">{{Cite journal |last=Pratas |first=D |last2=Silva, R |last3=Pinho, A |last4=Ferreira, P |date=May 18, 2015 |title=An alignment-free method to find and visualise rearrangements between pairs of DNA sequences. |journal=Scientific Reports |volume=5 |pages=10203 |bibcode=2015NatSR...510203P |doi=10.1038/srep10203 |pmc=4434998 |pmid=25984837}}</ref><ref name="troy">{{Cite journal |last=Troyanskaya |first=O |last2=Arbell, O |last3=Koren, Y |last4=Landau, G |last5=Bolshoy, A |date=2002 |title=Sequence complexity profiles of prokaryotic genomic sequences: A fast algorithm for calculating linguistic complexity. |journal=Bioinformatics |volume=18 |pages=679–88 |doi=10.1093/bioinformatics/18.5.679 |pmid=12050064 |doi-access=free |number=5}}</ref>

== See also ==
* [[Gene structure]]
* [[Nucleic acid structure determination]]
* [[Quaternary numeral system]]
* [[Single-nucleotide polymorphism]] (SNP)

== References ==
{{Reflist|2}}

== External links ==
{{Commons category|Nucleic acid sequence}}
* [https://web.archive.org/web/20080509072853/http://www.nslij-genetics.org/dnacorr/ A bibliography on features, patterns, correlations in DNA and protein texts]

{{Biomolecular structure}}
{{Authority control}}

{{DEFAULTSORT:Nucleic Acid Sequence}}
[[Category:DNA]]
[[Category:Molecular biology]]
[[Category:Nucleic acids]]
[[Category:RNA]]