Consensus sequence
{{Short description|Most common variant of a genetic sequence across samples}}
{{Technical|date=May 2023}}
In [[molecular biology]] and [[bioinformatics]], the '''consensus sequence''' (or '''canonical sequence''') is the calculated sequence of the most frequent residues, either [[nucleotide]] or [[amino acid]], found at each position in a [[sequence alignment]]. It represents the result of multiple [[sequence alignment]]s in which related sequences are compared to each other and similar [[sequence motifs]] are calculated. Such information is important when considering sequence-dependent enzymes such as [[RNA polymerase]].<ref>Pierce, Benjamin A. 2002. Genetics: A Conceptual Approach. 1st ed. New York: W.H. Freeman and Co.</ref>

==Biological significance==
A protein binding site, represented by a consensus sequence, may be a short sequence of [[nucleotide]]s that is found several times in the [[genome]] and is thought to play the same role in its different locations. For example, many [[transcription factors]] recognize particular patterns in the [[Promoter (genetics)|promoters]] of the [[gene]]s they regulate. In the same way, [[restriction enzymes]] typically have [[palindromic]] consensus sequences, usually corresponding to the site where they cut the DNA. [[Transposons]] act in much the same manner in their identification of target sequences for transposition. Finally, [[splice site]]s (sequences immediately surrounding the [[exon]]-[[intron]] boundaries) can also be considered as consensus sequences.

Thus a consensus sequence is a model for a putative [[DNA binding site]]: it is obtained by aligning all known examples of a certain recognition site and is defined as the idealized sequence that represents the predominant base at each position.
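
The definition above (the predominant base at each position of an alignment) can be illustrated with a minimal sketch. The function name <code>consensus</code>, the alphabetical tie-breaking rule, and the example binding sites are assumptions made for illustration only; the sketch assumes the sequences are already aligned, have equal length, and contain no gaps.

```python
from collections import Counter


def consensus(aligned):
    """Return the most frequent residue at each column of an alignment.

    Assumes pre-aligned, equal-length sequences without gaps.
    Ties are broken alphabetically (an arbitrary, illustrative choice).
    """
    if not aligned:
        raise ValueError("at least one sequence is required")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all sequences must have the same length")

    result = []
    for column in zip(*aligned):          # iterate over alignment columns
        counts = Counter(column)          # residue -> number of occurrences
        # sorted() makes the tie-break deterministic: max() returns the
        # first residue (alphabetically) that reaches the highest count.
        best = max(sorted(counts), key=counts.__getitem__)
        result.append(best)
    return "".join(result)


# Hypothetical aligned binding sites, invented for this example.
sites = ["TATAAT", "TATGAT", "TACAAT", "GATAAT", "TATACT"]
print(consensus(sites))
```

Note that a simple majority vote of this kind discards the frequency information at each position, which is one reason why the consensus alone can be a misleading summary of a binding site.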
All the actual examples should not differ from the consensus by more than a few substitutions, but counting mismatches in this way can lead to inconsistencies.<ref name="Schneider2002" />

A mutation that makes the core promoter sequence more closely resemble the consensus sequence is known as an '''up mutation'''. Such a mutation generally makes the promoter stronger: RNA polymerase binds more tightly to the DNA to be transcribed, and transcription is up-regulated. Conversely, mutations that destroy conserved nucleotides in the consensus sequence are known as '''down mutations'''. These mutations down-regulate transcription, since RNA polymerase can no longer bind as tightly to the core promoter sequence.

==Sequence analysis==
Developing software for [[pattern recognition]] is a major topic in [[genetics]], [[molecular biology]], and [[bioinformatics]]. Specific [[sequence motif]]s can function as [[regulatory sequence]]s controlling biosynthesis, or as [[signal peptide|signal sequences]] that direct a molecule to a specific site within the cell or regulate its maturation. Since the regulatory function of these sequences is important, they are thought to be conserved across long periods of [[evolution]]. In some cases, evolutionary relatedness can be estimated from the degree of conservation of these sites.

===Notation===
The conserved sequence motifs are called '''consensus sequences''', and they show which residues are conserved and which residues are variable. Consider the following example [[DNA]] sequence:

:A[CT]N{A}YR

In this [[Sequence motif#Pattern description notations|notation]], A means that an A is always found in that position; [CT] stands for either C or T; N stands for any base; and {A} means any base except A. Y represents any [[pyrimidine]], and R indicates any [[purine]].
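
The pattern notation described above maps closely onto a [[regular expression]]. The following is a minimal sketch of such a translation; the helper name <code>motif_to_regex</code>, the restriction to the four DNA bases, and the test sequences are assumptions made for illustration, not part of any standard tool.

```python
import re

# Ambiguity symbols used in the example pattern, expanded to DNA bases.
AMBIGUITY = {"N": "[ACGT]", "Y": "[CT]", "R": "[AG]"}
BASES = "ACGT"


def motif_to_regex(motif):
    """Translate a pattern such as A[CT]N{A}YR into a compiled regex."""
    parts = []
    i = 0
    while i < len(motif):
        ch = motif[i]
        if ch == "[":                      # [CT]: any one of the listed bases
            end = motif.index("]", i)
            parts.append("[" + motif[i + 1:end] + "]")
            i = end + 1
        elif ch == "{":                    # {A}: any base except those listed
            end = motif.index("}", i)
            excluded = motif[i + 1:end]
            allowed = "".join(b for b in BASES if b not in excluded)
            parts.append("[" + allowed + "]")
            i = end + 1
        elif ch in AMBIGUITY:              # N, Y, R
            parts.append(AMBIGUITY[ch])
            i += 1
        elif ch in BASES:                  # a fixed base
            parts.append(ch)
            i += 1
        else:
            raise ValueError(f"unsupported symbol: {ch!r}")
    return re.compile("".join(parts))


pattern = motif_to_regex("A[CT]N{A}YR")
print(pattern.pattern)
print(bool(pattern.fullmatch("ACGCTA")))   # fourth base is C, so it is allowed
print(bool(pattern.fullmatch("ATAATG")))   # fourth base is A, which {A} excludes
```

As with the notation itself, the resulting expression only states which bases are permitted at each position; it carries no information about how often each one occurs.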
In this example, the notation [CT] does not give any indication of the relative frequency of C or T occurring at that position, nor can the pattern be written as a single consensus sequence such as ACNCCA. An alternative method of representing a consensus sequence uses a [[sequence logo]]. This is a graphical representation of the consensus sequence, in which the size of a symbol is related to the frequency with which a given nucleotide (or amino acid) occurs at a certain position. In sequence logos, the more conserved the residue, the larger the symbol for that residue is drawn; the less frequent, the smaller the symbol. Sequence logos can be generated using [http://weblogo.berkeley.edu/ WebLogo], or using the [http://db.systemsbiology.net/gestalt/ Gestalt Workbench], a publicly available visualization tool written by Gustavo Glusman at the [http://www.systemsbiology.org Institute for Systems Biology].<ref name="Schneider2002">{{cite journal |author=Schneider TD |title=Consensus Sequence Zen |journal=Appl Bioinform |volume=1 |issue=3 |pages=111–119 |year=2002 |pmid=15130839 |pmc=1852464}}</ref>

==Software==
Bioinformatics tools can calculate and visualize consensus sequences. Examples of such tools are [[JalView]] and [[UGENE]].

==See also==
* [[Position-specific scoring matrix]]
* [[Regular expression]], used to denote multiple sequences of symbols in [[formal language]] theory
* [[Sequence motif]]
* [[Sequence logo]]

==References==
{{Reflist}}

[[Category:Bioinformatics]]
[[Category:DNA]]