===Genome annotation===
{{main|Gene prediction}}
In [[genomics]], [[Genome project#Genome annotation|annotation]] refers to the process of marking the start and stop regions of genes and other biological features in a DNA sequence. Many genomes are too large to be annotated by hand. As the rate of [[DNA sequencing|sequencing]] exceeds the rate of genome annotation, genome annotation has become the new bottleneck in bioinformatics.{{When|date=June 2023}}

Genome annotation can be classified into three levels: the [[nucleotide]], protein, and process levels.

Gene finding is a chief aspect of nucleotide-level annotation. For complex genomes, a combination of [[ab initio]] gene prediction and sequence comparison with expressed-sequence databases and the sequences of other organisms can be successful. Nucleotide-level annotation also allows the integration of genome sequence with other genetic and physical maps of the genome.

The principal aim of protein-level annotation is to assign function to the [[protein]] products of the genome. Databases of protein sequences and functional domains and motifs are used for this type of annotation. About half of the predicted proteins in a new genome sequence tend to have no obvious function.

Understanding the function of genes and their products in the context of cellular and organismal physiology is the goal of process-level annotation. An obstacle to process-level annotation has been the inconsistency of terms used by different model systems.
The Gene Ontology Consortium is helping to solve this problem.<ref>{{cite journal |title=Genome annotation: from sequence to biology |journal=Nature Reviews Genetics |year=2001 |doi=10.1038/35080529|last1=Stein |first1=Lincoln |volume=2 |issue=7 |pages=493–503 |pmid=11433356 |s2cid=12044602 }}</ref>

The first description of a comprehensive annotation system was published in 1995<ref name="pmid7542800" /> by [[The Institute for Genomic Research]], which performed the first complete sequencing and analysis of the genome of a free-living (non-[[symbiotic]]) organism, the bacterium ''[[Haemophilus influenzae]]''.<ref name="pmid7542800" /> The system identifies the genes encoding all proteins, transfer RNAs, and ribosomal RNAs in order to make initial functional assignments. The [[GeneMark]] program trained to find protein-coding genes in ''[[Haemophilus influenzae]]'' is constantly changing and improving.

To pursue the goals left unmet when the Human Genome Project closed in 2003, the [[National Human Genome Research Institute]] developed the [[ENCODE]] project. This project is a collaborative data collection of the functional elements of the human genome that uses next-generation DNA-sequencing technologies and genomic tiling arrays, technologies able to automatically generate large amounts of data at a dramatically reduced per-base cost but with the same accuracy (base call error) and fidelity (assembly error).

==== Gene function prediction ====
While genome annotation is primarily based on sequence similarity (and thus [[Homology (biology)|homology]]), other properties of sequences can be used to predict the function of genes. In fact, most ''gene'' function prediction methods focus on ''protein'' sequences, as they are more informative and more feature-rich. For instance, the distribution of hydrophobic [[amino acid]]s predicts [[Transmembrane domain|transmembrane segments]] in proteins.
However, protein function prediction can also use external information such as gene (or protein) [[Gene expression|expression]] data, [[protein structure]], or [[protein-protein interactions]].<ref>{{cite journal |vauthors=Erdin S, Lisewski AM, Lichtarge O |title=Protein function prediction: towards integration of similarity metrics |journal=Current Opinion in Structural Biology |volume=21 |issue=2 |pages=180–8 |date=April 2011 |pmid=21353529 |pmc=3120633 |doi=10.1016/j.sbi.2011.02.001}}</ref>
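
As an illustration of how the distribution of hydrophobic amino acids can point to transmembrane segments, the following minimal sketch averages a hydropathy scale over a sliding window and reports stretches whose average is high. It is not a method described by the sources cited here. It assumes the Kyte–Doolittle hydropathy scale with a 19-residue window and a cutoff of 1.6, values commonly quoted for that scale; the function names <code>hydropathy_profile</code> and <code>candidate_tm_segments</code> are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; other scales exist).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Return the mean hydropathy of every full window along the sequence.

    Element i of the result covers residues i .. i + window - 1.
    Raises KeyError for residues outside the 20 standard amino acids.
    """
    scores = [KD[aa] for aa in seq.upper()]
    if len(scores) < window:
        return []
    total = sum(scores[:window])
    profile = [total / window]
    # Slide the window one residue at a time, updating the running sum.
    for i in range(window, len(scores)):
        total += scores[i] - scores[i - window]
        profile.append(total / window)
    return profile


def candidate_tm_segments(seq, window=19, threshold=1.6):
    """Return (start, end) residue spans, end exclusive, covered by
    consecutive windows whose mean hydropathy is at least `threshold`."""
    profile = hydropathy_profile(seq, window)
    segments = []
    start = None
    for i, mean in enumerate(profile):
        if mean >= threshold:
            if start is None:
                start = i
        elif start is not None:
            # The last passing window began at i - 1.
            segments.append((start, i - 1 + window))
            start = None
    if start is not None:
        segments.append((start, len(profile) - 1 + window))
    return segments


if __name__ == "__main__":
    # Toy sequence: a hydrophobic core flanked by charged residues.
    toy = "D" * 10 + "L" * 20 + "K" * 10
    print(candidate_tm_segments(toy))
```

Because each reported span is the union of overlapping windows, it extends somewhat beyond the hydrophobic core itself. Dedicated transmembrane predictors combine this kind of signal with further evidence, so the output should be read only as a list of candidate regions.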