Genome project
{{Short description|Scientific endeavours to determine the complete genome sequence of an organism}}
[[File:The Genome sequence when printed fills a huge book of close print.JPG|thumb|upright=1.5|When printed, the human genome sequence fills around 100 huge books of close print]]
'''Genome projects''' are [[scientific]] endeavours that ultimately aim to determine the complete [[genome]] sequence of an [[organism]] (be it an [[animal]], a [[plant]], a [[fungus]], a [[bacterium]], an [[archaea]]n, a [[protist]] or a [[virus]]) and to annotate protein-coding [[gene]]s and other important genome-encoded features.<ref name='pevsner2009'>{{Cite book | edition = 2nd | publisher = Wiley-Blackwell | isbn = 9780470085851 | last = Pevsner | first = Jonathan | title = Bioinformatics and functional genomics | location = Hoboken, N.J | year = 2009 | url-access = registration | url = https://archive.org/details/bioinformaticsfu00pevs_0 }}</ref> The genome sequence of an organism includes the collective [[DNA]] sequences of each [[chromosome]] in the organism. For a [[bacteria|bacterium]] containing a single chromosome, a genome project will aim to map the sequence of that chromosome. For the human species, whose genome includes 22 pairs of [[autosome]]s and 2 sex chromosomes, a complete genome sequence will involve 46 separate chromosome sequences. The [[Human Genome Project]] is a well-known example of a genome project.<ref name="doe2009"/>

==Genome assembly==
<!-- This section is linked from [[Genetics]] -->
{{main|Sequence assembly}}
Genome assembly refers to the process of taking a large number of short [[DNA sequence]]s and reassembling them to create a representation of the original [[chromosome]]s from which the DNA originated. In a [[shotgun sequencing]] project, all the DNA from a source (usually a single [[organism]], anything from a [[bacterium]] to a [[mammal]]) is first fragmented into millions of small pieces.
These pieces are then "read" by automated sequencing machines. A genome assembly [[algorithm]] works by taking all the pieces, aligning them to one another, and detecting all places where two of the short sequences, or ''reads'', overlap. These overlapping reads can be merged, and the process continues.

Genome assembly is a very difficult [[computational biology|computational]] problem, made more difficult because many genomes contain large numbers of identical sequences, known as [[Repeated sequence (DNA)|repeats]]. These repeats can be thousands of nucleotides long and occur at different locations, especially in the large genomes of [[plant]]s and [[animal]]s.

The resulting (draft) genome sequence is produced by combining the information from sequenced [[contig]]s and then employing linking information to create scaffolds. Scaffolds are positioned along the [[gene mapping#Physical Mapping|physical map]] of the chromosomes, creating a "golden path".

===Assembly software===
Originally, most large-scale DNA sequencing centers developed their own software for assembling the sequences that they produced. However, this has changed as the software has grown more complex and as the number of sequencing centers has increased. An example of such an [[Sequence assembly#Programs|assembler]] is the ''Short Oligonucleotide Analysis Package'', developed by [[Beijing Genomics Institute|BGI]] for de novo assembly of human-sized genomes, alignment, [[Single-nucleotide polymorphism|SNP]] detection, resequencing, indel finding, and structural variation analysis.<ref name="li2010"/><ref name="ReferenceA"/><ref name="wang2008"/>

==Genome annotation==
{{main|DNA annotation}}
Since the 1980s, [[molecular biology]] and [[bioinformatics]] have created the need for [[DNA annotation]]. DNA annotation or genome annotation is the process of attaching biological information to [[DNA sequence|sequences]], particularly identifying the locations of genes and determining what those genes do.
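The overlap-and-merge idea described under ''Genome assembly'' above can be illustrated with a toy greedy assembler. This is only a minimal sketch: the function names <code>overlap</code> and <code>greedy_assemble</code>, the <code>min_len</code> threshold, and the example reads are all hypothetical, and the sketch assumes error-free reads from a single strand with no repeats. Production assemblers use far more elaborate methods and must cope with exactly the repeats and sequencing errors that this example ignores.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `a` that equals a prefix of `b`.

    Overlaps shorter than `min_len` are ignored and reported as 0.
    """
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def greedy_assemble(reads, min_len: int = 3):
    """Repeatedly merge the pair of reads with the longest overlap.

    Stops when no pair overlaps by at least `min_len` bases and returns
    the remaining sequences (toy "contigs").
    """
    reads = list(reads)
    while True:
        best_len, best_i, best_j = 0, None, None
        # Compare every ordered pair of distinct reads.
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return reads
        # Merge the best pair, keeping the overlapping bases only once.
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)


if __name__ == "__main__":
    # Hypothetical reads taken from the made-up sequence ATGGCGTGCAATCC.
    print(greedy_assemble(["ATGGCGTG", "GCGTGCAA", "TGCAATCC"]))
```

With repeated sequences, the greedy choice of "longest overlap first" can join reads that come from different parts of the genome, which is one reason assembly is considered computationally hard.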
==Time of completion==
When [[Genome sequencing|sequencing]] a genome, there are usually regions that are difficult to sequence (often regions with highly [[repetitive DNA]]). Thus, 'completed' genome sequences are rarely complete, and terms such as 'working draft' or 'essentially complete' have been used to describe the status of such genome projects more accurately. Even when every [[base pair]] of a genome sequence has been determined, errors are still likely to be present because DNA sequencing is not a completely accurate process. It could also be argued that a complete genome project should include the sequences of [[mitochondria]] and (for plants) [[chloroplasts]], as these [[organelles]] have their own genomes.

It is often reported that the goal of sequencing a genome is to obtain information about the complete set of [[genes]] in that particular genome sequence. The proportion of a genome that encodes genes may be very small (particularly in [[eukaryotes]] such as humans, where [[coding region|coding DNA]] may account for only a few percent of the entire sequence). However, it is not always possible (or desirable) to sequence only the [[coding region]]s. Also, as scientists understand more about the role of [[noncoding DNA]] (often referred to as [[junk DNA]]), it will become more important to have a complete genome sequence as a background to understanding the genetics and biology of any given organism.

In many ways genome projects do not confine themselves to determining the DNA sequence of an organism. Such projects may also include [[gene prediction]] to find out where the genes are in a genome and what those genes do. There may also be related projects to sequence [[Expressed sequence tag|ESTs]] or [[mRNA]]s to help find out where the genes actually are.
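As a rough illustration of what the simplest form of gene prediction involves, the sketch below scans the forward strand of a DNA string for open reading frames (ORFs): stretches that begin with the start codon ATG and run in the same reading frame to one of the standard stop codons TAA, TAG or TGA. The function name <code>find_orfs</code>, the <code>min_codons</code> parameter, and the example sequence are hypothetical. The sketch ignores the reverse strand, introns, and alternative genetic codes, so it is not how real gene-prediction pipelines work, especially for eukaryotic genomes.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def find_orfs(seq: str, min_codons: int = 3):
    """Return sorted (start, end) index pairs of forward-strand ORFs.

    `end` is exclusive, so seq[start:end] begins with ATG and ends with a
    stop codon. `min_codons` counts the start and stop codons as well.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Walk through the sequence codon by codon in this reading frame.
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOP_CODONS:
                if (pos + 3 - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3))
                start = None
    return sorted(orfs)


if __name__ == "__main__":
    example = "CCATGAAATTTTAGGG"  # made-up sequence with one short ORF
    for start, end in find_orfs(example):
        print(start, end, example[start:end])
```

Evidence from ESTs or mRNAs, as mentioned above, is typically used alongside such sequence-only scans because an ORF on its own does not show that a region is actually expressed.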
==Historical and technological perspectives==
Historically, when sequencing eukaryotic genomes (such as the worm ''[[Caenorhabditis elegans]]'') it was common to first [[Gene mapping|map]] the genome to provide a series of landmarks across the genome. Rather than sequence a chromosome in one go, it would be sequenced piece by piece (with the prior knowledge of approximately where that piece is located on the larger chromosome). Changes in technology, and in particular improvements to the processing power of computers, mean that genomes can now be '[[Whole genome shotgun sequencing|shotgun sequenced]]' in one go (although this approach has caveats when compared to the traditional approach).

Improvements in [[DNA sequencing]] technology have meant that the cost of sequencing a new genome has steadily fallen (in terms of cost per [[base pair]]), and newer technology has also meant that genomes can be sequenced far more quickly.

When research agencies decide what new genomes to sequence, the emphasis has been on species that are of high importance as [[model organism]]s, have relevance to human health (e.g. pathogenic [[bacteria]] or [[Vector (epidemiology)|vectors]] of disease such as [[mosquito]]s), or have commercial importance (e.g. livestock and crop plants). Secondary emphasis is placed on species whose genomes will help answer important questions in [[molecular evolution]] (e.g. the [[common chimpanzee]]).

In the future, it is likely that it will become even cheaper and quicker to sequence a genome. This will allow for complete genome sequences to be determined from many different individuals of the same species. For humans, this will allow us to better understand aspects of [[Human Genome Diversity Project|human genetic diversity]].
==Examples==
{{main|List of sequenced eukaryotic genomes|List of sequenced archaeal genomes|List of sequenced prokaryotic genomes}}
[[File:Hereford67-300.jpg|thumb|L1 Dominette 01449, the Hereford who serves as the subject of the [[Bovine Genome Project]]]]
[[File:Giant sequoias in Giant Sequoia National Monument.jpg|thumb|The Giant Sequoia genome sequence was extracted from a single fertilized seed harvested from a 1,360-year-old tree in [[Sequoia and Kings Canyon National Parks|Sequoia/Kings Canyon National Park]].]]
Many organisms have genome projects that have either been completed or will be completed shortly, including:
* [[Human]]s, ''Homo sapiens''; see [[Human genome project]]
* Humans, ''Homo sapiens''; see [[Human Genome Project - Write|The Human Genome Project–Write]]
* [[Palaeo-Eskimo]],<ref name="ReferenceA"/> an ancient human
* [[Neanderthal]], ''Homo sapiens neanderthalensis'' (partial); see [[Neanderthal Genome Project]]
* [[Common chimpanzee]] ''Pan troglodytes''; see [[Chimpanzee Genome Project]]
* [[Woolly mammoth]], ''Mammuthus primigenius''<ref>{{cite news |last1=Ghosh |first1=Pallab |title=Mammoth genome sequence completed |work=BBC News |date=23 April 2015 |url=https://www.bbc.co.uk/news/science-environment-32432693 }}</ref>
* Domestic [[cow]],<ref name="cowpr"/><ref name='cowGenome'/> ''Bos taurus''
* [[Bovine genome]]
* [[Honey Bee Genome Sequencing Consortium]]
* [[Horse genome]]<ref>{{cite web|url=http://www.genome.gov/20519480|title=2007 Release: Horse Genome Assembled|website=National Human Genome Research Institute (NHGRI)|access-date=19 April 2018}}</ref>
* [[HRDetect]]
* [[Human microbiome project]]
* [[International Grape Genome Program]]
* [[International HapMap Project]]
* Tomato 150+ genome resequencing project
* [[100,000 Genomes Project]]
* [[100K Pathogen Genome Project]]
* International Mouse Phenotyping Consortium IMPC
* Knockout Mouse Phenotyping Project KOMP2
* [[Sequoiadendron giganteum|Giant Sequoia]], ''Sequoiadendron giganteum''<ref>{{cite journal |last1=Scott |first1=Alison D |last2=Zimin |first2=Aleksey V |last3=Puiu |first3=Daniela |last4=Workman |first4=Rachael |last5=Britton |first5=Monica |last6=Zaman |first6=Sumaira |last7=Caballero |first7=Madison |last8=Read |first8=Andrew C |last9=Bogdanove |first9=Adam J |last10=Burns |first10=Emily |last11=Wegrzyn |first11=Jill |last12=Timp |first12=Winston |last13=Salzberg |first13=Steven L |last14=Neale |first14=David B |date= November 1, 2020 |title=A Reference Genome Sequence for Giant Sequoia |journal= G3: Genes, Genomes, Genetics |volume=10 |issue=11 |pages=3907–3919 |doi=10.1534/g3.120.401612 |pmid=32948606 |pmc=7642918 }}</ref>

==See also==
* [[Joint Genome Institute]]
* [[Illumina, Inc.|Illumina]], private company involved in genome sequencing
* [[Knome]], private company offering genome analysis & sequencing
* [[Model organism]]
* [[National Center for Biotechnology Information]]

==References==
{{Reflist|2|refs=
<ref name=doe2009>{{cite web | url=http://www.ornl.gov/sci/techresources/Human_Genome/project/benefits.shtml | title=Potential Benefits of Human Genome Project Research | publisher=[[United States Department of Energy|Department of Energy]], Human Genome Project Information | date=2009-10-09 | access-date=2010-06-18 | archive-url=https://web.archive.org/web/20130708180000/http://www.ornl.gov/sci/techresources/Human_Genome/project/benefits.shtml | archive-date=2013-07-08 | url-status=dead }}</ref>
<ref name='li2010'>{{Cite journal | doi = 10.1101/gr.097261.109 | issn = 1549-5469 | volume = 20 | issue = 2 | pages = 265–272 | vauthors=Li R, Zhu H, Ruan J, Qian W, Fang X, Shi Z, Li Y, Li S, Shan G, Kristiansen K, Li S, Yang H, Wang J, Wang J | title = De novo assembly of human genomes with massively parallel short read sequencing | journal = Genome Research | date = February 2010 | pmid=20019144 | pmc=2813482 }}</ref>
<ref name="ReferenceA">{{Cite journal | doi = 10.1038/nature08835 | issn = 1476-4687 | volume = 463 |
issue = 7282 | pages = 757–762 | vauthors=Rasmussen M, Li Y, Lindgreen S, Pedersen JS, Albrechtsen A, Moltke I, Metspalu M, Metspalu E, Kivisild T, Gupta R, Bertalan M, Nielsen K, Gilbert MT, Wang Y, Raghavan M, Campos PF, Kamp HM, Wilson AS, Gledhill A, Tridico S, Bunce M, Lorenzen ED, Binladen J, Guo X, Zhao J, Zhang X, Zhang H, Li Z, Chen M, Orlando L, Kristiansen K, Bak M, Tommerup N, Bendixen C, Pierre TL, Grønnow B, Meldgaard M, Andreasen C, Fedorova SA, Osipova LP, Higham TF, Ramsey CB, Hansen TV, Nielsen FC, Crawford MH, Brunak S, Sicheritz-Pontén T, Villems R, Nielsen R, Krogh A, Wang J, Willerslev E | title = Ancient human genome sequence of an extinct Palaeo-Eskimo | journal = Nature | date = 2010-02-11 | pmid=20148029 | pmc=3951495 | bibcode = 2010Natur.463..757R }}</ref> <ref name=wang2008>{{Cite journal | doi = 10.1038/nature07484 | issn = 0028-0836 | volume = 456 | issue = 7218 | pages = 60–65 | vauthors=Wang J, Wang W, Li R, Li Y, Tian G, Goodman L, Fan W, Zhang J, Li J, Zhang J, Guo Y, Feng B, Li H, Lu Y, Fang X, Liang H, Du Z, Li D, Zhao Y, Hu Y, Yang Z, Zheng H, Hellmann I, Inouye M, Pool J, Yi X, Zhao J, Duan J, Zhou Y, Qin J, Ma L, Li G, Yang Z, Zhang G, Yang B, Yu C, Liang F, Li W, Li S, Li D, Ni P, Ruan J, Li Q, Zhu H, Liu D, Lu Z, Li N, Guo G, Zhang J, Ye J, Fang L, Hao Q, Chen Q, Liang Y, Su Y, San A, Ping C, Yang S, Chen F, Li L, Zhou K, Zheng H, Ren Y, Yang L, Gao Y, Yang G, Li Z, Feng X, Kristiansen K, Wong GK, Nielsen R, Durbin R, Bolund L, Zhang X, Li S, Yang H, Wang J | title = The diploid genome sequence of an Asian individual | journal = Nature | date = 2008-11-06 | pmid=18987735 | pmc=2716080 | bibcode = 2008Natur.456...60W }} </ref> <!--unused<ref name=Stein2001>{{cite journal | last = Stein | first = L. 
| year = 2001 | title = Genome annotation: from sequence to biology | journal = [[Nature Reviews Genetics]] | volume = 2 | pages = 493–503 | doi = 10.1038/35080529 | pmid = 11433356 | issue = 7 }}</ref>--> <!--unused<ref name=ensembl>{{cite web | url=http://www.ensembl.org/info/genome/genebuild/index.html | title= Ensembl's genome annotation pipeline online documentation}}</ref> --> <!--unused<ref name="Gupta07">{{Cite journal | doi = 10.1101/gr.6427907 | issn = 1088-9051 | volume = 17 | issue = 9 | pages = 1362–1377 | last = Gupta | first = Nitin |author2=Stephen Tanner |author3=Navdeep Jaitly |author4=Joshua N Adkins |author5=Mary Lipton |author6=Robert Edwards |author7=Margaret Romine |author8=Andrei Osterman |author9=Vineet Bafna |author10=Richard D Smith |author11=Pavel A Pevzner | title = Whole proteome analysis of post-translational modifications: applications of mass-spectrometry for proteogenomic annotation | journal = Genome Research | date = September 2007 | pmid=17690205 | pmc=1950905 }}</ref>--> <!--unused<ref name=Huss2008>{{Cite journal | last = Huss | first = Jon W. | year = 2008 | title = A Gene Wiki for Community Annotation of Gene Function | journal = [[PLoS Biology]] | volume = 6 | pages = e175 | doi = 10.1371/journal.pbio.0060175 | pmid = 18613750 | last2 = Orozco | first2 = C | last3 = Goodale | first3 = J | last4 = Wu | first4 = C | last5 = Batalov | first5 = S | last6 = Vickers | first6 = TJ | last7 = Valafar | first7 = F | last8 = Su | first8 = AI | issue = 7 | pmc = 2443188 }}</ref>--> <ref name=cowpr>{{Cite web | last = Yates | first = Diana | title = What makes a cow a cow? Genome sequence sheds light on ruminant evolution | work = EurekAlert! | format = Press Release | access-date = 2012-12-22 | date = 2009-04-23 | url = http://www.eurekalert.org/pub_releases/2009-04/uoia-wma041709.php }}</ref> <ref name='cowGenome'>{{Cite journal | last1 = Elsik | first1 = C. G. | last2 = Elsik | first2 = R. L. | last3 = Tellam | first3 = K. C. 
| last4 = Worley | first4 = R. A. | last5 = Gibbs | first5 = D. M. | last6 = Muzny | first6 = G. M. | last7 = Weinstock | first7 = D. L. | last8 = Adelson | first8 = E. E. | last9 = Eichler | first9 = L. | last10 = Elnitski | doi = 10.1126/science.1169588 | first10 = R. | last11 = Guigó | first11 = D. L. | last12 = Hamernik | first12 = S. M. | last13 = Kappes | first13 = H. A. | last14 = Lewin | first14 = D. J. | last15 = Lynn | first15 = F. W. | last16 = Nicholas | first16 = A. | last17 = Reymond | first17 = M. | last18 = Rijnkels | first18 = L. C. | last19 = Skow | first19 = E. M. | last20 = Zdobnov | first20 = L. | last21 = Schook | first21 = J. | last22 = Womack | first22 = T. | last23 = Alioto | first23 = S. E. | last24 = Antonarakis | first24 = A. | last25 = Astashyn | first25 = C. E. | last26 = Chapple | first26 = H. -C. | last27 = Chen | first27 = J. | last28 = Chrast | first28 = F. | last29 = Câmara | first29 = O. | last30 = Ermolaeva | first30 = C. N. | title = The Genome Sequence of Taurine Cattle: A Window to Ruminant Biology and Evolution | journal = Science | volume = 324 | issue = 5926 | pages = 522–528 | year = 2009 | pmid = 19390049 | pmc =2943200 | bibcode = 2009Sci...324..522A | display-authors = 29 }}</ref> }} ==External links== {{wikibooks |1= Next Generation Sequencing (NGS) |2= De_novo_assembly }} {{Commons category|Genome projects}} *[http://www.genomesonline.org GOLD:Genomes OnLine Database] *[https://www.ncbi.nlm.nih.gov/entrez/query.fcgi?DB=genomeprj Genome Project Database] *[https://archive.today/20121221003541/http://www.jcvi.org/pn-utility The Protein Naming Utility] *[http://supfam.org/SUPERFAMILY/ SUPERFAMILY] *[http://www.echinobase.org/Echinobase/ EchinoBase] {{Webarchive|url=https://web.archive.org/web/20161025050822/http://www.echinobase.org/Echinobase/ |date=2016-10-25 }} An Echinoderm genomic database, (previous SpBase, a sea urchin genome database) 
*[https://web.archive.org/web/20120928054058/http://www.nrcpb.org/content/nrcpb-scientists-succeeded-decoding-arhar-dal-genome NRCPB]
*[http://GIGA.Nova.edu Global Invertebrate Genomics Alliance (GIGA)] {{Webarchive|url=https://web.archive.org/web/20210121021301/http://giga.nova.edu/ |date=2021-01-21 }}
* [http://www.sanger.ac.uk/about/who-we-are/sanger-institute Wellcome Sanger Institute]
* [https://www.wellcomegenomecampus.org/ Wellcome Genome Campus]

{{Genomics}}
{{Portal bar|Biology|Technology|Medicine}}

{{DEFAULTSORT:Genome Project}}
[[Category:Genome projects| ]]