=== Genome ===
The first sequence assemblers began to appear in the late 1980s and early 1990s as variants of simpler [[sequence alignment]] programs, built to piece together the vast quantities of fragments generated by automated sequencing instruments called [[DNA sequencers]].<ref name="Baker_2012">{{Cite journal | vauthors = Baker M |date=27 March 2012 |title=De novo genome assembly: what every biologist should know |url=https://www.nature.com/articles/nmeth.1935 |journal=Nature Methods |language=en |volume=9 |issue=4 |pages=333–337 |doi=10.1038/nmeth.1935 |issn=1548-7105}}</ref> As the sequenced organisms grew in size and complexity (from small [[viruses]] through [[plasmids]] to [[bacteria]] and finally [[eukaryotes]]), the assembly programs used in these [[genome project]]s needed increasingly sophisticated strategies to handle:
* [[terabytes]] of sequencing data that must be processed on [[Cluster computing|computing clusters]];
* identical and nearly identical sequences (known as ''repeats''), which can, in the worst case, increase the time and space complexity of algorithms quadratically;
* [[DNA read errors]] in the fragments from the sequencing instruments, which can confound assembly.
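To make the repeat problem concrete, the hedged sketch below (function names are hypothetical, and reads are assumed to be error-free) samples reads from two 40 bp toy sequences and counts ordered read pairs in which a suffix of one read matches a prefix of another by at least a minimum length. In an arbitrary non-repetitive sequence, each read overlaps only a few nearby reads, so the count grows roughly linearly with the number of reads n; in a perfect repeat, every read overlaps every other read, giving n(n-1) overlaps. The all-pairs comparison is used here only for simplicity; the quantity of interest is how many overlaps an assembler would have to store and resolve.

```python
from itertools import permutations


def has_overlap(a: str, b: str, min_len: int) -> bool:
    """True if a suffix of `a` at least `min_len` long equals a prefix of `b`."""
    for k in range(len(a) - min_len + 1):
        if b.startswith(a[k:]):
            return True
    return False


def tile_reads(genome: str, read_len: int, step: int) -> list[str]:
    """Error-free reads starting every `step` bases (a hypothetical sampling scheme)."""
    return [genome[i:i + read_len] for i in range(0, len(genome) - read_len + 1, step)]


def count_overlaps(reads: list[str], min_len: int) -> int:
    """Number of ordered read pairs that overlap (naive all-pairs comparison)."""
    return sum(has_overlap(a, b, min_len) for a, b in permutations(reads, 2))


unique = "GATTACAGGCTTAACCGTAGCATGCCATTGGACTAGTCCA"  # arbitrary 40 bp sequence
repeat = "A" * 40                                    # 40 bp perfect repeat
for name, genome in (("unique", unique), ("repeat", repeat)):
    reads = tile_reads(genome, read_len=10, step=2)
    print(name, len(reads), count_overlaps(reads, min_len=5))
```

For the repeat sequence, the sketch reports 16 reads and 240 (16 × 15) overlapping pairs, i.e. every possible pair, whereas the arbitrary sequence yields far fewer.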
When faced with the challenge of assembling the first larger eukaryotic genomes (the fruit fly ''[[Drosophila melanogaster]]'' in 2000 and the human genome just a year later), scientists developed assemblers like Celera Assembler<ref>{{cite journal | vauthors = Myers EW, Sutton GG, Delcher AL, Dew IM, Fasulo DP, Flanigan MJ, Kravitz SA, Mobarry CM, Reinert KH, Remington KA, Anson EL, Bolanos RA, Chou HH, Jordan CM, Halpern AL, Lonardi S, Beasley EM, Brandon RC, Chen L, Dunn PJ, Lai Z, Liang Y, Nusskern DR, Zhan M, Zhang Q, Zheng X, Rubin GM, Adams MD, Venter JC | title = A whole-genome assembly of Drosophila | journal = Science | volume = 287 | issue = 5461 | pages = 2196–2204 | date = March 2000 | pmid = 10731133 | doi = 10.1126/science.287.5461.2196 | s2cid = 6049420 | citeseerx = 10.1.1.79.9822 | bibcode = 2000Sci...287.2196M }}</ref> and Arachne<ref>{{cite journal | vauthors = Batzoglou S, Jaffe DB, Stanley K, Butler J, Gnerre S, Mauceli E, Berger B, Mesirov JP, Lander ES | title = ARACHNE: a whole-genome shotgun assembler | journal = Genome Research | volume = 12 | issue = 1 | pages = 177–189 | date = January 2002 | pmid = 11779843 | pmc = 155255 | doi = 10.1101/gr.208902 | author7-link = Bonnie Berger }}</ref> that could handle genomes ranging from 130 million base pairs (e.g., the fruit fly ''D. melanogaster'') to 3 billion base pairs (e.g., the human genome). Following these efforts, several other groups, mostly at the major genome sequencing centers, built large-scale assemblers, and an open-source effort known as AMOS<ref>{{Cite web|title=AMOS WIKI|url=https://amos.sourceforge.net/wiki/index.php/AMOS|access-date=2023-01-02|website=amos.sourceforge.net}}</ref> was launched to bring together all the innovations in genome assembly technology within an [[Open-source software|open-source]] framework.
[[File:Seqassemble.png|thumb|Strategy by which a sequence assembler takes fragments (shown below the black bar) and matches overlaps among them to assemble the final sequence (in black). Potentially problematic repeats are shown above the sequence (in pink). Without overlapping fragments, it may be impossible to assign these segments to any specific region.|centre|450x450px]]
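The overlap-matching strategy in the figure can be illustrated with a greedy merge, one of the simplest approaches: repeatedly join the two fragments with the longest suffix-prefix overlap until no overlap of at least a minimum length remains. The sketch below is a simplified illustration that assumes error-free reads and uses hypothetical function names; it is much simpler than the algorithms used by assemblers such as Celera Assembler or Arachne. Because it always takes the longest available overlap, it can collapse or misplace repeats like those shown in pink.

```python
def overlap_len(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that is a prefix of `b` (0 if below min_len)."""
    for k in range(len(a) - min_len + 1):  # k = 0 tests the longest candidate first
        if b.startswith(a[k:]):
            return len(a) - k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Greedily merge reads into contigs; returns one string per contig."""
    # Drop exact duplicates, then reads wholly contained in another read.
    frags = list(dict.fromkeys(reads))
    frags = [r for r in frags if not any(r != o and r in o for o in frags)]
    while len(frags) > 1:
        best_len, best_i, best_j = 0, -1, -1
        # Find the ordered pair (i, j) with the longest overlap of frags[i] onto frags[j].
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    olen = overlap_len(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: remaining fragments stay separate contigs
        merged = frags[best_i] + frags[best_j][best_len:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)] + [merged]
    return frags


genome = "ATTAGACCTGCCGGAATAC"
reads = [genome[i:i + 10] for i in (0, 3, 6, 9)]  # overlapping, error-free reads
print(greedy_assemble(reads))
```

Run on these four overlapping 10 bp reads, it prints `['ATTAGACCTGCCGGAATAC']`, recovering the original sequence; reads that share no sufficiently long overlap are returned as separate contigs.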