Open main menu
Home
Random
Recent changes
Special pages
Community portal
Preferences
About Wikipedia
Disclaimers
Incubator escapee wiki
Search
User menu
Talk
Dark mode
Contributions
Create account
Log in
Editing
Sequence assembly
(section)
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
=== De-novo vs mapping assembly === In terms of complexity and time requirements, de-novo assemblies are orders of magnitude slower and more memory intensive than mapping assemblies. This is mostly due to the fact that the assembly algorithm needs to compare every read with every other read (an operation that has a naive time complexity of O(<var>n</var><sup>2</sup>)). Current de-novo genome assemblers may use different types of graph-based algorithms, such as the:<ref>{{cite journal | vauthors = Li Z, Chen Y, Mu D, Yuan J, Shi Y, Zhang H, Gan J, Li N, Hu X, Liu B, Yang B, Fan W | title = Comparison of the two major classes of assembly algorithms: overlap-layout-consensus and de-bruijn-graph | journal = Briefings in Functional Genomics | volume = 11 | issue = 1 | pages = 25β37 | date = January 2012 | pmid = 22184334 | doi = 10.1093/bfgp/elr035 }}</ref> * ''Overlap/Layout/Consensus'' (OLC) approach, which was typical of the Sanger-data assemblers and relies on an overlap graph; * ''de Bruijn Graph'' (DBG) approach, which is most widely applied to the short reads from the Solexa and SOLiD platforms. It relies on K-mer graphs, which performs well with vast quantities of short reads; * ''Greedy graph-based'' approach, which may also use one of the OLC or DBG approaches. With greedy graph-based algorithms, the contigs, series of reads aligned together{{explain|date=March 2022}}, grow by greedy extension, always taking on the read that is found by following the highest-scoring overlap.<ref name="Wolf" /> Referring to the comparison drawn to shredded books in the introduction: while for mapping assemblies one would have a very similar book as a template (perhaps with the names of the main characters and a few locations changed), de-novo assemblies present a more daunting challenge in that one would not know beforehand whether this would become a science book, a novel, a catalogue, or even several books. Also, every shred would be compared with every other shred. Handling repeats in de-novo assembly requires the construction of a [[Graph theory|graph]] representing neighboring repeats. Such information can be derived from reading a long fragment covering the repeats in full or [[shotgun sequencing#Paired-end sequencing|only its two ends]]. On the other hand, in a mapping assembly, parts with multiple or no matches are usually left for another assembling technique to look into.<ref name="Wolf">{{cite web | vauthors = Wolf B |title=De novo genome assembly versus mapping to a reference genome |url=http://beat.wolf.home.hefr.ch/documents/prague.pdf |website=University of Applied Sciences Western Switzerland |access-date=6 April 2019}}</ref>
Edit summary
(Briefly describe your changes)
By publishing changes, you agree to the
Terms of Use
, and you irrevocably agree to release your contribution under the
CC BY-SA 4.0 License
and the
GFDL
. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel
Editing help
(opens in new window)