Editing Sequence assembly (section)

=== De-novo vs mapping assembly ===

In terms of complexity and time requirements, de-novo assemblies are orders of magnitude slower and more memory intensive than mapping assemblies. This is mostly due to the fact that the assembly algorithm needs to compare every read with every other read (an operation that has a naive time complexity of O(<var>n</var><sup>2</sup>)). Current de-novo genome assemblers may use different types of graph-based algorithms, such as the:<ref>{{cite journal | vauthors = Li Z, Chen Y, Mu D, Yuan J, Shi Y, Zhang H, Gan J, Li N, Hu X, Liu B, Yang B, Fan W | title = Comparison of the two major classes of assembly algorithms: overlap-layout-consensus and de-bruijn-graph | journal = Briefings in Functional Genomics | volume = 11 | issue = 1 | pages = 25–37 | date = January 2012 | pmid = 22184334 | doi = 10.1093/bfgp/elr035 }}</ref>

* ''Overlap/Layout/Consensus'' (OLC) approach, which was typical of the Sanger-data assemblers and relies on an overlap graph;
* ''de Bruijn Graph'' (DBG) approach, which is most widely applied to the short reads from the Solexa and SOLiD platforms. It relies on K-mer graphs, which performs well with vast quantities of short reads;
* ''Greedy graph-based'' approach, which may also use one of the OLC or DBG approaches. With greedy graph-based algorithms, the contigs, series of reads aligned together{{explain|date=March 2022}}, grow by greedy extension, always taking on the read that is found by following the highest-scoring overlap.<ref name="Wolf" />

Referring to the comparison drawn to shredded books in the introduction: while for mapping assemblies one would have a very similar book as a template (perhaps with the names of the main characters and a few locations changed), de-novo assemblies present a more daunting challenge in that one would not know beforehand whether this would become a science book, a novel, a catalogue, or even several books. Also, every shred would be compared with every other shred.

Handling repeats in de-novo assembly requires the construction of a [[Graph theory|graph]] representing neighboring repeats. Such information can be derived from reading a long fragment covering the repeats in full or [[shotgun sequencing#Paired-end sequencing|only its two ends]]. On the other hand, in a mapping assembly, parts with multiple or no matches are usually left for another assembling technique to look into.<ref name="Wolf">{{cite web | vauthors = Wolf B |title=De novo genome assembly versus mapping to a reference genome |url=http://beat.wolf.home.hefr.ch/documents/prague.pdf |website=University of Applied Sciences Western Switzerland |access-date=6 April 2019}}</ref>