== Large-scale sequencing and ''de novo'' sequencing ==
[[File:DNA Sequencing gDNA libraries.jpg|thumb|right|Genomic DNA is fragmented into random pieces and cloned as a bacterial library. DNA from individual bacterial clones is sequenced and the sequence is assembled by using overlapping DNA regions.]]
Large-scale sequencing often aims at sequencing very long DNA pieces, such as whole [[chromosome]]s, although large-scale sequencing can also be used to generate very large numbers of short sequences, such as those found in [[phage display]]. For longer targets such as chromosomes, common approaches consist of cutting (with [[restriction enzyme]]s) or shearing (with mechanical forces) large DNA fragments into shorter DNA fragments. The fragmented DNA may then be [[clone (genetics)|cloned]] into a [[Vector DNA|DNA vector]] and amplified in a bacterial host such as ''[[Escherichia coli]]''. Short DNA fragments purified from individual bacterial colonies are individually sequenced and [[sequence assembly|assembled electronically]] into one long, contiguous sequence. Studies have shown that adding a size selection step to collect DNA fragments of uniform size can improve sequencing efficiency and the accuracy of the genome assembly. In these studies, automated sizing proved more reproducible and precise than manual gel sizing.<ref name="pmid23147856">{{cite journal | vauthors = Quail MA, Gu Y, Swerdlow H, Mayho M | title = Evaluation and optimisation of preparative semi-automated electrophoresis systems for Illumina library preparation | journal = Electrophoresis | volume = 33 | issue = 23 | pages = 3521–28 | year = 2012 | pmid = 23147856 | doi = 10.1002/elps.201200128 | s2cid = 39818212 }}</ref><ref name="pmid22713159">{{cite journal | vauthors = Duhaime MB, Deng L, Poulos BT, Sullivan MB | title = Towards quantitative metagenomics of wild viruses and other ultra-low concentration DNA samples: a rigorous assessment and optimization of the linker amplification method | journal = Environ. Microbiol. | volume = 14 | issue = 9 | pages = 2526–37 | year = 2012 | pmid = 22713159 | pmc = 3466414 | doi = 10.1111/j.1462-2920.2012.02791.x | bibcode = 2012EnvMi..14.2526D }}</ref><ref name="pmid22675423">{{cite journal | vauthors = Peterson BK, Weber JN, Kay EH, Fisher HS, Hoekstra HE | title = Double digest RADseq: an inexpensive method for de novo SNP discovery and genotyping in model and non-model species | journal = PLOS ONE | volume = 7 | issue = 5 | pages = e37135 | year = 2012 | pmid = 22675423 | pmc = 3365034 | doi = 10.1371/journal.pone.0037135 |bibcode = 2012PLoSO...737135P | doi-access = free }}</ref>
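
The fragmentation and size selection steps described above can be illustrated with a toy simulation. The following is a minimal sketch, not a model of any laboratory protocol: the function names <code>shear</code> and <code>size_select</code>, the exponential distribution of fragment lengths, and the 200–400 base window are all assumptions chosen for illustration.

```python
import random


def shear(sequence, mean_len, rng):
    """Cut `sequence` into consecutive fragments of random length.

    Fragment lengths are drawn from an exponential distribution with
    mean `mean_len`; this is an illustrative assumption, not a model
    of real mechanical shearing.
    """
    fragments = []
    start = 0
    while start < len(sequence):
        length = max(1, int(rng.expovariate(1.0 / mean_len)))
        fragments.append(sequence[start:start + length])
        start += length
    return fragments


def size_select(fragments, min_len, max_len):
    """Keep only fragments whose length lies in [min_len, max_len]."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
genome = "".join(rng.choice("ACGT") for _ in range(10_000))  # hypothetical target

fragments = shear(genome, mean_len=300, rng=rng)
selected = size_select(fragments, min_len=200, max_len=400)

print(len(fragments), "fragments before size selection")
print(len(selected), "fragments inside the 200-400 base window")
```

Narrowing the window yields a more uniform library at the cost of discarding more of the input DNA.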

The term "''de novo'' sequencing" specifically refers to methods used to determine the sequence of DNA for which no sequence was previously known. ''De novo'' translates from Latin as "from the beginning". Gaps in the assembled sequence may be filled by [[primer walking]]. The different strategies have different tradeoffs in speed and accuracy; [[shotgun sequencing|shotgun methods]] are often used for sequencing large genomes, but their assembly is complex and difficult, particularly because [[Microsatellite (genetics)|sequence repeat]]s often cause gaps in the genome assembly.

Most sequencing approaches use an ''in vitro'' cloning step to amplify individual DNA molecules, because their molecular detection methods are not sensitive enough for single-molecule sequencing. Emulsion PCR<ref name=Williams2006ePCR>{{cite journal | vauthors = Williams R, Peisajovich SG, Miller OJ, Magdassi S, Tawfik DS, Griffiths AD | title = Amplification of complex gene libraries by emulsion PCR | journal = Nature Methods | volume = 3 | issue = 7 | pages = 545–50 | year = 2006 | pmid = 16791213 | doi = 10.1038/nmeth896 | s2cid = 27459628 }}</ref> isolates individual DNA molecules along with primer-coated beads in aqueous droplets within an oil phase. A [[polymerase chain reaction]] (PCR) then coats each bead with clonal copies of the DNA molecule, and the beads are immobilized for later sequencing. Emulsion PCR is used in the methods developed by Margulies et al. (commercialized by [[454 Life Sciences]]), Shendure and Porreca et al. (also known as "[[Polony (biology)|polony sequencing]]") and [[ABI Solid Sequencing|SOLiD sequencing]] (developed by [[Agencourt]], later [[Applied Biosystems]], now [[Life Technologies (Thermo Fisher Scientific)|Life Technologies]]).<ref name="Margulies_2005"/><ref name=polony_sequencing>{{cite journal | vauthors = Shendure J, Porreca GJ, Reppas NB, Lin X, McCutcheon JP, Rosenbaum AM, Wang MD, Zhang K, Mitra RD, Church GM | title = Accurate Multiplex Polony Sequencing of an Evolved Bacterial Genome | journal = Science | volume = 309 | issue = 5741 | pages = 1728–32 | year = 2005 | pmid = 16081699 | doi = 10.1126/science.1117389 | bibcode = 2005Sci...309.1728S | s2cid = 11405973 | doi-access = free }}</ref><ref name=solid_sequencing>{{cite web|url=http://solid.appliedbiosystems.com/|archive-url=https://web.archive.org/web/20080516181322/http://solid.appliedbiosystems.com/|archive-date=2008-05-16|title=Applied Biosystems – File Not Found (404 Error)|date=16 May 2008}}</ref> Emulsion PCR is also used in the GemCode and Chromium platforms developed by [[10x Genomics]].<ref name="10x-epcr">{{cite journal | vauthors = Goodwin S, McPherson JD, McCombie WR | title = Coming of age: ten years of next-generation sequencing technologies | journal = Nature Reviews Genetics | volume = 17 | issue = 6 | pages = 333–51 | date = May 2016 | pmid = 27184599 | doi = 10.1038/nrg.2016.49 | pmc = 10373632 | s2cid = 8295541 }}</ref>

=== Shotgun sequencing ===
{{Main|Shotgun sequencing}}
Shotgun sequencing is a sequencing method designed for analysis of DNA sequences longer than 1000 base pairs, up to and including entire chromosomes.  This method requires the target DNA to be broken into random fragments.  After sequencing individual fragments using the [[Sanger sequencing#Method|chain termination method]], the sequences can be reassembled on the basis of their overlapping regions.<ref>{{cite journal | vauthors = Staden R | title = A strategy of DNA sequencing employing computer programs. | journal = Nucleic Acids Research | volume = 6 | issue = 7 | pages = 2601–10 | date = 11 June 1979 | pmid = 461197 | pmc = 327874 | doi = 10.1093/nar/6.7.2601 }}</ref>
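
Reassembly from overlapping regions can be sketched with a simple greedy strategy: repeatedly merge the two reads that share the longest suffix-to-prefix overlap. The code below is a minimal illustration under stated assumptions, not the algorithm used by any particular assembler. It assumes error-free reads that all come from the same strand, and the names <code>overlap</code>, <code>greedy_assemble</code> and the <code>min_len</code> threshold are hypothetical.

```python
def overlap(a, b, min_len):
    """Return the length of the longest suffix of `a` that equals a
    prefix of `b`, or 0 if no such overlap is at least `min_len` long."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Merge reads pairwise by longest overlap until no overlap remains.

    Returns the list of resulting contigs. Assumes error-free,
    same-strand reads (an illustrative simplification).
    """
    reads = list(reads)
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            break  # remaining reads do not overlap: a gap in the assembly
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for idx, r in enumerate(reads)
                 if idx not in (best_i, best_j)] + [merged]
    return reads


# Three hypothetical reads covering a 15-base target sequence.
reads = ["ATGGCGTA", "GCGTACCTG", "ACCTGAAT"]
print(greedy_assemble(reads))  # ['ATGGCGTACCTGAAT']
```

With repetitive input the longest overlap is no longer unique, so a greedy merge of this kind can join the wrong reads or stall, which illustrates why repeats complicate genome assembly.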