==Example==
Consider the following two rounds of shotgun reads, both taken from the same original strand:
{| class="wikitable"
|-
! Strand
! Sequence
|-
| Original
| <code><big>AGCATGCTGCAGTCATGCTTAGGCTA</big></code>
|-
| First shotgun sequence
| <code><big>AGCATGCTGCAGTCATGCT-------</big></code><br/><code><big>-------------------TAGGCTA</big></code>
|-
| Second shotgun sequence
| <code><big>AGCATG--------------------</big></code><br/><code><big>------CTGCAGTCATGCTTAGGCTA</big></code>
|-
| Reconstruction
| <code><big>AGCATGCTGCAGTCATGCTTAGGCTA</big></code>
|}

In this extremely simplified example, no single read covers the full length of the original sequence, but the four reads can be assembled into the original sequence by using the overlaps between their ends to align and order them. In reality, this process involves enormous numbers of reads that are rife with ambiguities and sequencing errors. Assembly of complex genomes is further complicated by the great abundance of [[repeated sequence (DNA)|repetitive sequences]], which means that similar short reads could come from completely different parts of the sequence.
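
The overlap-based reconstruction in the table can be sketched in a few lines of Python. This is only an illustrative greedy merge under strong assumptions (error-free reads, no repeats, all reads from the same strand), not the method used by production assemblers; the function names <code>overlap</code> and <code>greedy_assemble</code> are hypothetical and chosen for this example.

```python
def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for n in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads):
    """Repeatedly merge the pair of reads with the longest end overlap."""
    # Drop exact duplicates, then reads wholly contained in another read.
    reads = list(dict.fromkeys(reads))
    contigs = [r for r in reads if not any(r != s and r in s for s in reads)]
    while len(contigs) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no overlaps remain; the contigs stay separate
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# The four reads from the table above.
reads = [
    "AGCATGCTGCAGTCATGCT",
    "TAGGCTA",
    "AGCATG",
    "CTGCAGTCATGCTTAGGCTA",
]
print(greedy_assemble(reads))
```

Here the two short reads are contained in the two long ones, and the long reads share a 13-base overlap (<code>CTGCAGTCATGCT</code>), so a single merge recovers the original sequence. With repeats or sequencing errors, a greedy merge of this kind can join reads from unrelated regions, which is one reason real assembly is much harder.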

Many overlapping reads for each segment of the original DNA are necessary to overcome these difficulties and accurately assemble the sequence. For example, to complete the [[Human Genome Project]], most of the human genome was sequenced at 12X or greater ''coverage''; that is, each base in the final sequence was present on average in 12 different reads. Even so, as of 2004 the methods then available had failed to isolate or assemble reliable sequence for approximately 1% of the [[euchromatin|euchromatic]] human genome.<ref name=HGS2004>{{cite journal |author=International Human Genome Sequencing Consortium |title=Finishing the euchromatic sequence of the human genome |journal=Nature |date=21 October 2004 |volume=431 |issue=7011 |pages=931–945 |doi=10.1038/nature03001 |pmid=15496913 |bibcode=2004Natur.431..931H |doi-access=free}}</ref>
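
As a rough illustration of what coverage means numerically, mean coverage can be computed as the total number of sequenced bases divided by the length of the target sequence. The sketch below also applies a simple Poisson approximation, which is an assumption introduced here rather than something stated above: if reads start at uniformly random positions, the expected fraction of bases covered by no read is about ''e''<sup>−''c''</sup> at coverage ''c''. That model ignores repeats, cloning bias and sequencing errors, so it says nothing about why some regions remained unassembled. The function names are hypothetical.

```python
import math


def mean_coverage(read_lengths, target_length):
    """Average number of reads covering each base of the target."""
    return sum(read_lengths) / target_length


def expected_uncovered_fraction(coverage):
    """Poisson approximation (assumes uniformly random read placement)."""
    return math.exp(-coverage)


# The four example reads (19, 7, 6 and 20 bases) over the 26-base original.
print(mean_coverage([19, 7, 6, 20], 26))

# Idealized fraction of bases missed by every read at 12X coverage.
print(expected_uncovered_fraction(12))
```

The toy example therefore has a mean coverage of 2X, since the 52 sequenced bases span a 26-base original.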