=== Assembly ===
{{Main|Sequence assembly}}
{{multiple image
 | direction = vertical
 | align = right
 | width = 300
 | image1 = PET contig scaffold.png
 | caption1 = Overlapping reads form contigs; contigs and gaps of known length form scaffolds.
 | image2 = Mapping Reads.png
 | caption2 = Paired-end reads of next-generation sequencing data mapped to a reference genome.
 | footer = Multiple, fragmented sequence reads must be assembled together on the basis of their overlapping areas.
}}
Sequence assembly refers to [[sequence alignment|aligning]] and merging fragments of a much longer [[DNA]] sequence in order to reconstruct the original sequence.<ref name = "Pevsner_2009"/> This is needed because current [[DNA sequencing]] technology cannot read whole genomes as a continuous sequence, but rather reads small pieces of between 20 and 1000 bases, depending on the technology used. Third-generation sequencing technologies such as PacBio or Oxford Nanopore routinely generate sequencing reads 10–100 kb in length; however, they have a high error rate, at approximately 1 percent.<ref name = "PacBio" /><ref name = "nanoporetech" /> Typically the short fragments, called reads, result from [[shotgun sequencing]] of [[genome|genomic]] DNA or of [[Transcription (genetics)|gene transcripts]] ([[expressed sequence tag|ESTs]]).<ref name = "Pevsner_2009"/>

==== Assembly approaches ====
Assembly can be broadly categorized into two approaches: ''de novo'' assembly, for genomes that are not similar to any previously sequenced, and comparative assembly, which uses the existing sequence of a closely related organism as a reference during assembly.<ref name = "Pop_2008"/> Relative to comparative assembly, ''de novo'' assembly is computationally difficult ([[NP-hard]]), making it less favourable for short-read NGS technologies. Within the ''de novo'' assembly paradigm there are two primary strategies: Eulerian path strategies and overlap-layout-consensus (OLC) strategies. OLC strategies ultimately try to find a Hamiltonian path through an overlap graph, which is an NP-hard problem. Eulerian path strategies are computationally more tractable because they instead try to find an Eulerian path through a de Bruijn graph.<ref name = "Pop_2008"/>

==== Finishing ====
Finished genomes are defined as having a single contiguous sequence with no ambiguities representing each [[Replicon (genetics)|replicon]].<ref name = "Chain_2009"/>
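
As an illustration of the Eulerian path strategy described under ''Assembly approaches'', the following is a minimal sketch and not the method of any particular assembler. It assumes error-free reads from a single strand, a sequence with no repeated (''k''−1)-mers, and that an Eulerian path exists. The function names <code>de_bruijn_graph</code>, <code>eulerian_path</code> and <code>assemble</code> are hypothetical and chosen only for this example. Each distinct ''k''-mer becomes an edge between its prefix and suffix (''k''−1)-mers, and the path is found with Hierholzer's algorithm, which runs in time linear in the number of edges.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a de Bruijn graph: each distinct k-mer is an edge
    from its (k-1)-mer prefix to its (k-1)-mer suffix."""
    graph = defaultdict(list)
    seen = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in seen:
                # Simplifying assumption: each k-mer is used once,
                # so duplicate coverage from overlapping reads is ignored.
                continue
            seen.add(kmer)
            graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm. Assumes an Eulerian path exists."""
    in_degree = defaultdict(int)
    for successors in graph.values():
        for node in successors:
            in_degree[node] += 1

    # Start at the node with one more outgoing than incoming edge,
    # falling back to an arbitrary node (the Eulerian cycle case).
    start = next(iter(graph))
    for node, successors in graph.items():
        if len(successors) - in_degree[node] == 1:
            start = node
            break

    remaining = {node: list(successors) for node, successors in graph.items()}
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if remaining.get(node):
            stack.append(remaining[node].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: emit node
    return path[::-1]


def assemble(reads, k):
    """Spell the sequence along the Eulerian path."""
    path = eulerian_path(de_bruijn_graph(reads, k))
    return path[0] + "".join(node[-1] for node in path[1:])


reads = ["ATGGCGT", "TGGCGTG", "GCGTGCA"]
print(assemble(reads, 4))  # ATGGCGTGCA
```

Production assemblers must additionally handle sequencing errors, repeats, reverse-complement strands and ''k''-mer multiplicities, all of which this sketch omits.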