Genome project
==Genome assembly==
<!-- This section is linked from [[Genetics]] -->
{{main|Sequence assembly}}
Genome assembly refers to the process of taking a large number of short [[DNA sequence]]s and reassembling them to create a representation of the original [[chromosome]]s from which the DNA originated. In a [[shotgun sequencing]] project, all the DNA from a source (usually a single [[organism]], anything from a [[bacterium]] to a [[mammal]]) is first fragmented into millions of small pieces. These pieces are then "read" by automated sequencing machines. A genome assembly [[algorithm]] works by aligning all the pieces to one another and detecting every place where two of the short sequences, or ''reads'', overlap. Overlapping reads are merged into longer sequences, and the process is repeated on the merged sequences.

Genome assembly is a difficult [[computational biology|computational]] problem, made harder because many genomes contain large numbers of identical sequences, known as [[Repeated sequence (DNA)|repeats]]. These repeats can be thousands of nucleotides long and can occur at many different locations, especially in the large genomes of [[plant]]s and [[animal]]s.

The resulting (draft) genome sequence is produced by combining the sequenced [[contig]]s and then using linking information to create scaffolds. Scaffolds are positioned along the [[gene mapping#Physical Mapping|physical map]] of the chromosomes, creating a "golden path".

===Assembly software===
Originally, most large-scale DNA sequencing centers developed their own software for assembling the sequences that they produced. However, this has changed as the software has grown more complex and as the number of sequencing centers has increased.
An example of such an [[Sequence assembly#Programs|assembler]] is the ''Short Oligonucleotide Analysis Package'', developed by [[Beijing Genomics Institute|BGI]] for ''de novo'' assembly of human-sized genomes, alignment, [[Single-nucleotide polymorphism|SNP]] detection, resequencing, indel finding, and structural variation analysis.<ref name="li2010"/><ref name="ReferenceA"/><ref name="wang2008"/>
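
The overlap detection and merging of reads described in this section can be illustrated with a toy greedy assembler. The following is a minimal sketch and not the method used by any particular assembler: the function names <code>overlap</code> and <code>greedy_assemble</code> and the <code>min_len</code> parameter are hypothetical, and the code assumes error-free reads that all come from the same strand. Production assemblers typically build overlap graphs or de Bruijn graphs and must tolerate sequencing errors.

```python
from itertools import permutations


def overlap(a, b, min_len=3):
    """Return the length of the longest suffix of `a` that is also a
    prefix of `b`, provided it is at least `min_len` long; else 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap.

    Returns the list of contigs that remain once no pair overlaps by
    at least `min_len` characters.
    """
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    while True:
        best_len, best_pair = 0, None
        # Compare every ordered pair of distinct sequences.
        for a, b in permutations(contigs, 2):
            k = overlap(a, b, min_len)
            if k > best_len:
                best_len, best_pair = k, (a, b)
        if best_pair is None:
            return contigs  # nothing left to merge
        a, b = best_pair
        contigs.remove(a)
        contigs.remove(b)
        # Merge: keep `a` whole and append the non-overlapping tail of `b`.
        contigs.append(a + b[best_len:])


if __name__ == "__main__":
    reads = ["ATGGCGT", "GCGTACC", "TACCTTA"]
    print(greedy_assemble(reads))  # ['ATGGCGTACCTTA']
```

Because the greedy choice considers only overlap length, two reads that end and begin inside different copies of the same repeat can be merged incorrectly, which is one reason repeats make assembly difficult.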