Sequence assembly
== Quality control ==
Most sequence assemblers have built-in algorithms for quality control, such as [[Phred (software)|Phred]].<ref>{{cite journal | vauthors = Cock PJ, Fields CJ, Goto N, Heuer ML, Rice PM | title = The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants | journal = Nucleic Acids Research | volume = 38 | issue = 6 | pages = 1767–1771 | date = April 2010 | pmid = 20015970 | pmc = 2847217 | doi = 10.1093/nar/gkp1137 }}</ref> However, such measures do not assess the completeness of an assembly in terms of gene content. Other tools evaluate the quality of an assembly after it has been produced.

For instance, BUSCO (Benchmarking Universal Single-Copy Orthologs) measures gene completeness in a genome, gene set, or [[transcriptome]], relying on the fact that many genes are present as single-copy genes in most genomes.<ref name="Simão_2015">{{cite journal | vauthors = Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM | title = BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs | journal = Bioinformatics | volume = 31 | issue = 19 | pages = 3210–3212 | date = October 2015 | pmid = 26059717 | doi = 10.1093/bioinformatics/btv351 }}</ref> The initial BUSCO sets represented 3,023 genes for [[vertebrate]]s, 2,675 for [[arthropod]]s, 843 for [[Animal|metazoans]], 1,438 for [[Fungus|fungi]] and 429 for [[eukaryote]]s.

This table shows an example for the human and fruit fly genomes:<ref name="Simão_2015" />

{| class="wikitable"
|+ BUSCO assessment results (Complete, Duplicated, Fragmented and Missing, in %)
! Species
! Genes
! Complete (%)
! Duplicated (%)
! Fragmented (%)
! Missing (%)
! ''n'' (number of BUSCO genes)
|-
| ''[[Human|Homo sapiens]]''
| 20,364
| 99
| 1.7
| 0.0
| 0.0
| 3,023
|-
| ''[[Drosophila melanogaster]]''
| 13,918
| 99
| 3.7
| 0.2
| 0.0
| 2,675
|}
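
The percentages in such a table can be derived from per-ortholog classifications. The following is a minimal Python sketch, not BUSCO's own code: the function names <code>busco_summary</code> and <code>busco_notation</code>, the status labels, and the toy input are all hypothetical and chosen for illustration only. It assumes that duplicated orthologs are counted as a subset of the complete ones, so that the Complete, Fragmented and Missing percentages together account for all ''n'' orthologs.

```python
from collections import Counter

# Hypothetical status labels for this sketch; real BUSCO output may differ.
VALID_STATUSES = {"single", "duplicated", "fragmented", "missing"}


def busco_summary(statuses):
    """Summarise per-ortholog statuses as percentages of n.

    Assumption: 'duplicated' orthologs are complete orthologs found more
    than once, so they are included in the Complete percentage.
    """
    n = len(statuses)
    if n == 0:
        raise ValueError("no orthologs to summarise")

    counts = Counter(statuses)
    unknown = set(counts) - VALID_STATUSES
    if unknown:
        raise ValueError(f"unknown status labels: {sorted(unknown)}")

    def pct(count):
        # Percentage of all n orthologs, rounded to one decimal place.
        return round(100.0 * count / n, 1)

    complete = counts["single"] + counts["duplicated"]
    return {
        "C": pct(complete),
        "D": pct(counts["duplicated"]),
        "F": pct(counts["fragmented"]),
        "M": pct(counts["missing"]),
        "n": n,
    }


def busco_notation(summary):
    """Format a summary dict as a compact one-line string."""
    return "C:{C}%[D:{D}%],F:{F}%,M:{M}%,n:{n}".format(**summary)


# Toy input, invented for illustration; not taken from any real assembly.
toy = ["single"] * 90 + ["duplicated"] * 5 + ["fragmented"] * 3 + ["missing"] * 2
toy_summary = busco_summary(toy)
toy_line = busco_notation(toy_summary)
```

With the toy input, 95 of the 100 orthologs count as complete (90 single-copy plus 5 duplicated), so the Complete, Fragmented and Missing percentages sum to 100 while the Duplicated value is reported separately.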