DNA sequencing
== Computational challenges ==
The sequencing technologies described here produce raw data that must be assembled into longer sequences such as complete genomes ([[sequence assembly]]). Assembly poses several computational challenges. One is the evaluation of the raw sequence data, which is done by programs and algorithms such as [[Phred (software)|Phred]] and [[Phrap]]. Another is [[Repetitive DNA|repetitive]] sequences, which often prevent complete genome assemblies because they occur in many places in the genome. As a consequence, many sequences may not be assigned to particular [[chromosome]]s. The production of raw sequence data is only the beginning of its detailed [[Bioinformatics|bioinformatic]] analysis.<ref name="pmid24727769">{{cite journal | vauthors = Severin J, Lizio M, Harshbarger J, Kawaji H, Daub CO, Hayashizaki Y, Bertin N, Forrest AR | title = Interactive visualization and analysis of large-scale sequencing datasets using ZENBU | journal = Nat. Biotechnol. | volume = 32 | issue = 3 | pages = 217–19 | year = 2014 | pmid = 24727769 | doi = 10.1038/nbt.2840 | s2cid = 26575621 }}</ref> New methods for sequencing and for correcting sequencing errors have nevertheless been developed.<ref>{{cite journal | vauthors = Shmilovici A, Ben-Gal I | title = Using a VOM model for reconstructing potential coding regions in EST sequences | journal = Computational Statistics | year = 2007 | volume = 22 | issue = 1 | pages = 49–69 | doi = 10.1007/s00180-007-0021-8 | s2cid = 2737235 | url = http://www.eng.tau.ac.il/~bengal/VOM_EST.pdf | access-date = 10 January 2014 | archive-date = 31 May 2020 | archive-url = https://web.archive.org/web/20200531014400/http://www.eng.tau.ac.il/~bengal/VOM_EST.pdf | url-status = dead }}</ref>

=== Read trimming ===
The raw reads produced by a sequencer are sometimes accurate over only a fraction of their length. Using the entire read may introduce artifacts in downstream analyses such as genome assembly, SNP calling, or gene expression estimation. Two classes of trimming programs have been introduced, based on either window-based or running-sum algorithms.<ref>{{cite journal | vauthors = Del Fabbro C, Scalabrin S, Morgante M, Giorgi FM | title = An Extensive Evaluation of Read Trimming Effects on Illumina NGS Data Analysis | journal = PLOS ONE | volume = 8 | issue = 12 | pages = e85024 | year = 2013 | pmid = 24376861 | pmc = 3871669 | doi = 10.1371/journal.pone.0085024 | bibcode = 2013PLoSO...885024D | doi-access = free }}</ref> The following is a partial list of the trimming algorithms currently available, with the class each belongs to:

{| class="wikitable"
|+ Read Trimming Algorithms
! Name of algorithm !! Type of algorithm
|-
| Cutadapt<ref name=cutadapt>{{cite journal|last1=Martin|first1=Marcel|title=Cutadapt removes adapter sequences from high-throughput sequencing reads|journal=EMBnet.journal|date=2 May 2011|volume=17|issue=1|page=10|doi=10.14806/ej.17.1.200|doi-access=free}}</ref> || Running sum
|-
| ConDeTri<ref name=condetri>{{cite journal | vauthors = Smeds L, Künstner A | title = ConDeTri--a content dependent read trimmer for Illumina data | journal = PLOS ONE | volume = 6 | issue = 10 | pages = e26314 | date = 19 October 2011 | pmid = 22039460 | pmc = 3198461 | doi = 10.1371/journal.pone.0026314 | bibcode = 2011PLoSO...626314S | doi-access = free }}</ref> || Window based
|-
| ERNE-FILTER<ref name="erne-bs5">{{cite book| vauthors = Prezza N, Del Fabbro C, Vezzi F, De Paoli E, Policriti A |title=Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine |chapter=Erne-Bs5 |date=2012|volume=12|pages=12–19|doi=10.1145/2382936.2382938|isbn=9781450316705|s2cid=5673753}}</ref> || Running sum
|-
| FASTX quality trimmer || Window based
|-
| PRINSEQ<ref name=prinseq>{{cite journal | vauthors = Schmieder R, Edwards R | title = Quality control and preprocessing of metagenomic datasets | journal = Bioinformatics | volume = 27 | issue = 6 | pages = 863–4 | date = March 2011 | pmid = 21278185 | pmc = 3051327 | doi = 10.1093/bioinformatics/btr026 }}</ref> || Window based
|-
| Trimmomatic<ref name=trimmomatic>{{cite journal | vauthors = Bolger AM, Lohse M, Usadel B | title = Trimmomatic: a flexible trimmer for Illumina sequence data | journal = Bioinformatics | volume = 30 | issue = 15 | pages = 2114–20 | date = August 2014 | pmid = 24695404 | pmc = 4103590 | doi = 10.1093/bioinformatics/btu170 }}</ref> || Window based
|-
| SolexaQA<ref name=solexaqa>{{cite journal | vauthors = Cox MP, Peterson DA, Biggs PJ | title = SolexaQA: At-a-glance quality assessment of Illumina second-generation sequencing data | journal = BMC Bioinformatics | volume = 11 | issue = 1 | pages = 485 | date = September 2010 | pmid = 20875133 | pmc = 2956736 | doi = 10.1186/1471-2105-11-485 | doi-access = free }}</ref> || Window based
|-
| SolexaQA-BWA || Running sum
|-
| Sickle || Window based
|}
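
The difference between the two algorithm classes can be illustrated with a minimal Python sketch. It is a simplified illustration of the general ideas and does not reproduce the implementation of any program listed in the table. The function names, the window size of 4, the quality threshold of 20, and the Phred+33 quality encoding are all assumptions chosen for the example.

```python
from typing import List


def decode_phred33(quality_string: str) -> List[int]:
    """Convert a FASTQ quality string to integer scores (assumes Phred+33)."""
    return [ord(ch) - 33 for ch in quality_string]


def window_trim(quals: List[int], window: int = 4, threshold: float = 20.0) -> int:
    """Window-based trimming: return the number of leading bases to keep.

    Scans from the 5' end with a fixed-size window and cuts at the start of
    the first window whose mean quality is below the threshold.
    """
    if len(quals) < window:
        # Read shorter than one window: judge it as a single window.
        if quals and sum(quals) / len(quals) >= threshold:
            return len(quals)
        return 0
    for start in range(len(quals) - window + 1):
        if sum(quals[start:start + window]) / window < threshold:
            return start
    return len(quals)


def running_sum_trim(quals: List[int], threshold: int = 20) -> int:
    """Running-sum trimming: return the number of leading bases to keep.

    Walks from the 3' end, accumulating (quality - threshold), and cuts at
    the position where this running sum reaches its minimum. If the sum
    never drops below zero, the read is kept whole.
    """
    running = 0
    lowest = 0
    keep = len(quals)
    for i in range(len(quals) - 1, -1, -1):
        running += quals[i] - threshold
        if running < lowest:
            lowest = running
            keep = i
    return keep


if __name__ == "__main__":
    # Hypothetical read: six good bases followed by a low-quality tail.
    read = "ACGTACGTACGT"
    quals = decode_phred33("??????++?&&&")
    print(read[:window_trim(quals)])
    print(read[:running_sum_trim(quals)])
```

On this hypothetical read both approaches keep the first six bases. In general they can disagree: a window-based trimmer reacts to the first locally poor stretch it meets, while a running-sum trimmer weighs the whole 3' tail and can tolerate an isolated good base inside an otherwise poor region.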