== Bioinformatics pipeline ==
In general, there are three steps in assembling sequencing reads into a scaffold:
# Pre-assembly: This step is essential to ensure the integrity of downstream analysis, such as variant calling or the final scaffold sequence. It consists of two sequential workflows:
## Quality check: Depending on the type of sequencing technology, different errors might arise that lead to a false [[Base calling|base call]]. For example, the sequence "NAAAAAAAAAAAAN", which includes 12 adenines, might be wrongly called as "NAAAAAAAAAAAN", with 11 adenines. More generally, sequencing a highly repetitive segment of the target DNA/RNA might result in a call that is one base shorter or one base longer. Read quality is typically measured by the [[Phred quality score|Phred]] score, an encoded quality value for each nucleotide within a read's sequence. Some sequencing technologies such as [[Pacbio|PacBio]] do not have a scoring method for their sequenced reads. A common tool used in this step is FastQC.<ref>{{Cite web |title=Babraham Bioinformatics - FastQC A Quality Control tool for High Throughput Sequence Data |url=https://www.bioinformatics.babraham.ac.uk/projects/fastqc/ |access-date=2022-05-09 |website=www.bioinformatics.babraham.ac.uk}}</ref>
## Filtering of reads: Reads that fail the quality check should be removed from the [[FASTQ format|FASTQ]] file to obtain the best assembly contigs.
# Assembly: During this step, read alignment is used with different criteria to map each read to its most likely location. The predicted position of a read is based on how much of its sequence aligns with either other reads or a reference. Different alignment algorithms are used for reads from different sequencing technologies. Two commonly used approaches in assembly are the [[De Bruijn graph|de Bruijn graph]] and read overlapping. Read length, [[Coverage (genetics)|coverage]], quality, and the sequencing technique used play a major role in choosing the best alignment algorithm in the case of [[DNA sequencing|next-generation sequencing]].<ref>{{cite journal | vauthors = Ruffalo M, LaFramboise T, Koyutürk M | title = Comparative analysis of algorithms for next-generation sequencing read alignment | journal = Bioinformatics | volume = 27 | issue = 20 | pages = 2790–2796 | date = October 2011 | pmid = 21856737 | doi = 10.1093/bioinformatics/btr477 | doi-access = free }}</ref> On the other hand, algorithms that align third-generation sequencing reads require advanced approaches to account for the high error rate associated with those reads.
# Post-assembly: This step focuses on extracting valuable information from the assembled sequence. [[Comparative genomics]] and population analysis are examples of post-assembly analysis.
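The pre-assembly quality check and read filtering can be illustrated with a minimal Python sketch. It assumes the common Phred+33 ASCII encoding of FASTQ quality strings and uses the mean Phred score of a read as the filtering criterion; both the function names and the threshold of 20 are hypothetical choices for illustration, not the behaviour of FastQC or any particular tool.

```python
from io import StringIO
from typing import Iterator, TextIO

Record = tuple[str, str, str]  # (read name, sequence, quality string)


def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode an ASCII quality string into per-base Phred scores.

    The offset of 33 (Phred+33) is an assumption about the input file.
    """
    return [ord(ch) - offset for ch in quality]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from a simple four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line is ignored
        quality = handle.readline().rstrip("\n")
        yield header[1:], sequence, quality


def filter_reads(records: Iterator[Record],
                 min_mean_quality: float = 20.0) -> Iterator[Record]:
    """Keep only reads whose mean Phred score reaches the threshold."""
    for name, sequence, quality in records:
        scores = phred_scores(quality)
        if scores and sum(scores) / len(scores) >= min_mean_quality:
            yield name, sequence, quality


# Two toy reads: 'I' encodes Phred 40 and '!' encodes Phred 0.
EXAMPLE_FASTQ = "@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\n!!!!\n"
kept = [name for name, _, _ in filter_reads(read_fastq(StringIO(EXAMPLE_FASTQ)))]
print(kept)
```

Averaging Phred scores is a simplification, because the scores are logarithmic; real pipelines often trim low-quality ends or filter on expected error counts instead.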
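The de Bruijn graph approach mentioned in the assembly step can be sketched as follows. This is a toy illustration under strong assumptions (error-free reads, a single strand, no repeats); the function names and the choice of k = 3 are hypothetical, and production assemblers add error correction, graph simplification and much more.

```python
from collections import defaultdict


def de_bruijn_graph(reads: list[str], k: int) -> dict[str, list[str]]:
    """Build a de Bruijn graph from the distinct k-mers of the reads.

    Each k-mer contributes one edge from its (k-1)-base prefix
    to its (k-1)-base suffix.
    """
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph: dict[str, list[str]] = defaultdict(list)
    for kmer in sorted(kmers):
        graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


def spell_simple_path(graph: dict[str, list[str]], start: str) -> str:
    """Follow edges from `start` while each node has exactly one successor.

    This ignores branching on incoming edges, so it is only meaningful
    for a graph that forms a single unbranched path.
    """
    contig = start
    node = start
    visited = {start}
    while len(graph.get(node, [])) == 1:
        node = graph[node][0]
        if node in visited:  # stop on a cycle instead of looping forever
            break
        visited.add(node)
        contig += node[-1]
    return contig


# Two overlapping toy reads that together cover the sequence ACGTA.
graph = de_bruijn_graph(["ACGT", "CGTA"], k=3)
contig = spell_simple_path(graph, "AC")
print(graph)
print(contig)
```

The overlap between the two reads is never computed explicitly: it emerges because both reads share the k-mer CGT, which is the main appeal of this approach for large numbers of short reads.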