== Bioinformatics pipeline ==
In general, there are three steps in assembling sequencing reads into a scaffold:
# Pre-assembly: This step is essential to ensure the integrity of downstream analyses such as variant calling or the final scaffold sequence. It consists of two sequential workflows:
## Quality check: Depending on the type of sequencing technology, different errors might arise that lead to a false [[Base calling|base call]]. For example, the sequence "NAAAAAAAAAAAAN", which includes 12 adenines, might be wrongly called as "NAAAAAAAAAAAN", with only 11 adenines. More generally, sequencing a highly repetitive segment of the target DNA/RNA might result in a call that is one base shorter or one base longer. Read quality is typically measured by [[Phred quality score|Phred]] scores, which encode the quality of each nucleotide within a read's sequence. Some sequencing technologies, such as [[Pacbio|PacBio]], do not have a scoring method for their sequenced reads. A common tool used in this step is FastQC.<ref>{{Cite web |title=Babraham Bioinformatics - FastQC A Quality Control tool for High Throughput Sequence Data |url=https://www.bioinformatics.babraham.ac.uk/projects/fastqc/ |access-date=2022-05-09 |website=www.bioinformatics.babraham.ac.uk}}</ref>
## Filtering of reads: Reads that fail the quality check should be removed from the [[FASTQ format|FASTQ]] file to obtain the best assembly contigs.
# Assembly: During this step, read alignment is used, with different criteria, to map each read to its likely location. The predicted position of a read is based on how much of its sequence aligns with either other reads or a reference. Different alignment algorithms are used for reads from different sequencing technologies. Commonly used assembly approaches include [[De Bruijn graph|de Bruijn graphs]] and read overlapping. Read length, [[Coverage (genetics)|coverage]], quality, and the sequencing technique used play a major role in choosing the best alignment algorithm in the case of [[DNA sequencing|Next Generation Sequencing]].<ref>{{cite journal | vauthors = Ruffalo M, LaFramboise T, Koyutürk M | title = Comparative analysis of algorithms for next-generation sequencing read alignment | journal = Bioinformatics | volume = 27 | issue = 20 | pages = 2790–2796 | date = October 2011 | pmid = 21856737 | doi = 10.1093/bioinformatics/btr477 | doi-access = free }}</ref> On the other hand, algorithms that align third-generation sequencing reads require advanced approaches to account for the high error rate associated with these reads.
# Post-assembly: This step focuses on extracting valuable information from the assembled sequence. [[Comparative genomics]] and population analysis are examples of post-assembly analyses.
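
The pre-assembly filtering and the graph-building part of the assembly step can be illustrated with a minimal Python sketch. It is not how any particular assembler works: the Phred+33 offset, the mean-quality threshold of 20, the toy reads, and the function names <code>phred_scores</code>, <code>passes_quality</code> and <code>de_bruijn_edges</code> are all assumptions chosen for illustration. A Phred score Q corresponds to an estimated error probability of 10<sup>−Q/10</sup>, and in a FASTQ file each score is stored as a single character.

```python
from collections import defaultdict

PHRED_OFFSET = 33  # assumption: Phred+33 FASTQ encoding


def phred_scores(quality_string):
    """Decode a FASTQ quality string into per-base Phred scores."""
    return [ord(ch) - PHRED_OFFSET for ch in quality_string]


def passes_quality(quality_string, min_mean_q=20):
    """Keep a read if its mean Phred score reaches the (hypothetical) threshold."""
    scores = phred_scores(quality_string)
    return bool(scores) and sum(scores) / len(scores) >= min_mean_q


def de_bruijn_edges(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer is an edge from its prefix to its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return graph


# Toy (sequence, quality string) pairs, as found in FASTQ records.
records = [
    ("ATGGCGT", "IIIIIII"),  # 'I' decodes to Q40
    ("GGCGTGC", "IIIIIII"),
    ("TTTTTTT", "#######"),  # '#' decodes to Q2, so this read is dropped
]

# Pre-assembly: discard low-quality reads.
kept = [seq for seq, qual in records if passes_quality(qual)]

# Assembly: build a de Bruijn graph from the remaining reads.
graph = de_bruijn_edges(kept, k=4)

for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

In this sketch the two surviving reads overlap, so their k-mers join into a single path through the graph. Practical assemblers additionally have to handle reverse complements, sequencing errors, and repeats, which this example ignores.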