===Compression===
Compressing FASTA files efficiently requires a specialized compressor that handles both kinds of information the format carries: the identifiers (header lines) and the sequences. To improve compression, most tools separate these into two streams and compress each one independently of the other. For example, MFCompress<ref name="MFCompress">{{cite journal | vauthors = Pinho AJ, Pratas D | title = MFCompress: a compression tool for FASTA and multi-FASTA data | journal = Bioinformatics | volume = 30 | issue = 1 | pages = 117–8 | date = January 2014 | pmid = 24132931 | pmc = 3866555 | doi = 10.1093/bioinformatics/btt594 }}</ref> performs lossless compression of FASTA and multi-FASTA files using context modelling and arithmetic coding. Genozip,<ref name="Genozip">{{Cite journal |last1=Lan |first1=Divon |last2=Tobler |first2=Ray |last3=Souilmi |first3=Yassine |last4=Llamas |first4=Bastien |date=2021-02-15 |title=Genozip: a universal extensible genomic data compressor |url=https://doi.org/10.1093/bioinformatics/btab102 |journal=Bioinformatics |volume=37 |issue=16 |pages=2225–2230 |doi=10.1093/bioinformatics/btab102 |issn=1367-4803 |pmc=8388020 |pmid=33585897}}</ref> a software package for compressing genomic files, uses an extensible context-based model. Compression methods for biological sequences were surveyed by Hosseini et al. in 2016,<ref name="Morteza">{{Cite journal |last1=Hosseini |first1=Morteza |last2=Pratas |first2=Diogo |last3=Pinho |first3=Armando J. |date=2016 |title=A Survey on Data Compression Methods for Biological Sequences |journal=Information |language=en |volume=7 |issue=4 |pages=56 |doi=10.3390/info7040056 |issn=2078-2489 |doi-access=free }}</ref> and reference-free compressors for FASTA-formatted sequences were benchmarked by Kryukov et al. in 2020.<ref name="SCB">{{cite journal | vauthors = Kryukov K, Ueda MT, Nakagawa S, Imanishi T | title = Sequence Compression Benchmark (SCB) database—A comprehensive evaluation of reference-free compressors for FASTA-formatted sequences | journal = GigaScience | volume = 9 | issue = 7 | pages = giaa072 | date = July 2020 | pmid = 32627830 | pmc = 7336184 | doi = 10.1093/gigascience/giaa072 }}</ref>
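The two-stream idea of separating identifiers from sequence data can be illustrated with a minimal Python sketch. It assumes plain (multi-)FASTA text with at least one record, and it compresses each stream with the general-purpose zlib (DEFLATE) compressor from the Python standard library. This is a deliberate simplification: it is not how MFCompress or Genozip work internally, since those tools use specialized context models. The function names <code>split_streams</code>, <code>compress_streams</code> and <code>decompress_streams</code> are hypothetical, and the original line wrapping of the file is not preserved.

```python
import zlib


def split_streams(fasta_text: str) -> tuple[str, str]:
    """Split FASTA text into a header stream and a sequence stream.

    Each record contributes one header line (without the leading '>')
    and one unwrapped sequence line. Line wrapping is NOT preserved;
    a real compressor would also have to store the line lengths.
    """
    headers: list[str] = []
    chunks: list[list[str]] = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            headers.append(line[1:])
            chunks.append([])  # start a new record
        elif line:
            if not chunks:
                raise ValueError("sequence data before the first header")
            chunks[-1].append(line)
    sequences = ["".join(c) for c in chunks]
    return "\n".join(headers), "\n".join(sequences)


def compress_streams(fasta_text: str) -> tuple[bytes, bytes]:
    """Compress the header and sequence streams independently with zlib."""
    header_stream, seq_stream = split_streams(fasta_text)
    return (
        zlib.compress(header_stream.encode("ascii"), 9),
        zlib.compress(seq_stream.encode("ascii"), 9),
    )


def decompress_streams(header_blob: bytes, seq_blob: bytes) -> str:
    """Rebuild FASTA text (one sequence line per record) from both streams."""
    headers = zlib.decompress(header_blob).decode("ascii").split("\n")
    sequences = zlib.decompress(seq_blob).decode("ascii").split("\n")
    return "".join(f">{h}\n{s}\n" for h, s in zip(headers, sequences))
```

Compressing the streams separately lets each compressor adapt to one kind of data (free-text identifiers versus a small nucleotide or amino-acid alphabet). Dedicated tools go further by replacing the general-purpose compressor with statistical models tailored to each stream.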