Lossless compression
===Genetics and genomics===
[[Compression of genomic sequencing data|Genetics compression algorithms]] (not to be confused with [[genetic algorithm]]s) are the latest generation of lossless algorithms that compress data (typically sequences of nucleotides) using both conventional compression algorithms and specific algorithms adapted to genetic data. In 2012, a team of scientists from Johns Hopkins University published the first genetic compression algorithm that does not rely on external genetic databases for compression. HapZipper was tailored for [[International_HapMap_Project|HapMap]] data and achieves over 20-fold compression (95% reduction in file size), providing 2- to 4-fold better compression, in much less time, than leading general-purpose compression utilities.<ref>{{cite journal |author=Chanda, P. |author2=Elhaik, E. |author3=Bader, J.S. | title=HapZipper: sharing HapMap populations just got easier | journal=Nucleic Acids Res | pages=1–7 | year=2012 | pmid=22844100 | doi=10.1093/nar/gks709 | volume=40 | issue=20 | pmc=3488212}}</ref>

Genomic sequence compression algorithms, also known as DNA sequence compressors, exploit the fact that DNA sequences have characteristic properties, such as inverted repeats. The most successful compressors are XM and GeCo.<ref name=Pratas>{{cite book |last1=Pratas |first1=D. |last2=Pinho |first2=A. J. |last3=Ferreira |first3=P. J. S. G. |date=2016 |chapter=Efficient compression of genomic sequences |title=Data Compression Conference |location=Snowbird, Utah |url=http://sweet.ua.pt/pratas/papers/Pratas-2016b.pdf}}</ref> For [[eukaryotes]], XM achieves a slightly better compression ratio, though for sequences larger than 100 MB its computational requirements are impractical.
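
The ideas above can be illustrated with a minimal sketch. It is not the method used by HapZipper, XM, or GeCo, and every function name in it is hypothetical. It assumes a sequence made up only of the four letters A, C, G and T, and shows three things: what an inverted repeat is (a segment followed later by its reverse complement), a naive fixed 2-bits-per-base packing that specialised compressors aim to improve on, and the arithmetic linking an ''n''-fold compression ratio to a percentage reduction in file size.

```python
# Illustrative sketch only; NOT the algorithm of HapZipper, XM, or GeCo.
# All names here are hypothetical helpers introduced for this example.

COMPLEMENT = str.maketrans("ACGT", "TGCA")
BASES = "ACGT"
CODES = {base: code for code, base in enumerate(BASES)}  # A=0, C=1, G=2, T=3


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def is_inverted_repeat(left: str, right: str) -> bool:
    """True if `right` is the reverse complement of `left`."""
    return right == reverse_complement(left)


def pack_2bit(seq: str) -> bytes:
    """Naive baseline: store each base in 2 bits, 4 bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODES[base]
        # Left-align a final partial chunk so unpacking reads high bits first.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack_2bit(data: bytes, length: int) -> str:
    """Inverse of pack_2bit; `length` is the original number of bases."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 3])
    return "".join(bases[:length])


def size_reduction(fold: float) -> float:
    """Fraction of the original size saved by an n-fold compression."""
    return 1.0 - 1.0 / fold
```

Under these assumptions, <code>size_reduction(20)</code> evaluates to approximately 0.95, matching the 95% reduction quoted for 20-fold compression. A compressor that recognises inverted repeats can encode the second copy as a reference to the first instead of storing its bases again, which is one way to go below the 2-bits-per-base baseline.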