Open main menu
Home
Random
Recent changes
Special pages
Community portal
Preferences
About Wikipedia
Disclaimers
Incubator escapee wiki
Search
User menu
Talk
Dark mode
Contributions
Create account
Log in
Editing
Burrows–Wheeler transform
(section)
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
==BWT applications== As a [[lossless compression]] algorithm the Burrows–Wheeler transform offers the important quality that its encoding is reversible and hence the original data may be recovered from the resulting compression. The lossless quality of Burrows algorithm has provided for different algorithms with different purposes in mind. To name a few, Burrows–Wheeler transform is used in algorithms for [[sequence alignment]], [[image compression]], [[data compression]], etc. The following is a compilation of some uses given to the Burrows–Wheeler Transform. ===BWT for sequence alignment=== The advent of [[next-generation sequencing]] (NGS) techniques at the end of the 2000s decade has led to another application of the Burrows–Wheeler transformation. In NGS, [[DNA]] is fragmented into small pieces, of which the first few bases are [[DNA sequencing|sequenced]], yielding several millions of "reads", each 30 to 500 [[base pair]]s ("DNA characters") long. In many experiments, e.g., in [[ChIP-Seq]], the task is now to align these reads to a reference [[genome]], i.e., to the known, nearly complete sequence of the organism in question (which may be up to several billion base pairs long). A number of alignment programs, specialized for this task, were published, which initially relied on [[Hash function|hashing]] (e.g., [[Eland (software)|Eland]], SOAP,<ref name="Li, R2008">{{cite journal |author=Li R |title=SOAP: short oligonucleotide alignment program |journal=Bioinformatics |year=2008 |volume=24 |issue=5 |pages=713–714 |pmid=18227114 |doi=10.1093/bioinformatics/btn025|display-authors=etal|doi-access=free }}</ref> or Maq<ref name="Li, H2008">{{cite journal |vauthors=Li H, Ruan J, Durbin R |title=Mapping short DNA sequencing reads and calling variants using mapping quality scores |journal=Genome Research |volume=18 |issue=11 |pages=1851–1858 |date=2008-08-19 |pmid=18714091 |doi=10.1101/gr.078212.108 |pmc=2577856}}</ref>). In an effort to reduce the memory requirement for sequence alignment, several alignment programs were developed ([[Bowtie (sequence analysis)|Bowtie]],<ref name="Langmead2009">{{cite journal |vauthors=Langmead B, Trapnell C, Pop M, Salzberg SL |title=Ultrafast and memory-efficient alignment of short DNA sequences to the human genome |journal=Genome Biology |year=2009 |volume=10 |issue=3 |page=R25 |pmid=19261174 |doi=10.1186/gb-2009-10-3-r25 |pmc=2690996 |doi-access=free }}</ref> BWA,<ref name="Li, H2009">{{cite journal |vauthors=Li H, Durbin R |title=Fast and accurate short read alignment with Burrows–Wheeler Transform |journal=Bioinformatics |year=2009 |pmid=19451168 |volume=25 |issue=14 |pages=1754–1760 |doi=10.1093/bioinformatics/btp324 |pmc=2705234}}</ref> and SOAP2<ref name="Li, R2009">{{cite journal |author=Li R |title=SOAP2: an improved ultrafast tool for short read alignment |journal=Bioinformatics |year=2009 |pmid=19497933 |volume=25 |issue=15 |pages=1966–1967 |doi=10.1093/bioinformatics/btp336 |display-authors=etal|doi-access= }}</ref>) that use the Burrows–Wheeler transform. ===BWT for image compression=== The Burrows–Wheeler transformation has proved to be fundamental for [[image compression]] applications. For example,<ref name="Collin, P2019">{{cite book |vauthors=Collin P, Arnavut Z, Koc B |chapter=Lossless compression of medical images using Burrows–Wheeler Transformation with Inversion Coder |chapter-url=https://ieeexplore.ieee.org/document/7319012| title=2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)|journal=<!-- --> |date=2015 |volume=2015 |pages=2956–2959 |pmid=26736912 |doi=10.1109/EMBC.2015.7319012 |isbn=978-1-4244-9271-8 |s2cid=4460328 }}</ref> Showed a compression pipeline based on the application of the Burrows–Wheeler transformation followed by inversion, run-length, and arithmetic encoders. The pipeline developed in this case is known as Burrows–Wheeler transform with an inversion encoder (BWIC). The results shown by BWIC are shown to outperform the compression performance of well-known and widely used algorithms like [[Lossless JPEG]] and [[JPEG 2000]]. BWIC is shown to outperform those in terms of final compression size of radiography medical images on the order of 5.1% and 4.1% respectively. The improvements are achieved by combining BWIC and a pre-BWIC scan of the image in a vertical snake order fashion. More recently, additional works have shown the implementation of the Burrows–Wheeler Transform in conjunction with the known [[move-to-front transform]] (MTF) achieve near lossless compression of images. <ref name="Devadoss, CP2019">{{cite journal |vauthors=Devadoss CP, Sankaragomathi B |title=Near lossless medical image compression using block BWT–MTF and hybrid fractal compression techniques |url=https://link.springer.com/article/10.1007/s10586-018-1801-3| journal=Cluster Computing| date=2019| volume=22 |pages=12929–12937 |doi=10.1007/s10586-018-1801-3|s2cid=33687086 }}</ref> ===BWT for compression of genomic databases=== Cox et al.<ref name="Cox, AJ2012">{{cite journal |vauthors= Cox AJ, Bauer MJ, Jakobi T, Rosone G|title=Large-scale compression of genomic sequence databases with the Burrows–Wheeler transform |journal=Bioinformatics |volume=28 |number=11 |pages=1415–1419 |year=2012 | publisher=Oxford University Press |doi=10.1093/bioinformatics/bts173 |pmid=22556365 |arxiv=1205.0192 }}</ref> presented a genomic compression scheme that uses BWT as the algorithm applied during the first stage of compression of several genomic datasets including the human genomic information. Their work proposed that BWT compression could be enhanced by including a second stage compression mechanism called same-as-previous encoding ("SAP"), which makes use of the fact that suffixes of two or more prefix letters could be equal. With the compression mechanism BWT-SAP, Cox et al. showed that in the genomic database ERA015743, 135.5 GB in size, the compression scheme BWT-SAP compresses the ERA015743 dataset by around 94%, to 8.2 GB. ===BWT for sequence prediction=== BWT has also been proved to be useful on sequence prediction which is a common area of study in [[machine learning]] and [[natural-language processing]]. In particular, Ktistakis et al.<ref name="Ktistakis, R2019">{{cite book |vauthors= Ktistakis R, Fournier-Viger P, Puglisi SJ, Raman R|chapter=Succinct BWT-Based Sequence Prediction |chapter-url=https://figshare.com/articles/conference_contribution/Succinct_BWT-based_Sequence_prediction/10200137| title= Database and Expert Systems Applications |series=Lecture Notes in Computer Science |date= 2019 |volume= 11707 |issue=10 |pages= 91–101 | doi=10.1007/978-3-030-27618-8_7|isbn=978-3-030-27617-1 |s2cid=201058996 }}</ref> proposed a sequence prediction scheme called SuBSeq that exploits the lossless compression of data of the Burrows–Wheeler transform. SuBSeq exploits BWT by extracting the [[FM-index]] and then performing a series of operations called backwardSearch, forwardSearch, neighbourExpansion, and getConsequents in order to search for predictions given a [[suffix]]. The predictions are then classified based on a weight and put into an array from which the element with the highest weight is given as the prediction from the SuBSeq algorithm. SuBSeq has been shown to outperform [[state of the art]] algorithms for sequence prediction both in terms of training time and accuracy.
Edit summary
(Briefly describe your changes)
By publishing changes, you agree to the
Terms of Use
, and you irrevocably agree to release your contribution under the
CC BY-SA 4.0 License
and the
GFDL
. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel
Editing help
(opens in new window)