Open main menu
Home
Random
Recent changes
Special pages
Community portal
Preferences
About Wikipedia
Disclaimers
Incubator escapee wiki
Search
User menu
Talk
Dark mode
Contributions
Create account
Log in
Editing
Burrows–Wheeler transform
(section)
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
==Optimization== A number of [[Optimization (computer science)|optimizations]] can make these algorithms run more efficiently without changing the output. There is no need to represent the table in either the encoder or decoder. In the encoder, each row of the table can be represented by a single pointer into the strings, and the sort performed using the indices. In the decoder, there is also no need to store the table, and the decoded string can be generated one character at a time from left to right. Comparative sorting can even be avoided in favor of linear sorting, with performance proportional to the alphabet size and string length. A "character" in the algorithm can be a byte, or a bit, or any other convenient size. One may also make the observation that mathematically, the encoded string can be computed as a simple modification of the [[suffix array]], and suffix arrays can be computed with linear time and memory. The BWT can be defined with regards to the suffix array SA of text T as (1-based indexing): <math display=block>BWT[i] = \begin{cases} T[SA[i]-1], & \text{if }SA[i] > 0\\ \$, & \text{otherwise}\end{cases}</math><ref>{{Cite journal|last1=Simpson|first1=Jared T.|last2=Durbin|first2=Richard|date=2010-06-15|title=Efficient construction of an assembly string graph using the FM-index|journal=Bioinformatics|language=en|volume=26|issue=12|pages=i367–i373|doi=10.1093/bioinformatics/btq217|issn=1367-4803|pmc=2881401|pmid=20529929}}</ref> There is no need to have an actual 'EOF' character. Instead, a pointer can be used that remembers where in a string the 'EOF' would be if it existed. In this approach, the output of the BWT must include both the transformed string, and the final value of the pointer. The inverse transform then shrinks it back down to the original size: it is given a string and a pointer, and returns just a string. A complete description of the algorithms can be found in Burrows and Wheeler's paper, or in a number of online sources.<ref name=Burrows1994/> The algorithms vary somewhat by whether EOF is used, and in which direction the sorting was done. In fact, the original formulation did not use an EOF marker.<ref name=Manzini>{{Cite book |chapter-url=https://people.unipmn.it/manzini/papers/mfcs99x.pdf |archive-url=https://ghostarchive.org/archive/20221009/https://people.unipmn.it/manzini/papers/mfcs99x.pdf |archive-date=2022-10-09 |url-status=live <!-- |url=https://books.google.com/books?id=OcJjpqAi15EC&pg=PA34e --> |title=Mathematical Foundations of Computer Science 1999: 24th International Symposium, MFCS'99 Szklarska Poreba, Poland, September 6-10, 1999 Proceedings |chapter=The Burrows–Wheeler Transform: Theory and Practice |last=Manzini |first=Giovanni |date=1999-08-18 |publisher=Springer Science & Business Media |isbn=9783540664086 |language=en}}</ref>
Edit summary
(Briefly describe your changes)
By publishing changes, you agree to the
Terms of Use
, and you irrevocably agree to release your contribution under the
CC BY-SA 4.0 License
and the
GFDL
. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel
Editing help
(opens in new window)