Open main menu
Home
Random
Recent changes
Special pages
Community portal
Preferences
About Wikipedia
Disclaimers
Incubator escapee wiki
Search
User menu
Talk
Dark mode
Contributions
Create account
Log in
Editing
Burrows–Wheeler transform
(section)
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
==Explanation== If the original string had several substrings that occurred often, then the BWT-transformed string will have several places where a single character is repeated many times in a row,<ref>{{cite web|url=https://github.com/adrien-mogenet/scala-bwt/blob/master/src/main/scala/me/algos/bwt/BurrowsWheelerCodec.scala|title=adrien-mogenet/scala-bwt|website=GitHub|access-date=19 April 2018}}</ref> creating more-easily-compressible data. For instance, consider transforming an English text frequently containing the word "the": For example: {| class="wikitable" ! Input | <code>THE.MAN.AND.THE.DOG.WAITED.AT.THE.STATION.FOR.THE.TRAIN.TO.THE.CITY</code> |- ! Output | <code>NDEENEEODTRNEGRWM..T.EN.HHHHHT.OTTTTTATAC.AOIATDIFOT.ASI..Y..A..I.T</code> |} Sorting the rotations of this text groups rotations starting with "he " together, and the last character of such a rotation (which is also the character before the "he ") will usually be "t" (though perhaps occasionally not, such as if the text contained "ache "), so the result of the transform will contain a run, or runs, of many consecutive "t" characters. Similarly, rotations beginning with "e " are grouped together, but "e " is often preceded by "h", so we see the output above contains a run of five consecutive "h" characters. Thus it can be seen that the success of this transform depends upon one value having a high probability of occurring before a sequence, so that in general it needs fairly long samples (a few kilobytes at least) of appropriate data (such as text). The remarkable thing about the BWT is not that it generates a more easily encoded output—an ordinary sort would do that—but that it does this ''reversibly'', allowing the original document to be re-generated from the last column data. The inverse can be understood this way. Take the final table in the BWT algorithm, and erase all but the last column. Given only this information, you can easily reconstruct the first column. The last column tells you all the characters in the text, so just sort these characters alphabetically to get the first column. Then, the last and first columns (of each row) together give you all ''pairs'' of successive characters in the document, where pairs are taken cyclically so that the last and first character form a pair. Sorting the list of pairs gives the first ''and second'' columns. To obtain the third column, the last column is again prepended to the table, and the rows are sorted lexicographically. Continuing in this manner, you can reconstruct the entire list. Then, the row with the "end of file" character at the end is the original text. Reversing the example above is done like this: {| class="wikitable" ! colspan=4 | Inverse transformation |- ! colspan=4 | Input |- | align=center colspan=4 | BNN<span style="color:red;">^</span>AA<span style="color:red;">$</span>A |- ! Add 1 ! Sort 1 ! Add 2 ! Sort 2 |- | align=right | B N N <span style="color:red;">^</span> A A <span style="color:red;">$</span> A | align=right | A A A B N N <span style="color:red;">^</span> <span style="color:red;">$</span> | align=right | BA NA NA <span style="color:red;">^</span>B AN AN <span style="color:red;">$</span><span style="color:red;">^</span> A<span style="color:red;">$</span> | align=right | AN AN A<span style="color:red;">$</span> BA NA NA <span style="color:red;">^</span>B <span style="color:red;">$</span><span style="color:red;">^</span> |- ! Add 3 ! Sort 3 ! Add 4 ! Sort 4 |- | align=right | BAN NAN NA<span style="color:red;">$</span> <span style="color:red;">^</span>BA ANA ANA <span style="color:red;">$</span><span style="color:red;">^</span>B A<span style="color:red;">$</span><span style="color:red;">^</span> | align=right | ANA ANA A<span style="color:red;">$</span><span style="color:red;">^</span> BAN NAN NA<span style="color:red;">$</span> <span style="color:red;">^</span>BA <span style="color:red;">$</span><span style="color:red;">^</span>B | align=right | BANA NANA NA<span style="color:red;">$</span><span style="color:red;">^</span> <span style="color:red;">^</span>BAN ANAN ANA<span style="color:red;">$</span> <span style="color:red;">$</span><span style="color:red;">^</span>BA A<span style="color:red;">$</span><span style="color:red;">^</span>B | align=right | ANAN ANA<span style="color:red;">$</span> A<span style="color:red;">$</span><span style="color:red;">^</span>B BANA NANA NA<span style="color:red;">$</span><span style="color:red;">^</span> <span style="color:red;">^</span>BAN <span style="color:red;">$</span><span style="color:red;">^</span>BA |- ! Add 5 ! Sort 5 ! Add 6 ! Sort 6 |- | align=right | BANAN NANA<span style="color:red;">$</span> NA<span style="color:red;">$</span><span style="color:red;">^</span>B <span style="color:red;">^</span>BANA ANANA ANA<span style="color:red;">$</span><span style="color:red;">^</span> <span style="color:red;">$</span><span style="color:red;">^</span>BAN A<span style="color:red;">$</span><span style="color:red;">^</span>BA | align=right | ANANA ANA<span style="color:red;">$</span><span style="color:red;">^</span> A<span style="color:red;">$</span><span style="color:red;">^</span>BA BANAN NANA<span style="color:red;">$</span> NA<span style="color:red;">$</span><span style="color:red;">^</span>B <span style="color:red;">^</span>BANA <span style="color:red;">$</span><span style="color:red;">^</span>BAN | align=right | BANANA NANA<span style="color:red;">$</span><span style="color:red;">^</span> NA<span style="color:red;">$</span><span style="color:red;">^</span>BA <span style="color:red;">^</span>BANAN ANANA<span style="color:red;">$</span> ANA<span style="color:red;">$</span><span style="color:red;">^</span>B <span style="color:red;">$</span><span style="color:red;">^</span>BANA A<span style="color:red;">$</span><span style="color:red;">^</span>BAN | align=right | ANANA<span style="color:red;">$</span> ANA<span style="color:red;">$</span><span style="color:red;">^</span>B A<span style="color:red;">$</span><span style="color:red;">^</span>BAN BANANA NANA<span style="color:red;">$</span><span style="color:red;">^</span> NA<span style="color:red;">$</span><span style="color:red;">^</span>BA <span style="color:red;">^</span>BANAN <span style="color:red;">$</span><span style="color:red;">^</span>BANA |- ! Add 7 ! Sort 7 ! Add 8 ! Sort 8 |- | align=right | BANANA<span style="color:red;">$</span> NANA<span style="color:red;">$</span><span style="color:red;">^</span>B NA<span style="color:red;">$</span><span style="color:red;">^</span>BAN <span style="color:red;">^</span>BANANA ANANA<span style="color:red;">$</span><span style="color:red;">^</span> ANA<span style="color:red;">$</span><span style="color:red;">^</span>BA <span style="color:red;">$</span><span style="color:red;">^</span>BANAN A<span style="color:red;">$</span><span style="color:red;">^</span>BANA | align=right | ANANA<span style="color:red;">$</span><span style="color:red;">^</span> ANA<span style="color:red;">$</span><span style="color:red;">^</span>BA A<span style="color:red;">$</span><span style="color:red;">^</span>BANA BANANA<span style="color:red;">$</span> NANA<span style="color:red;">$</span><span style="color:red;">^</span>B NA<span style="color:red;">$</span><span style="color:red;">^</span>BAN <span style="color:red;">^</span>BANANA <span style="color:red;">$</span><span style="color:red;">^</span>BANAN | align=right | BANANA<span style="color:red;">$</span><span style="color:red;">^</span> NANA<span style="color:red;">$</span><span style="color:red;">^</span>BA NA<span style="color:red;">$</span><span style="color:red;">^</span>BANA <span style="color:red;">^</span>BANANA<span style="color:red;">$</span> ANANA<span style="color:red;">$</span><span style="color:red;">^</span>B ANA<span style="color:red;">$</span><span style="color:red;">^</span>BAN <span style="color:red;">$</span><span style="color:red;">^</span>BANANA A<span style="color:red;">$</span><span style="color:red;">^</span>BANAN | align=right | ANANA<span style="color:red;">$</span><span style="color:red;">^</span>B ANA<span style="color:red;">$</span><span style="color:red;">^</span>BAN A<span style="color:red;">$</span><span style="color:red;">^</span>BANAN BANANA<span style="color:red;">$</span><span style="color:red;">^</span> NANA<span style="color:red;">$</span><span style="color:red;">^</span>BA NA<span style="color:red;">$</span><span style="color:red;">^</span>BANA <span style="color:red;">^</span>BANANA<span style="color:red;">$</span> <span style="color:red;">$</span><span style="color:red;">^</span>BANANA |- ! colspan=4 | Output |- | align=center colspan=4 | <span style="color:red;">^</span>BANANA<span style="color:red;">$</span> |}
Edit summary
(Briefly describe your changes)
By publishing changes, you agree to the
Terms of Use
, and you irrevocably agree to release your contribution under the
CC BY-SA 4.0 License
and the
GFDL
. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel
Editing help
(opens in new window)