Open main menu
Home
Random
Recent changes
Special pages
Community portal
Preferences
About Wikipedia
Disclaimers
Incubator escapee wiki
Search
User menu
Talk
Dark mode
Contributions
Create account
Log in
Editing
Burrows–Wheeler transform
(section)
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
==Description== The transform is done by constructing a matrix (known as the Burrows-Wheeler Matrix<ref name="langmead">{{cite web |last1=Langmead |first1=Ben |title=Burrows-Wheeler Transform and FM Index |url=https://www.cs.jhu.edu/~langmea/resources/lecture_notes/bwt_and_fm_index.pdf |website=Johns Hopkins Whiting School of Engineering |access-date=23 April 2025}}</ref>) whose rows are the [[circular shift]]s of the input text, [[sorted]] in [[lexicographic order]], then taking the final column of that matrix. To allow the transform to be reversed, one additional step is necessary: either the index of the original string in the Burrows-Wheeler Matrix must be returned along with the transformed string (the approach shown in the original paper by Burrows and Wheeler<ref name=Burrows1994/>) or a special end-of-text character must be added at the start or end of the input text before the transform is executed.<ref name="langmead"/> ===Example=== Given an input string <code>S = <span style="color:red;">^</span>BANANA<span style="color:red;">$</span></code> (step 1 in the table below), rotate it ''N'' times (step 2), where <code>N = 8</code> is the length of the <code>S</code> string considering also the red <code><span style="color:red;">^</span></code> character representing the start of the string and the red <code><span style="color:red;">$</span></code> character representing the '[[End-of-file|EOF]]' pointer; these rotations, or circular shifts, are then sorted lexicographically (step 3). The output of the encoding phase is the last column <code>L = BNN<span style="color:red;">^</span>AA<span style="color:red;">$</span>A</code> after step 3, and the index (0-based) <code>I</code> of the row containing the original string <code>S</code>, in this case <code>I = 6</code>. It is not necessary to use both <code><span style="color:red;">$</span></code> and <code><span style="color:red;">^</span></code>, but at least one must be used, else we cannot invert the transform, since all circular permutations of a string have the same Burrows–Wheeler transform. {| class="wikitable" |- ! colspan="5" | Transformation |- ! 1. Input ! 2. All<br />rotations ! 3. Sort into<br />lexical order ! 4. Take the<br />last column ! 5. Output |- | align=center | <span style="color:red;">^</span>BANANA<span style="color:red;">$</span> | <span style="color:red;">^</span>BANANA<span style="color:red;">$</span> <span style="color:red;">$</span><span style="color:red;">^</span>BANANA A<span style="color:red;">$</span><span style="color:red;">^</span>BANAN NA<span style="color:red;">$</span><span style="color:red;">^</span>BANA ANA<span style="color:red;">$</span><span style="color:red;">^</span>BAN NANA<span style="color:red;">$</span><span style="color:red;">^</span>BA ANANA<span style="color:red;">$</span><span style="color:red;">^</span>B BANANA<span style="color:red;">$</span><span style="color:red;">^</span> | '''A'''NANA<span style="color:red;">$</span><span style="color:red;">^</span>B '''A'''NA<span style="color:red;">$</span><span style="color:red;">^</span>BAN '''A'''<span style="color:red;">$</span><span style="color:red;">^</span>BANAN '''B'''ANANA<span style="color:red;">$</span><span style="color:red;">^</span> '''N'''ANA<span style="color:red;">$</span><span style="color:red;">^</span>BA '''N'''A<span style="color:red;">$</span><span style="color:red;">^</span>BANA <span style="color:red;">^</span>BANANA<span style="color:red;">$</span> <span style="color:red;">'''$^'''</span>BANANA | ANANA<span style="color:red;">$</span><span style="color:red;">^</span>'''B''' ANA<span style="color:red;">$</span><span style="color:red;">^</span>BA'''N''' A<span style="color:red;">$</span><span style="color:red;">^</span>BANA'''N''' BANANA<span style="color:red;">$</span><span style="color:red;">'''^'''</span> NANA<span style="color:red;">$</span><span style="color:red;">^</span>B'''A''' NA<span style="color:red;">$</span><span style="color:red;">^</span>BAN'''A''' <span style="color:red;">^</span>BANANA<span style="color:red;">'''$'''</span> <span style="color:red;">$^</span>BANAN'''A''' | BNN<span style="color:red;">^</span>AA<span style="color:red;">$</span>A |} ===Pseudocode=== The following [[pseudocode]] gives a simple (though inefficient) way to calculate the BWT and its inverse. It assumes that the input string <code>s</code> contains a special character 'EOF' which is the last character and occurs nowhere else in the text. '''function''' BWT (''string'' s) create a table, where the rows are all possible rotations of s sort rows alphabetically '''return''' (last column of the table) '''function''' inverseBWT (''string'' s) create empty table '''repeat''' length(s) '''times''' // first insert creates first column insert s as a column of table before first column of the table sort rows of the table alphabetically '''return''' (row that ends with the 'EOF' character)
Edit summary
(Briefly describe your changes)
By publishing changes, you agree to the
Terms of Use
, and you irrevocably agree to release your contribution under the
CC BY-SA 4.0 License
and the
GFDL
. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel
Editing help
(opens in new window)