Open main menu
Home
Random
Recent changes
Special pages
Community portal
Preferences
About Wikipedia
Disclaimers
Incubator escapee wiki
Search
User menu
Talk
Dark mode
Contributions
Create account
Log in
Editing
Burrows–Wheeler transform
(section)
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
==Bijective variant== Since any rotation of the input string will lead to the same transformed string, the BWT cannot be inverted without adding an EOF marker to the end of the input or doing something equivalent, making it possible to distinguish the input string from all its rotations. Increasing the size of the alphabet (by appending the EOF character) makes later compression steps awkward. There is a [[bijective]] version of the transform, by which the transformed string uniquely identifies the original, and the two have the same length and contain exactly the same characters, just in a different order.<ref>{{citation | last1 = Gil | first1 = J. | last2 = Scott | first2 = D. A. | title = A bijective string sorting transform | url = http://bijective.dogma.net/00yyy.pdf | year = 2009 | access-date = 2009-07-09 | archive-date = 2011-10-08 | archive-url = https://web.archive.org/web/20111008001603/http://bijective.dogma.net/00yyy.pdf | url-status = dead }}</ref><ref>{{citation | last = Kufleitner | first = Manfred | editor1-last = Holub | editor1-first = Jan | editor2-last = Žďárek | editor2-first = Jan | arxiv = 0908.0239 | contribution = On bijective variants of the Burrows–Wheeler transform | pages = 65–69 | title = Prague Stringology Conference | url = http://www.stringology.org/event/2009/p07.html | year = 2009| bibcode = 2009arXiv0908.0239K}}.</ref> The bijective transform is computed by factoring the input into a non-increasing sequence of [[Lyndon word]]s; such a factorization exists and is unique by the [[Chen–Fox–Lyndon theorem]],<ref>*{{Citation | last=Lothaire | first=M. | author-link=M. Lothaire | others=Perrin, D.; Reutenauer, C.; Berstel, J.; Pin, J. E.; Pirillo, G.; Foata, D.; Sakarovitch, J.; Simon, I.; Schützenberger, M. P.; Choffrut, C.; Cori, R.; Lyndon, Roger; Rota, Gian-Carlo. Foreword by Roger Lyndon | title=Combinatorics on words | edition=2nd | series=Encyclopedia of Mathematics and Its Applications | volume=17 | publisher=[[Cambridge University Press]] | year=1997 | isbn=978-0-521-59924-5 | zbl=0874.20040 | page=67 }}</ref> and may be found in linear time and constant space.<ref>{{citation | last = Duval | first = Jean-Pierre | doi = 10.1016/0196-6774(83)90017-2 | issue = 4 | journal = Journal of Algorithms | pages = 363–381 | title = Factorizing words over an ordered alphabet | volume = 4 | year = 1983 | zbl=0532.68061| issn=0196-6774}}.</ref> The algorithm sorts the rotations of all the words; as in the Burrows–Wheeler transform, this produces a sorted sequence of ''n'' strings. The transformed string is then obtained by picking the final character of each string in this sorted list. The one important caveat here is that strings of different lengths are not ordered in the usual way; the two strings are repeated forever, and the infinite repeats are sorted. For example, "ORO" precedes "OR" because "OROORO..." precedes "OROROR...". For example, the text "<span style="color:red;">^</span>BANANA<span style="color:red;">$</span>" is transformed into "ANNBAA<span style="color:red;">^</span><span style="color:red;">$</span>" through these steps (the red <span style="color:red;">$</span> character indicates the [[End-of-file|EOF]] pointer) in the original string. The EOF character is unneeded in the bijective transform, so it is dropped during the transform and re-added to its proper place in the file. The string is broken into Lyndon words so the words in the sequence are decreasing using the comparison method above. (Note that we're sorting '<span style="color:red;">^</span>' as succeeding other characters.) "<span style="color:red;">^</span>BANANA" becomes (<span style="color:red;">^</span>) (B) (AN) (AN) (A). {| class="wikitable" |- ! colspan="5" | Bijective transformation |- ! Input ! All<br />rotations ! Sorted alphabetically ! Last column<br />of rotated Lyndon word ! Output |- | align=center | <span style="color:red;">^</span>BANANA<span style="color:red;">$</span> | '''<span style="color:red;">^</span>'''<span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span>... (<span style="color:red;">^</span>) '''B'''BBBBBBB... (B) '''ANAN'''ANAN... (AN) '''NANA'''NANA... (NA) '''ANAN'''ANAN... (AN) '''NANA'''NANA... (NA) '''A'''AAAAAAA... (A) | '''A'''AAAAAAA... (A) '''A'''NANANAN... (AN) '''A'''NANANAN... (AN) '''B'''BBBBBBB... (B) '''N'''ANANANA... (NA) '''N'''ANANANA... (NA) '''<span style="color:red;">^</span>'''<span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span>... (<span style="color:red;">^</span>) | '''A'''AAAAAAA... ('''A''') A'''N'''ANANAN... (A'''N''') A'''N'''ANANAN... (A'''N''') '''B'''BBBBBBB... ('''B''') N'''A'''NANANA... (N'''A''') N'''A'''NANANA... (N'''A''') '''<span style="color:red;">^</span>'''<span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span>... ('''<span style="color:red;">^</span>''') | ANNBAA<span style="color:red;">^</span><span style="color:red;">$</span> |} {| class="wikitable" ! colspan=4 | Inverse bijective transform |- ! colspan=4 | Input |- | align=center colspan=4 | ANNBAA<span style="color:red;">^</span> |- ! Add 1 ! Sort 1 ! Add 2 ! Sort 2 |- | align=right | A N N B A A <span style="color:red;">^</span> | align=right | A A A B N N <span style="color:red;">^</span> | align=right | AA NA NA BB AN AN <span style="color:red;">^</span><span style="color:red;">^</span> | align=right | AA AN AN BB NA NA <span style="color:red;">^</span><span style="color:red;">^</span> |- ! Add 3 ! Sort 3 ! Add 4 ! Sort 4 |- | align=right | AAA NAN NAN BBB ANA ANA <span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span> | align=right | AAA ANA ANA BBB NAN NAN <span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span> | align=right | AAAA NANA NANA BBBB ANAN ANAN <span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span> | align=right | AAAA ANAN ANAN BBBB NANA NANA <span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span> |- ! colspan=4 | Output |- | align=center colspan=4 | <span style="color:red;">^</span>BANANA |} Up until the last step, the process is identical to the inverse Burrows–Wheeler process, but here it will not necessarily give rotations of a single sequence; it instead gives rotations of Lyndon words (which will start to repeat as the process is continued). Here, we can see (repetitions of) four distinct Lyndon words: (A), (AN) (twice), (B), and (<span style="color:red;">^</span>). (NANA... doesn't represent a distinct word, as it is a cycle of ANAN....) At this point, these words are sorted into reverse order: (<span style="color:red;">^</span>), (B), (AN), (AN), (A). These are then concatenated to get :<span style="color:red;">^</span>BANANA The Burrows–Wheeler transform can indeed be viewed as a special case of this bijective transform; instead of the traditional introduction of a new letter from outside our alphabet to denote the end of the string, we can introduce a new letter that compares as preceding all existing letters that is put at the beginning of the string. The whole string is now a Lyndon word, and running it through the bijective process will therefore result in a transformed result that, when inverted, gives back the Lyndon word, with no need for reassembling at the end. For example, applying the bijective transform gives: {| class="wikitable" ! Input | <code>SIX.MIXED.PIXIES.SIFT.SIXTY.PIXIE.DUST.BOXES</code> |- ! Lyndon words | <code><span style="color:#990000;">S</span><span style="color:#FF9900;">IX</span><span style="color:#006600;">.MIXED.PIXIES.SIFT.SIXTY.PIXIE</span><span style="color:#0000DD;">.DUST</span><span style="color:#660066;">.BOXES</span></code> |- ! Output | <code>STEYDST.E.IXXIIXXSMPPXS.B..EE..SUSFXDIOIIIIT</code> |} The bijective transform includes eight runs of identical characters. These runs are, in order: <code>XX</code>, <code>II</code>, <code>XX</code>, <code>PP</code>, <code>..</code>, <code>EE</code>, <code>..</code>, and <code>IIII</code>. In total, 18 characters are used in these runs.
Edit summary
(Briefly describe your changes)
By publishing changes, you agree to the
Terms of Use
, and you irrevocably agree to release your contribution under the
CC BY-SA 4.0 License
and the
GFDL
. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel
Editing help
(opens in new window)