Editing Burrows–Wheeler transform (section)

==Bijective variant==
Since any rotation of the input string will lead to the same transformed string, the BWT cannot be inverted without adding an EOF marker to the end of the input or doing something equivalent, making it possible to distinguish the input string from all its rotations.  Increasing the size of the alphabet (by appending the EOF character) makes later compression steps awkward.

There is a [[bijective]] version of the transform, by which the transformed string uniquely identifies the original, and the two have the same length and contain exactly the same characters, just in a different order.<ref>{{citation | last1 = Gil | first1 = J. | last2 = Scott | first2 = D. A. | title = A bijective string sorting transform | url = http://bijective.dogma.net/00yyy.pdf | year = 2009 | access-date = 2009-07-09 | archive-date = 2011-10-08 | archive-url = https://web.archive.org/web/20111008001603/http://bijective.dogma.net/00yyy.pdf | url-status = dead }}</ref><ref>{{citation | last = Kufleitner | first = Manfred | editor1-last = Holub | editor1-first = Jan | editor2-last = Žďárek | editor2-first = Jan | arxiv = 0908.0239 | contribution = On bijective variants of the Burrows–Wheeler transform | pages = 65–69 | title = Prague Stringology Conference | url = http://www.stringology.org/event/2009/p07.html | year = 2009| bibcode = 2009arXiv0908.0239K}}.</ref>

The bijective transform is computed by factoring the input into a non-increasing sequence of [[Lyndon word]]s; such a factorization exists and is unique by the [[Chen–Fox–Lyndon theorem]],<ref>*{{Citation | last=Lothaire | first=M. | author-link=M. Lothaire | others=Perrin, D.; Reutenauer, C.; Berstel, J.; Pin, J. E.; Pirillo, G.; Foata, D.; Sakarovitch, J.; Simon, I.; Schützenberger, M. P.; Choffrut, C.; Cori, R.; Lyndon, Roger; Rota, Gian-Carlo. Foreword by Roger Lyndon | title=Combinatorics on words | edition=2nd | series=Encyclopedia of Mathematics and Its Applications | volume=17 | publisher=[[Cambridge University Press]] | year=1997 | isbn=978-0-521-59924-5 | zbl=0874.20040 | page=67 }}</ref> and may be found in linear time and constant space.<ref>{{citation | last = Duval | first = Jean-Pierre | doi = 10.1016/0196-6774(83)90017-2 | issue = 4 | journal = Journal of Algorithms | pages = 363–381 | title = Factorizing words over an ordered alphabet | volume = 4 | year = 1983 | zbl=0532.68061| issn=0196-6774}}.</ref> The algorithm sorts the rotations of all the words; as in the Burrows–Wheeler transform, this produces a sorted sequence of ''n'' strings. The transformed string is then obtained by picking the final character of each string in this sorted list.  The one important caveat here is that strings of different lengths are not ordered in the usual way; the two strings are repeated forever, and the infinite repeats are sorted.  For example, "ORO" precedes "OR" because "OROORO..." precedes "OROROR...".

For example, the text "<span style="color:red;">^</span>BANANA<span style="color:red;">$</span>" is transformed into "ANNBAA<span style="color:red;">^</span><span style="color:red;">$</span>" through these steps (the red <span style="color:red;">$</span> character indicates the [[End-of-file|EOF]] pointer) in the original string. The EOF character is unneeded in the bijective transform, so it is dropped during the transform and re-added to its proper place in the file.

The string is broken into Lyndon words so the words in the sequence are decreasing using the comparison method above. (Note that we're sorting '<span style="color:red;">^</span>' as succeeding other characters.)  "<span style="color:red;">^</span>BANANA" becomes (<span style="color:red;">^</span>) (B) (AN) (AN) (A).

{| class="wikitable"
|-
!  colspan="5" | Bijective transformation
|-
! Input
! All<br />rotations
! Sorted alphabetically
! Last column<br />of rotated Lyndon word
! Output
|-
|  align=center |
 <span style="color:red;">^</span>BANANA<span style="color:red;">$</span>
|
 '''<span style="color:red;">^</span>'''<span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span>... (<span style="color:red;">^</span>)
 '''B'''BBBBBBB... (B)
 '''ANAN'''ANAN... (AN)
 '''NANA'''NANA... (NA)
 '''ANAN'''ANAN... (AN)
 '''NANA'''NANA... (NA)
 '''A'''AAAAAAA... (A)

|
 '''A'''AAAAAAA... (A)
 '''A'''NANANAN... (AN)
 '''A'''NANANAN... (AN)
 '''B'''BBBBBBB... (B)
 '''N'''ANANANA... (NA)
 '''N'''ANANANA... (NA)
 '''<span style="color:red;">^</span>'''<span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span>... (<span style="color:red;">^</span>)
|
 '''A'''AAAAAAA... ('''A''')
 A'''N'''ANANAN... (A'''N''')
 A'''N'''ANANAN... (A'''N''')
 '''B'''BBBBBBB... ('''B''')
 N'''A'''NANANA... (N'''A''')
 N'''A'''NANANA... (N'''A''')
 '''<span style="color:red;">^</span>'''<span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span>... ('''<span style="color:red;">^</span>''')
|
 ANNBAA<span style="color:red;">^</span><span style="color:red;">$</span>
|}
{| class="wikitable"
!  colspan=4 | Inverse bijective transform
|-
!  colspan=4 | Input
|-
|  align=center colspan=4 |
 ANNBAA<span style="color:red;">^</span>
|-
! Add 1
! Sort 1
! Add 2
! Sort 2
|-
|  align=right |
 A
 N
 N
 B
 A
 A
 <span style="color:red;">^</span>

|  align=right |
 A
 A
 A
 B
 N
 N
 <span style="color:red;">^</span>
|  align=right |
 AA
 NA
 NA
 BB
 AN
 AN
 <span style="color:red;">^</span><span style="color:red;">^</span>
|  align=right |
 AA
 AN
 AN
 BB
 NA
 NA
 <span style="color:red;">^</span><span style="color:red;">^</span>
|-
! Add 3
! Sort 3
! Add 4
! Sort 4
|-
|  align=right |

 AAA
 NAN
 NAN
 BBB
 ANA
 ANA
 <span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span>
|  align=right |
 AAA
 ANA
 ANA
 BBB
 NAN
 NAN
 <span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span>
|  align=right |
 AAAA
 NANA
 NANA
 BBBB
 ANAN
 ANAN
 <span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span>
|  align=right |
 AAAA
 ANAN
 ANAN
 BBBB
 NANA
 NANA
 <span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span><span style="color:red;">^</span>
|-
! colspan=4 | Output
|-
|  align=center colspan=4 |
 <span style="color:red;">^</span>BANANA
|}

Up until the last step, the process is identical to the inverse Burrows–Wheeler process, but here it will not necessarily give rotations of a single sequence; it instead gives rotations of Lyndon words (which will start to repeat as the process is continued).  Here, we can see (repetitions of) four distinct Lyndon words: (A), (AN) (twice), (B), and (<span style="color:red;">^</span>).  (NANA... doesn't represent a distinct word, as it is a cycle of ANAN....)
At this point, these words are sorted into reverse order: (<span style="color:red;">^</span>), (B), (AN), (AN), (A).  These are then concatenated to get
:<span style="color:red;">^</span>BANANA

The Burrows–Wheeler transform can indeed be viewed as a special case of this bijective transform; instead of the traditional introduction of a new letter from outside our alphabet to denote the end of the string, we can introduce a new letter that compares as preceding all existing letters that is put at the beginning of the string.  The whole string is now a Lyndon word, and running it through the bijective process will therefore result in a transformed result that, when inverted, gives back the Lyndon word, with no need for reassembling at the end.

For example, applying the bijective transform gives:
{| class="wikitable"
! Input
| <code>SIX.MIXED.PIXIES.SIFT.SIXTY.PIXIE.DUST.BOXES</code>
|-
! Lyndon words
| <code><span style="color:#990000;">S</span><span style="color:#FF9900;">IX</span><span style="color:#006600;">.MIXED.PIXIES.SIFT.SIXTY.PIXIE</span><span style="color:#0000DD;">.DUST</span><span style="color:#660066;">.BOXES</span></code>
|-
! Output
| <code>STEYDST.E.IXXIIXXSMPPXS.B..EE..SUSFXDIOIIIIT</code>
|}
The bijective transform includes eight runs of identical
characters. These runs are, in order: <code>XX</code>,
<code>II</code>,
<code>XX</code>,
<code>PP</code>,
<code>..</code>,
<code>EE</code>,
<code>..</code>,
and
<code>IIII</code>.

In total, 18 characters are used in these runs.