Editing Range coding (section)

=== Example ===
Suppose we want to encode the message "AABA<EOM>", where <EOM> is the end-of-message symbol. For this example it is assumed that the decoder knows that we intend to encode exactly five symbols in the [[decimal|base 10 number system]] (allowing for 10<sup>5</sup> different combinations of symbols with the range <nowiki>[0, 100000)</nowiki>) using the [[probability distribution]] {A: .60; B: .20; <EOM>: .20}. The encoder breaks down the range <nowiki>[0, 100000)</nowiki> into three subranges:

 <nowiki>A:     [     0,  60000)</nowiki>
 <nowiki>B:     [ 60000,  80000)</nowiki>
 <nowiki><EOM>: [ 80000, 100000)</nowiki>

Since our first symbol is an A, it reduces our initial range down to <nowiki>[0, 60000)</nowiki>. The second symbol choice leaves us with three sub-ranges of this range. We show them following the already-encoded 'A':

 <nowiki>AA:     [     0,  36000)</nowiki>
 <nowiki>AB:     [ 36000,  48000)</nowiki>
 <nowiki>A<EOM>: [ 48000,  60000)</nowiki>

With two symbols encoded, our range is now <nowiki>[0, 36000)</nowiki> and our third symbol leads to the following choices:

 <nowiki>AAA:     [     0,  21600)</nowiki>
 <nowiki>AAB:     [ 21600,  28800)</nowiki>
 <nowiki>AA<EOM>: [ 28800,  36000)</nowiki>

This time it is the second of our three choices that represent the message we want to encode, and our range becomes <nowiki>[21600, 28800)</nowiki>. It may look harder to determine our sub-ranges in this case, but it is actually not: we can merely subtract the lower bound from the upper bound to determine that there are 7200 numbers in our range; that the first 4320 of them represent 0.60 of the total, the next 1440 represent the next 0.20, and the remaining 1440 represent the remaining 0.20 of the total. Adding back the lower bound gives us our ranges:

 <nowiki>AABA:     [21600, 25920)</nowiki>
 <nowiki>AABB:     [25920, 27360)</nowiki>
 <nowiki>AAB<EOM>: [27360, 28800)</nowiki>

Finally, with our range narrowed down to <nowiki>[21600, 25920)</nowiki>, we have just one more symbol to encode. Using the same technique as before for dividing up the range between the lower and upper bound, we find the three sub-ranges are:

 <nowiki>AABAA:     [21600, 24192)</nowiki>
 <nowiki>AABAB:     [24192, 25056)</nowiki>
 <nowiki>AABA<EOM>: [25056, 25920)</nowiki>

And since <EOM> is our final symbol, our final range is <nowiki>[25056, 25920)</nowiki>. Because all five-digit integers starting with "251" fall within our final range, it is one of the three-digit prefixes we could transmit that would unambiguously convey our original message. (The fact that there are actually eight such prefixes in all implies we still have inefficiencies. They have been introduced by our use of [[decimal|base 10]] rather than [[Binary numeral system|base 2]].)

The central problem may appear to be selecting an initial range large enough that no matter how many symbols we have to encode, we will always have a current range large enough to divide into non-zero sub-ranges. In practice, however, this is not a problem, because instead of starting with a very large range and gradually narrowing it down, the encoder works with a smaller range of numbers at any given time. After some number of digits have been encoded, the leftmost digits will not change. In the example after coding just three symbols, we already knew that our final result would start with "2". More digits are shifted in on the right as digits on the left are sent off. This is illustrated in the following code:

<syntaxhighlight lang="csharp">
int low = 0;
int range = 100000;

void Run()
{
	Encode(0, 6, 10);	// A
	Encode(0, 6, 10);	// A
	Encode(6, 2, 10);	// B
	Encode(0, 6, 10);	// A
	Encode(8, 2, 10);	// <EOM>

	// emit final digits - see below
	while (range < 10000)
		EmitDigit();

	low += 10000;
	EmitDigit();
}

void EmitDigit()
{
	Console.Write(low / 10000);
	low = (low % 10000) * 10;
	range *= 10;
}

void Encode(int start, int size, int total)
{
	// adjust the range based on the symbol interval
	range /= total;
	low += start * range;
	range *= size;

	// check if left-most digit is same throughout range
	while (low / 10000 == (low + range) / 10000)
		EmitDigit();

	// readjust range - see reason for this below
	if (range < 1000)
	{
		EmitDigit();
		EmitDigit();
		range = 100000 - low;
	}
}
</syntaxhighlight>

To finish off we may need to emit a few extra digits. The top digit of <code>low</code> is probably too small so we need to increment it, but we have to make sure we don't increment it past <code>low+range</code>. So first we need to make sure <code>range</code> is large enough.

<syntaxhighlight lang="csharp">
// emit final digits
while (range < 10000)
	EmitDigit();

low += 10000;
EmitDigit();
</syntaxhighlight>

One problem that can occur with the <code>Encode</code> function above is that <code>range</code> might become very small but <code>low</code> and <code>low+range</code> still have differing first digits. This could result in the interval having insufficient precision to distinguish between all of the symbols in the alphabet. When this happens we need to fudge a little, output the first couple of digits even though we might be off by one, and re-adjust the range to give us as much room as possible.

For example, imagine the input stream has led the encoder to the right-open interval [59888, 60188), that is, <code>low = 59888</code> and <code>range = 300</code>. The trick is to narrow down the interval to [59888, 60000) = ['''59'''888, '''59'''999], which allows the encoder to emit two of the left-most digits of <code>low</code>, and readjust the interval to [88800, 99999] = [88800, 100000), that is, <code>low = 88800</code> and <code>range = 100000 - low</code>.

The decoder will be following the same steps so it will know when it needs to do this to keep in sync.

<syntaxhighlight lang="csharp">
// this goes just before the end of Encode() above
if (range < 1000)
{
	EmitDigit();
	EmitDigit();
	range = 100000 - low;
}
</syntaxhighlight>

Base 10 was used in this example, but a real implementation would just use binary, with the full range of the native integer data type. Instead of <code>10000</code> and <code>1000</code> you would likely use hexadecimal constants such as <code>0x1000000</code> and <code>0x10000</code>. Instead of emitting a digit at a time you would emit a byte at a time and use a byte-shift operation instead of multiplying by 10.

Decoding uses exactly the same algorithm with the addition of keeping track of the current <code>code</code> value consisting of the digits read from the compressor. Instead of emitting the top digit of <code>low</code> you just throw it away, but you also shift out the top digit of <code>code</code> and shift in a new digit read from the compressor. Use <code>AppendDigit</code> below instead of <code>EmitDigit</code>.

<syntaxhighlight lang="csharp">
int code = 0;
int low = 0;
int range = 1;

void InitializeDecoder()
{
        AppendDigit();  // with this example code, only 1 of these is actually needed
        AppendDigit();
        AppendDigit();
        AppendDigit();
        AppendDigit();
}

void AppendDigit()
{
	code = (code % 10000) * 10 + ReadNextDigit();
	low = (low % 10000) * 10;
	range *= 10;
}

void Decode(int start, int size, int total)  // Decode is same as Encode with EmitDigit replaced by AppendDigit
{
	// adjust the range based on the symbol interval
	range /= total;
	low += start * range;
	range *= size;

	// check if left-most digit is same throughout range
	while (low / 10000 == (low + range) / 10000)
		AppendDigit();

	// readjust range - see reason for this below
	if (range < 1000)
	{
		AppendDigit();
		AppendDigit();
		range = 100000 - low;
	}
}
</syntaxhighlight>

In order to determine which probability intervals to apply, the decoder needs to look at the current value of <code>code</code> within the interval [low, low+range) and decide which symbol this represents.

<syntaxhighlight lang="csharp">
void Run()
{
	int start = 0;
	int size;
	int total = 10;
	InitializeDecoder();          // need to get range/total >0
	while (start < 8)             // stop when receive EOM
	{
		int v = GetValue(total);  // code is in what symbol range?
		switch (v)                // convert value to symbol
		{
			case 0:
			case 1:
			case 2:
			case 3:
			case 4:
			case 5: start=0; size=6; Console.Write("A"); break;
			case 6:
			case 7: start=6; size=2; Console.Write("B"); break;
			default: start=8; size=2; Console.WriteLine("");
		}
		Decode(start, size, total);
	}
}

int GetValue(int total)
{
	return (code - low) / (range / total);
}
</syntaxhighlight>

For the <nowiki>AABA<EOM></nowiki> example above, this would return a value in the range 0 to 9. Values 0 through 5 would represent A, 6 and 7 would represent B, and 8 and 9 would represent <nowiki><EOM></nowiki>.