===Huffman coding===
{{Main|Huffman coding}}
Because arithmetic coding doesn't compress one datum at a time, it can get arbitrarily close to entropy when compressing [[Independent and identically distributed random variables|IID]] strings. By contrast, using the extension of [[Huffman coding]] (to strings) does not reach entropy unless all probabilities of alphabet symbols are powers of two, in which case both Huffman and arithmetic coding achieve entropy.

When naively Huffman coding binary strings, no compression is possible, even if entropy is low (e.g. the alphabet {0, 1} with probabilities {0.95, 0.05}): Huffman encoding assigns 1 bit to each value, resulting in a code of the same length as the input. By contrast, arithmetic coding compresses such strings well, approaching the optimal compression ratio of
:<math> 1 - [-0.95 \log_2(0.95) - 0.05 \log_2(0.05)] \approx 71.4\%.</math>

One simple way to address Huffman coding's suboptimality is to concatenate symbols ("blocking") to form a new alphabet in which each new symbol represents a sequence of original symbols, in this case bits, from the original alphabet. In the above example, grouping sequences of three symbols before encoding would produce new "super-symbols" with the following frequencies:

* {{samp|000}}: 85.7%
* {{samp|001}}, {{samp|010}}, {{samp|100}}: 4.5% each
* {{samp|011}}, {{samp|101}}, {{samp|110}}: 0.24% each
* {{samp|111}}: 0.0125%

With this grouping, Huffman coding averages 1.3 bits for every three symbols, or 0.433 bits per symbol, compared with one bit per symbol in the original encoding, i.e., <math>56.7\%</math> compression. Allowing arbitrarily large sequences gets arbitrarily close to entropy, just like arithmetic coding, but requires huge codes to do so, so it is not as practical as arithmetic coding for this purpose.

An alternative is [[Run-length encoding|encoding run lengths]] via Huffman-based [[Golomb coding|Golomb-Rice codes]]. Such an approach allows simpler and faster encoding/decoding than arithmetic coding or even Huffman coding, since the latter requires table lookups. In the {0.95, 0.05} example, a Golomb-Rice code with a four-bit remainder achieves a compression ratio of <math>71.1\%</math>, far closer to the optimum than three-bit blocks. Golomb-Rice codes apply only to [[Bernoulli process|Bernoulli]] inputs such as the one in this example, however, so they are not a substitute for blocking in all cases.
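The blocking figures above can be checked directly. The following sketch is illustrative only (the variable names are arbitrary and it is not drawn from a cited implementation): it builds a standard Huffman code over the eight three-bit "super-symbols" of a Bernoulli source with probabilities {0.95, 0.05} and compares the resulting average code length with the source entropy.

<syntaxhighlight lang="python">
# Illustrative sketch: Huffman coding over three-bit blocks of a
# Bernoulli source with p(0) = 0.95, p(1) = 0.05.
import heapq
from itertools import product
from math import log2

p = {'0': 0.95, '1': 0.05}

# Probabilities of the eight "super-symbols" (three-bit blocks).
blocks = {''.join(b): p[b[0]] * p[b[1]] * p[b[2]]
          for b in product('01', repeat=3)}

# Standard Huffman construction over the block alphabet.
# Heap entries: (probability, tie-breaker, partial codebook).
heap = [(prob, i, {sym: ''}) for i, (sym, prob) in enumerate(blocks.items())]
heapq.heapify(heap)
tie = len(heap)
while len(heap) > 1:
    p1, _, c1 = heapq.heappop(heap)
    p2, _, c2 = heapq.heappop(heap)
    merged = {s: '0' + code for s, code in c1.items()}
    merged.update({s: '1' + code for s, code in c2.items()})
    heapq.heappush(heap, (p1 + p2, tie, merged))
    tie += 1
codebook = heap[0][2]

avg_per_block = sum(blocks[s] * len(codebook[s]) for s in blocks)   # ~1.30 bits
entropy = -(0.95 * log2(0.95) + 0.05 * log2(0.05))                  # ~0.286 bits/bit

print(f"{avg_per_block:.3f} bits per block = {avg_per_block / 3:.3f} bits per input bit")
print(f"source entropy = {entropy:.3f} bits per input bit")
</syntaxhighlight>

Running this prints roughly 1.300 bits per block (0.433 bits per input bit) against an entropy of about 0.286 bits per input bit, matching the 56.7% and 71.4% figures quoted in the text.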
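The run-length alternative can be sketched in the same way. The code below is an illustrative Rice coder with a four-bit remainder (again with arbitrary names, not a reference implementation): each run of 0s terminated by a 1 is encoded as a unary quotient followed by four remainder bits.

<syntaxhighlight lang="python">
# Illustrative sketch: Rice coding of run lengths with a four-bit remainder,
# applied to the runs of 0s between 1s in a {0, 1} string.
def rice_encode(n: int, k: int = 4) -> str:
    """Encode a non-negative integer as a unary quotient plus k remainder bits."""
    q, r = n >> k, n & ((1 << k) - 1)
    return '1' * q + '0' + format(r, f'0{k}b')

def encode_runs(bits: str, k: int = 4) -> str:
    """Encode a {0, 1} string as Rice-coded lengths of the 0-runs before each 1."""
    out, run = [], 0
    for b in bits:
        if b == '0':
            run += 1
        else:               # a 1 terminates the current run of 0s
            out.append(rice_encode(run, k))
            run = 0
    # A trailing run with no terminating 1 would need an end marker in practice.
    return ''.join(out)

# Example: a short input with p(1) = 0.05, i.e. 0-runs averaging about 19.
sample = '0' * 18 + '1' + '0' * 25 + '1' + '0' * 7 + '1'
coded = encode_runs(sample)
print(len(sample), '->', len(coded), 'bits')   # 53 -> 17 bits for this sample
</syntaxhighlight>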