===Huffman coding===
{{Main|Huffman coding}}
Because arithmetic coding doesn't compress one datum at a time, it can get arbitrarily close to entropy when compressing [[Independent and identically distributed random variables|IID]] strings. By contrast, using the extension of [[Huffman coding]] (to strings) does not reach entropy unless all probabilities of alphabet symbols are powers of two, in which case both Huffman and arithmetic coding achieve entropy.

When naively Huffman coding binary strings, no compression is possible, even if entropy is low (e.g. the alphabet {0, 1} with probabilities {0.95, 0.05}): Huffman encoding assigns 1 bit to each value, resulting in a code of the same length as the input. By contrast, arithmetic coding compresses such strings well, approaching the optimal compression ratio of
:<math> 1 - [-0.95 \log_2(0.95) - 0.05 \log_2(0.05)] \approx 71.4\%.</math>

One simple way to address Huffman coding's suboptimality is to concatenate symbols ("blocking") to form a new alphabet in which each new symbol represents a sequence of original symbols, in this case bits, from the original alphabet. In the above example, grouping sequences of three symbols before encoding would produce new "super-symbols" with the following frequencies:

* {{samp|000}}: 85.7%
* {{samp|001}}, {{samp|010}}, {{samp|100}}: 4.5% each
* {{samp|011}}, {{samp|101}}, {{samp|110}}: 0.24% each
* {{samp|111}}: 0.0125%

With this grouping, Huffman coding averages 1.3 bits for every three symbols, or 0.433 bits per symbol, compared with one bit per symbol in the original encoding, i.e., <math>56.7\%</math> compression. Allowing arbitrarily large sequences gets arbitrarily close to entropy, just like arithmetic coding, but requires huge codes to do so, so it is not as practical as arithmetic coding for this purpose.

An alternative is [[Run-length encoding|encoding run lengths]] via Huffman-based [[Golomb coding|Golomb-Rice codes]]. Such an approach allows simpler and faster encoding/decoding than arithmetic coding or even Huffman coding, since the latter requires table lookups. In the {0.95, 0.05} example, a Golomb-Rice code with a four-bit remainder achieves a compression ratio of <math>71.1\%</math>, far closer to the optimum than three-bit blocks. Golomb-Rice codes apply only to [[Bernoulli process|Bernoulli]] inputs such as the one in this example, however, so they are not a substitute for blocking in all cases.
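The blocking figures above can be checked directly. The following sketch is illustrative only (the variable names are arbitrary and it is not drawn from a cited implementation): it builds a standard Huffman code over the eight three-bit "super-symbols" of a Bernoulli source with probabilities {0.95, 0.05} and compares the resulting average code length with the source entropy.

<syntaxhighlight lang="python">
# Illustrative sketch: Huffman coding over three-bit blocks of a
# Bernoulli source with p(0) = 0.95, p(1) = 0.05.
import heapq
from itertools import product
from math import log2

p = {'0': 0.95, '1': 0.05}

# Probabilities of the eight "super-symbols" (three-bit blocks).
blocks = {''.join(b): p[b[0]] * p[b[1]] * p[b[2]]
          for b in product('01', repeat=3)}

# Standard Huffman construction over the block alphabet.
# Heap entries: (probability, tie-breaker, partial codebook).
heap = [(prob, i, {sym: ''}) for i, (sym, prob) in enumerate(blocks.items())]
heapq.heapify(heap)
tie = len(heap)
while len(heap) > 1:
    p1, _, c1 = heapq.heappop(heap)
    p2, _, c2 = heapq.heappop(heap)
    merged = {s: '0' + code for s, code in c1.items()}
    merged.update({s: '1' + code for s, code in c2.items()})
    heapq.heappush(heap, (p1 + p2, tie, merged))
    tie += 1
codebook = heap[0][2]

avg_per_block = sum(blocks[s] * len(codebook[s]) for s in blocks)   # ~1.30 bits
entropy = -(0.95 * log2(0.95) + 0.05 * log2(0.05))                  # ~0.286 bits/bit

print(f"{avg_per_block:.3f} bits per block = {avg_per_block / 3:.3f} bits per input bit")
print(f"source entropy = {entropy:.3f} bits per input bit")
</syntaxhighlight>

Running this prints roughly 1.300 bits per block (0.433 bits per input bit) against an entropy of about 0.286 bits per input bit, matching the 56.7% and 71.4% figures quoted in the text.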
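The run-length alternative can be sketched in the same way. The code below is an illustrative Rice coder with a four-bit remainder (again with arbitrary names, not a reference implementation): each run of 0s terminated by a 1 is encoded as a unary quotient followed by four remainder bits.

<syntaxhighlight lang="python">
# Illustrative sketch: Rice coding of run lengths with a four-bit remainder,
# applied to the runs of 0s between 1s in a {0, 1} string.
def rice_encode(n: int, k: int = 4) -> str:
    """Encode a non-negative integer as a unary quotient plus k remainder bits."""
    q, r = n >> k, n & ((1 << k) - 1)
    return '1' * q + '0' + format(r, f'0{k}b')

def encode_runs(bits: str, k: int = 4) -> str:
    """Encode a {0, 1} string as Rice-coded lengths of the 0-runs before each 1."""
    out, run = [], 0
    for b in bits:
        if b == '0':
            run += 1
        else:               # a 1 terminates the current run of 0s
            out.append(rice_encode(run, k))
            run = 0
    # A trailing run with no terminating 1 would need an end marker in practice.
    return ''.join(out)

# Example: a short input with p(1) = 0.05, i.e. 0-runs averaging about 19.
sample = '0' * 18 + '1' + '0' * 25 + '1' + '0' * 7 + '1'
coded = encode_runs(sample)
print(len(sample), '->', len(coded), 'bits')   # 53 -> 17 bits for this sample
</syntaxhighlight>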