===Theoretical limit of compressed message===
The lower bound ''L'' never exceeds ''n''<sup>''n''</sup>, where ''n'' is the size of the message, and so can be represented in <math>\log_2(n^n) = n \log_2(n)</math> bits. After the computation of the upper bound ''U'' and the reduction of the message by selecting a number from the interval [''L'', ''U'') with the longest trail of zeros, we can presume that this length can be reduced by <math>\textstyle \log_2\left(\prod_{k=1}^n f_k\right)</math> bits. Since each frequency ''f''<sub>''i''</sub> occurs in this product exactly ''f''<sub>''i''</sub> times, we can use the size of the alphabet ''A'' to compute the product:

:<math>\prod_{k=1}^n f_k = \prod_{i=1}^A f_i^{f_i}.</math>

Applying log<sub>2</sub> to estimate the number of bits in the final message (not counting a logarithmic overhead for the message length and frequency tables), it matches the number of bits given by [[entropy (information theory)|entropy]], which for long messages is very close to optimal:

:<math>n \log_2(n) - \sum_{i=1}^A f_i \log_2(f_i) = -n \sum_{i=1}^A \frac{f_i}{n} \log_2\!\left(\frac{f_i}{n}\right) = nH.</math>

In other words, the efficiency of arithmetic encoding approaches the theoretical limit of <math>H</math> bits per symbol as the message length approaches infinity.

==== Asymptotic equipartition ====
We can understand this intuitively. Suppose the source is ergodic; then it has the [[asymptotic equipartition property]] (AEP). By the AEP, after a long stream of <math>n</math> symbols, the interval <math>(0, 1)</math> is almost entirely partitioned into nearly equally sized subintervals.

Technically, for any small <math>\epsilon > 0</math>, for all large enough <math>n</math>, there exist <math>2^{nH(X)(1+O(\epsilon))}</math> strings <math>x_{1:n}</math> such that each string has almost equal probability <math>\Pr(x_{1:n}) = 2^{-nH(X)(1+ O(\epsilon))}</math>, and their total probability is <math>1-O(\epsilon)</math>.

Any such string is arithmetically encoded by a binary string of length <math>k</math>, where <math>k</math> is the smallest integer such that the interval for <math>x_{1:n}</math> contains a fraction of the form <math>\frac{?}{2^k}</math>. Since that interval has size <math>2^{-nH(X)(1+ O(\epsilon))}</math>, we should expect it to contain one such fraction once <math>k = nH(X)(1+O(\epsilon))</math>. Thus, with high probability, <math>x_{1:n}</math> can be arithmetically encoded with a binary string of length <math>nH(X)(1+O(\epsilon))</math>.
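The identity above can be checked numerically. The following is a minimal Python sketch, not taken from any particular arithmetic-coding implementation (the helper name <code>interval_width</code> and the example message are illustrative assumptions). It computes the width of the final coding interval for a short message, using the message's own symbol counts as the probability model, and compares <math>-\log_2</math> of that width with <math>nH</math>:

<syntaxhighlight lang="python">
from collections import Counter
from fractions import Fraction
from math import log2

def interval_width(message):
    """Width of the final arithmetic-coding interval for `message`,
    modelling each symbol by its empirical probability f_i / n.
    By the product identity above, this equals prod_i (f_i / n)^(f_i)."""
    n = len(message)
    counts = Counter(message)
    width = Fraction(1)
    for symbol in message:              # one factor per encoded symbol
        width *= Fraction(counts[symbol], n)
    return width

message = "ABBCCCDDDD"                  # hypothetical example message
n = len(message)
counts = Counter(message)

# Bits needed to single out a number inside the final interval:
bits = -log2(float(interval_width(message)))

# n times the Shannon entropy of the empirical symbol distribution:
nH = -sum(f * log2(f / n) for f in counts.values())

print(f"-log2(interval width) = {bits:.3f} bits")   # ~18.464 for this message
print(f"n * H                 = {nH:.3f} bits")     # identical, as derived above
</syntaxhighlight>

The two printed values coincide (up to floating-point rounding), matching the derivation: <math>-\log_2 \prod_{i=1}^A (f_i/n)^{f_i} = nH</math>.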