===Theoretical limit of compressed message===
The lower bound ''L'' never exceeds ''n''<sup>''n''</sup>, where ''n'' is the size of the message, and so can be represented in <math>\log_2(n^n) = n \log_2(n)</math> bits. After the computation of the upper bound ''U'' and the reduction of the message by selecting a number from the interval [''L'', ''U'') with the longest trail of zeros, we can presume that this length can be reduced by <math>\textstyle \log_2\left(\prod_{k=1}^n f_k\right)</math> bits. Since each frequency ''f''<sub>''i''</sub> occurs in this product exactly ''f''<sub>''i''</sub> times, we can use the size of the alphabet ''A'' to compute the product:

:<math>\prod_{k=1}^n f_k = \prod_{i=1}^A f_i^{f_i}.</math>

Applying log<sub>2</sub> to estimate the number of bits in the final message (not counting a logarithmic overhead for the message length and frequency tables), it matches the number of bits given by [[entropy (information theory)|entropy]], which for long messages is very close to optimal:

:<math>n \log_2(n) - \sum_{i=1}^A f_i \log_2(f_i) = -n \sum_{i=1}^A \frac{f_i}{n} \log_2\!\left(\frac{f_i}{n}\right) = nH.</math>

In other words, the efficiency of arithmetic encoding approaches the theoretical limit of <math>H</math> bits per symbol as the message length approaches infinity.

==== Asymptotic equipartition ====
We can understand this intuitively. Suppose the source is ergodic; then it has the [[asymptotic equipartition property]] (AEP). By the AEP, after a long stream of <math>n</math> symbols, the interval <math>(0, 1)</math> is almost entirely partitioned into nearly equally sized subintervals.

Technically, for any small <math>\epsilon > 0</math>, for all large enough <math>n</math>, there exist <math>2^{nH(X)(1+O(\epsilon))}</math> strings <math>x_{1:n}</math> such that each string has almost equal probability <math>\Pr(x_{1:n}) = 2^{-nH(X)(1+ O(\epsilon))}</math>, and their total probability is <math>1-O(\epsilon)</math>.

Any such string is arithmetically encoded by a binary string of length <math>k</math>, where <math>k</math> is the smallest integer such that the interval for <math>x_{1:n}</math> contains a fraction of the form <math>\frac{?}{2^k}</math>. Since that interval has size <math>2^{-nH(X)(1+ O(\epsilon))}</math>, we should expect it to contain one such fraction once <math>k = nH(X)(1+O(\epsilon))</math>. Thus, with high probability, <math>x_{1:n}</math> can be arithmetically encoded with a binary string of length <math>nH(X)(1+O(\epsilon))</math>.
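The identity above can be checked numerically. The following is a minimal Python sketch, not taken from any particular arithmetic-coding implementation (the helper name <code>interval_width</code> and the example message are illustrative assumptions). It computes the width of the final coding interval for a short message, using the message's own symbol counts as the probability model, and compares <math>-\log_2</math> of that width with <math>nH</math>:

<syntaxhighlight lang="python">
from collections import Counter
from fractions import Fraction
from math import log2

def interval_width(message):
    """Width of the final arithmetic-coding interval for `message`,
    modelling each symbol by its empirical probability f_i / n.
    By the product identity above, this equals prod_i (f_i / n)^(f_i)."""
    n = len(message)
    counts = Counter(message)
    width = Fraction(1)
    for symbol in message:              # one factor per encoded symbol
        width *= Fraction(counts[symbol], n)
    return width

message = "ABBCCCDDDD"                  # hypothetical example message
n = len(message)
counts = Counter(message)

# Bits needed to single out a number inside the final interval:
bits = -log2(float(interval_width(message)))

# n times the Shannon entropy of the empirical symbol distribution:
nH = -sum(f * log2(f / n) for f in counts.values())

print(f"-log2(interval width) = {bits:.3f} bits")   # ~18.464 for this message
print(f"n * H                 = {nH:.3f} bits")     # identical, as derived above
</syntaxhighlight>

The two printed values coincide (up to floating-point rounding), matching the derivation: <math>-\log_2 \prod_{i=1}^A (f_i/n)^{f_i} = nH</math>.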