Editing Shannon's source coding theorem (section)

== Proof: source coding theorem ==
Given {{mvar|X}} is an [[independent identically-distributed random variables|i.i.d.]] source, its [[time series]] {{math|''X''<sub>1</sub>, ..., ''X<sub>n</sub>''}} is i.i.d. with [[Entropy_(information_theory)|entropy]] {{math|''H''(''X'')}} in the discrete-valued case and [[differential entropy]] in the continuous-valued case. The Source coding theorem states that for any {{math|''ε'' > 0}}, i.e. for any [[information theory#Rate|rate]] {{math|''H''(''X'') + ''ε''}} larger than the [[entropy]] of the source, there is large enough {{mvar|n}} and an encoder that takes {{mvar|n}} i.i.d. repetition of the source, {{math|''X''<sup>1:''n''</sup>}}, and maps it to {{math|''n''(''H''(''X'') + ''ε'')}} binary bits such that the source symbols {{math|''X''<sup>1:''n''</sup>}} are recoverable from the binary bits with probability of at least {{math|1 − ''ε''}}.

'''Proof of Achievability.''' Fix some {{math|''ε'' > 0}}, and let

:<math>p(x_1, \ldots, x_n) = \Pr \left[X_1 = x_1, \cdots, X_n = x_n \right].</math>

The [[typical set]], {{math|''A''{{su|b=''n''|p=''ε''}}}}, is defined as follows:

:<math>A_n^\varepsilon =\left\{(x_1, \cdots, x_n) \ : \ \left|-\frac{1}{n} \log p(x_1, \cdots, x_n) - H_n(X)\right| < \varepsilon \right\}.</math>

The [[Asymptotic equipartition property#AEP for discrete-time i.i.d. sources|asymptotic equipartition property]] (AEP) shows that for large enough {{mvar|n}}, the probability that a sequence generated by the source lies in the typical set, {{math|''A''{{su|b=''n''|p=''ε''}}}}, as defined approaches one. In particular, for sufficiently large {{mvar|n}}, <math>P((X_1,X_2,\cdots,X_n) \in A_n^\varepsilon)</math> can be made arbitrarily close to 1, and specifically, greater than <math>1-\varepsilon</math> (See 
[[Asymptotic equipartition property#AEP for discrete-time i.i.d. sources|AEP]] for a proof).

The definition of typical sets implies that those sequences that lie in the typical set satisfy:

:<math>2^{-n(H(X)+\varepsilon)} \leq p \left (x_1, \cdots, x_n \right ) \leq 2^{-n(H(X)-\varepsilon)}</math>


*The probability of a sequence <math>(X_1,X_2,\cdots X_n)</math> being drawn from {{math|''A''{{su|b=''n''|p=''ε''}}}} is greater than {{math|1 − ''ε''}}.
*<math>\left| A_n^\varepsilon \right| \leq 2^{n(H(X)+\varepsilon)}</math>, which follows from the left hand side (lower bound) for <math> p(x_1,x_2,\cdots x_n)</math>.
*<math>\left| A_n^\varepsilon \right| \geq (1-\varepsilon) 2^{n(H(X)-\varepsilon)}</math>, which follows from upper bound for <math> p(x_1,x_2,\cdots x_n)</math>  and the lower bound on the total probability of the whole set {{math|''A''{{su|b=''n''|p=''ε''}}}}.

Since <math>\left| A_n^\varepsilon \right| \leq 2^{n(H(X)+\varepsilon)}, n(H(X)+\varepsilon)</math> bits are enough to point to any string in this set.

The encoding algorithm: the encoder checks if the input sequence lies within the typical set; if yes, it outputs the index of the input sequence within the typical set; if not, the encoder outputs an arbitrary {{math|''n''(''H''(''X'') + ''ε'')}} digit number. As long as the input sequence lies within the typical set (with probability at least {{math|1 − ''ε''}}), the encoder does not make any error. So, the probability of error of the encoder is bounded above by {{mvar|ε}}.

''Proof of converse'': the converse is proved by showing that any set of size smaller than {{math|''A''{{su|b=''n''|p=''ε''}}}} (in the sense of exponent) would cover a set of probability bounded away from {{math|1}}.