== Statements ==

''Source coding'' is a mapping from a sequence of symbols emitted by an information [[Information theory#Source theory|source]] to a sequence of symbols over a code alphabet (usually bits), such that the source symbols can either be recovered exactly from the coded sequence (lossless source coding) or recovered to within some distortion (lossy source coding). It is one approach to [[data compression]].

=== Source coding theorem ===
In information theory, the source coding theorem (Shannon 1948)<ref name="Shannon"/> informally states that (MacKay 2003, p. 81;<ref name="MacKay"/> Cover 2006, Chapter 5<ref name="Cover"/>):

<blockquote>{{mvar|N}} [[Independent and identically distributed random variables|i.i.d.]] random variables each with entropy {{math|''H''(''X'')}} can be compressed into more than {{math|''N&thinsp;H''(''X'')}} [[bit]]s with negligible risk of information loss, as {{math|''N'' → ∞}}; but conversely, if they are compressed into fewer than {{math|''N&thinsp;H''(''X'')}} bits it is virtually certain that information will be lost.</blockquote>

The coded sequence of about <math>NH(X)</math> bits represents the compressed message one-to-one only under the assumption that the decoder knows the source. In practice this assumption does not always hold. Consequently, when entropy encoding is applied, the transmitted message has length <math>NH(X) + (\text{information about the source})</math>. The information that characterizes the source is usually inserted at the beginning of the transmitted message.
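The informal statement can be checked numerically on a small case. The following is a minimal sketch, not taken from the cited sources, assuming a binary source that emits a 1 with probability 0.1 and a tolerated loss probability of 0.01. The function names <code>entropy_bits</code> and <code>bits_per_symbol</code> and both parameter values are illustrative choices. The sketch counts how many of the most probable length-{{mvar|N}} sequences must receive a codeword so that their total probability is at least 0.99, and reports the resulting number of bits per symbol.

```python
import math

def entropy_bits(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def bits_per_symbol(n, p, delta):
    """Bits per symbol needed to index a set of the most probable length-n
    sequences of a Bernoulli(p) source (p < 0.5) whose total probability
    is at least 1 - delta. Whole groups of equally probable sequences are
    included, so the result slightly overestimates the minimum."""
    covered = 0.0  # probability mass covered so far
    count = 0      # number of sequences that have been given a codeword
    for k in range(n + 1):
        # For p < 0.5, sequences with fewer ones are individually more probable.
        ways = math.comb(n, k)  # number of sequences with exactly k ones
        # Work in log space: the count is too large for a float when n is large.
        log_mass = math.log(ways) + k * math.log(p) + (n - k) * math.log(1 - p)
        covered += math.exp(log_mass)
        count += ways
        if covered >= 1 - delta:
            break
    return math.log2(count) / n

p, delta = 0.1, 0.01  # illustrative assumptions
print("H(X) =", entropy_bits([p, 1 - p]))
for n in (100, 500, 2000):
    print("N =", n, "bits per symbol =", bits_per_symbol(n, p, delta))
```

Under these assumptions the number of bits per symbol stays above {{math|''H''(''X'')}} and moves toward it as {{mvar|N}} grows, which is the behaviour the theorem describes in the limit.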

=== Source coding theorem for symbol codes ===
Let {{math|Σ<sub>1</sub>, Σ<sub>2</sub>}} denote two finite alphabets and let {{math|Σ{{su|b=1|p=∗}}}} and {{math|Σ{{su|b=2|p=∗}}}} denote the [[Kleene star|set of all finite words]] from those alphabets (respectively).

Suppose that {{mvar|X}} is a random variable taking values in {{math|Σ<sub>1</sub>}} and let {{math|&thinsp;''f''&thinsp;}} be a [[Variable-length code#Uniquely decodable codes|uniquely decodable]] code from {{math|Σ{{su|b=1|p=∗}}}} to {{math|Σ{{su|b=2|p=∗}}}} where {{math|{{!}}Σ<sub>2</sub>{{!}} {{=}} ''a''}}. Let {{mvar|S}} denote the random variable given by the length of codeword {{math|&thinsp;''f''&thinsp;(''X'')}}.

If {{math|&thinsp;''f''&thinsp;}} is optimal in the sense that it has the minimal expected word length for {{mvar|X}}, then (Shannon 1948):

:<math> \frac{H(X)}{\log_2 a} \leq \mathbb{E}[S] < \frac{H(X)}{\log_2 a} +1 </math>

where <math>\mathbb{E}</math> denotes the [[expected value]] operator.
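The bound can be illustrated for the binary case {{math|''a'' {{=}} 2}}, where <math>\log_2 a = 1</math>. The following is a minimal sketch that builds the codeword lengths of a binary [[Huffman coding|Huffman code]], which is known to attain the minimal expected length among symbol codes, and compares the expected length with the entropy. The function name <code>huffman_lengths</code> and the example distribution are illustrative assumptions, and the sketch does not cover code alphabets with more than two symbols.

```python
import heapq
import math

def huffman_lengths(probs):
    """Codeword lengths of a binary Huffman code for the given probabilities."""
    if len(probs) < 2:
        raise ValueError("need at least two source symbols")
    # Heap entries: (subtree probability, tie-breaker, symbols in the subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    tie = len(probs)  # unique tie-breaker so the symbol lists are never compared
    while len(heap) > 1:
        # Merge the two least probable subtrees.
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:
            lengths[i] += 1  # each merge adds one bit to every symbol below it
        heapq.heappush(heap, (p1 + p2, tie, s1 + s2))
        tie += 1
    return lengths

probs = [0.4, 0.3, 0.2, 0.1]  # illustrative distribution of X
lengths = huffman_lengths(probs)
entropy = -sum(p * math.log2(p) for p in probs if p > 0)
expected_length = sum(p * l for p, l in zip(probs, lengths))

print("codeword lengths:", lengths)
print("H(X) =", entropy, "E[S] =", expected_length)
print("bound holds:", entropy <= expected_length < entropy + 1)
```

For a distribution whose probabilities are all powers of 1/2, such as (1/2, 1/4, 1/8, 1/8), the same construction gives an expected length equal to the entropy, so the lower bound is attained with equality.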