{{short description|Establishes the limits to possible data compression}}
{{Information theory}}

{{about|the theory of source coding in data compression|the term in computer programming|Source code}}

In [[information theory]], '''Shannon's source coding theorem''' (or '''noiseless coding theorem''') establishes the statistical limits of [[data compression]] for data produced by a source of [[independent identically-distributed random variables|independent and identically distributed (i.i.d.) random variables]], and gives the [[Shannon entropy]] its operational meaning.

Named after [[Claude Shannon]], the '''source coding theorem''' shows that, as the length of a stream of [[independent and identically distributed random variables|independent and identically distributed (i.i.d.)]] data tends to infinity, it is impossible to compress the data so that the code rate (the average number of bits per symbol) is less than the Shannon entropy of the source without it being virtually certain that information will be lost. However, it is possible to get the code rate arbitrarily close to the Shannon entropy with negligible probability of loss.
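
As a minimal numerical sketch of the limit described above, the following Python snippet computes the Shannon entropy of a discrete source. The biased binary source with probabilities 0.9 and 0.1, the stream length <code>n</code>, and the function name <code>shannon_entropy</code> are illustrative assumptions, not part of the theorem's statement.

```python
import math


def shannon_entropy(probs):
    """Entropy, in bits per symbol, of a discrete probability distribution."""
    # Terms with p == 0 contribute nothing (the limit of p*log2(p) is 0).
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical i.i.d. binary source: P(0) = 0.9, P(1) = 0.1.
probs = [0.9, 0.1]
h = shannon_entropy(probs)  # roughly 0.469 bits per symbol

# For a long stream of n symbols, about n * h bits are needed;
# the theorem is asymptotic, so this is a guide rather than an exact count.
n = 1000
approx_bits = n * h

print(f"entropy: {h:.3f} bits/symbol")
print(f"approximate bits for {n} symbols: {approx_bits:.0f} (vs. {n} uncompressed)")
```

Under these assumptions, a lossless code cannot achieve a long-run rate below about 0.469 bits per symbol, although rates slightly above that value are achievable for sufficiently long streams.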

The '''source coding theorem for symbol codes''' places an upper and a lower bound on the minimal possible expected length of codewords as a function of the [[Entropy (information theory)|entropy]] of the input word (which is viewed as a [[random variable]]) and of the size of the target alphabet.
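
For a binary target alphabet, the bounds above say that the minimal expected codeword length <math>L</math> satisfies <math>H(X) \le L < H(X) + 1</math>. The sketch below checks this for one example by building a [[Huffman coding|Huffman code]], which is known to be an optimal prefix code. The four-symbol distribution and the helper name <code>huffman_lengths</code> are assumptions chosen for illustration.

```python
import heapq
import math


def huffman_lengths(probs):
    """Return binary Huffman codeword lengths for the given probabilities."""
    # Heap entries: (subtree probability, unique tie-breaker, symbols in subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    tie_breaker = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        # Merging two subtrees adds one bit to every codeword inside them.
        for s in s1 + s2:
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, tie_breaker, s1 + s2))
        tie_breaker += 1
    return lengths


# Hypothetical source distribution over four symbols.
probs = [0.4, 0.3, 0.2, 0.1]
lengths = huffman_lengths(probs)
expected_length = sum(p * l for p, l in zip(probs, lengths))
entropy = -sum(p * math.log2(p) for p in probs)

print(f"codeword lengths: {lengths}")
print(f"entropy H(X):     {entropy:.4f} bits")
print(f"expected length:  {expected_length:.4f} bits")
```

In this example the expected length lies strictly between the entropy and the entropy plus one bit. For a target alphabet of size <math>a</math>, the same comparison would use the entropy divided by <math>\log_2 a</math>.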

For data that exhibit dependencies between symbols (that is, whose source is not i.i.d.), the [[Kolmogorov complexity]], which quantifies the minimal description length of an object, is better suited to describing the limits of data compression. Shannon entropy accounts only for frequency regularities, whereas Kolmogorov complexity accounts for all algorithmic regularities, so in general the latter is smaller. On the other hand, if an object is generated by a random process in such a way that it has only frequency regularities, entropy is close to complexity with high probability (Shen et al. 2017).<ref name="Shen2017"/>
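
The gap between frequency regularities and algorithmic regularities can be illustrated, loosely, with a general-purpose compressor. Kolmogorov complexity is uncomputable, so the sketch below uses the size of a <code>zlib</code>-compressed string only as a rough upper-bound proxy for it. The test data (the 256 byte values repeated in order) and the helper name <code>empirical_entropy_bits</code> are assumptions made for illustration.

```python
import math
import zlib
from collections import Counter


def empirical_entropy_bits(data):
    """Entropy, in bits per byte, of the byte-frequency distribution of data."""
    counts = Counter(data)
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


# Highly structured data: the byte values 0..255 repeated 64 times.
# Every byte value is equally frequent, so frequencies alone reveal no redundancy.
data = bytes(range(256)) * 64

h = empirical_entropy_bits(data)             # 8 bits per byte
frequency_based_size = h * len(data) / 8     # bytes implied by frequencies alone
compressed_size = len(zlib.compress(data, 9))

print(f"original size:        {len(data)} bytes")
print(f"frequency-based size: {frequency_based_size:.0f} bytes")
print(f"zlib-compressed size: {compressed_size} bytes")
```

The byte frequencies suggest that the data are incompressible, yet the compressor exploits the repeating pattern and produces a much shorter description, which is consistent with the statement that complexity is in general smaller than a frequency-based entropy estimate.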