==Coding theory==
{{Unreferenced section|date=April 2024}}
{{Main|Coding theory}}
[[File:CDSCRATCHES.jpg|thumb|right|A picture showing scratches on the readable surface of a CD-R. Music and data CDs are coded using error-correcting codes and thus can still be read, even with minor scratches, using [[error detection and correction]].]]

Coding theory is one of the most important and direct applications of information theory. It can be subdivided into source coding theory and channel coding theory. Using a statistical description for data, information theory quantifies the number of bits needed to describe the data, which is the information entropy of the source.

* [[Data compression]] (source coding): There are two formulations for the compression problem:
** [[lossless data compression]]: the data must be reconstructed exactly;
** [[lossy data compression]]: allocates bits needed to reconstruct the data, within a specified fidelity level measured by a distortion function. This subset of information theory is called ''[[rate–distortion theory]]''.
* [[Error-correcting code]]s (channel coding): While data compression removes as much redundancy as possible, an error-correcting code adds just the right kind of redundancy (i.e., error correction) needed to transmit the data efficiently and faithfully across a noisy channel.

This division of coding theory into compression and transmission is justified by the information transmission theorems, or source–channel separation theorems, which justify the use of bits as the universal currency for information in many contexts. However, these theorems only hold in the situation where one transmitting user wishes to communicate to one receiving user. In scenarios with more than one transmitter (the multiple-access channel), more than one receiver (the [[broadcast channel]]) or intermediary "helpers" (the [[relay channel]]), or more general [[computer network|networks]], compression followed by transmission may no longer be optimal.

===Source theory===
Any process that generates successive messages can be considered a {{em|[[Communication source|source]]}} of information. A memoryless source is one in which each message is an [[Independent identically distributed random variables|independent identically distributed random variable]], whereas the properties of [[ergodic theory|ergodicity]] and [[stationary process|stationarity]] impose less restrictive constraints. All such sources are [[stochastic process|stochastic]]. These terms are well studied in their own right outside information theory.

====Rate====<!-- This section is linked from [[Channel capacity]] -->
Information ''[[Entropy rate|rate]]'' is the average entropy per symbol. For memoryless sources, this is merely the entropy of each symbol, while, in the case of a stationary stochastic process, it is:

:<math>r = \lim_{n \to \infty} H(X_n|X_{n-1},X_{n-2},X_{n-3}, \ldots);</math>

that is, the conditional entropy of a symbol given all the previous symbols generated. For the more general case of a process that is not necessarily stationary, the ''average rate'' is:

:<math>r = \lim_{n \to \infty} \frac{1}{n} H(X_1, X_2, \dots X_n);</math>

that is, the limit of the joint entropy per symbol. For stationary sources, these two expressions give the same result.<ref>{{cite book | title = Digital Compression for Multimedia: Principles and Standards | author = Jerry D. Gibson | publisher = Morgan Kaufmann | year = 1998 | url = https://books.google.com/books?id=aqQ2Ry6spu0C&q=entropy-rate+conditional&pg=PA56 | isbn = 1-55860-369-7 }}</ref>

The [[information rate]] is defined as:

:<math>r = \lim_{n \to \infty} \frac{1}{n} I(X_1, X_2, \dots X_n;Y_1,Y_2, \dots Y_n);</math>

It is common in information theory to speak of the "rate" or "entropy" of a language. This is appropriate, for example, when the source of information is English prose. The rate of a source of information is related to its redundancy and how well it can be compressed, the subject of {{em|source coding}}.
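As an illustrative sketch (the two-state transition matrix below is arbitrary, chosen only for the example), the entropy rate of a stationary first-order [[Markov chain|Markov source]] can be computed directly, since for such a source the conditional-entropy limit above reduces to <math>H(X_n|X_{n-1})</math>:

<syntaxhighlight lang="python">
# Illustrative sketch: entropy rate of a stationary two-state Markov source.
# For a first-order Markov chain, H(X_n | X_{n-1}, X_{n-2}, ...) = H(X_n | X_{n-1}).
import numpy as np

def entropy_bits(p):
    """Shannon entropy in bits of a discrete distribution p."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Transition matrix: P[i, j] = Pr(next symbol = j | current symbol = i).
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

# Stationary distribution pi (satisfying pi = pi P), found by power iteration.
pi = np.array([0.5, 0.5])
for _ in range(1000):
    pi = pi @ P
pi /= pi.sum()

# Entropy rate: average, over states, of the entropy of the next-symbol distribution.
rate = sum(pi[i] * entropy_bits(P[i]) for i in range(len(pi)))

print(f"entropy rate     = {rate:.4f} bits/symbol")        # about 0.57 for this matrix
print(f"marginal entropy = {entropy_bits(pi):.4f} bits/symbol")  # about 0.72
</syntaxhighlight>

Because this source has memory, its entropy rate (about 0.57 bits per symbol for this matrix) is lower than the entropy of its marginal distribution (about 0.72 bits per symbol); the gap is redundancy that source coding can exploit.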
===Channel capacity===
{{Main|Channel capacity}}
Communication over a channel is the primary motivation of information theory. However, channels often fail to produce exact reconstruction of a signal; noise, periods of silence, and other forms of signal corruption often degrade quality.

Consider the communications process over a discrete channel. A simple model of the process is shown below:

:<math title="Channel model"> \xrightarrow[\text{Message}]{W} \begin{array}{ |c| }\hline \text{Encoder} \\ f_n \\ \hline\end{array} \xrightarrow[\mathrm{Encoded \atop sequence}]{X^n} \begin{array}{ |c| }\hline \text{Channel} \\ p(y|x) \\ \hline\end{array} \xrightarrow[\mathrm{Received \atop sequence}]{Y^n} \begin{array}{ |c| }\hline \text{Decoder} \\ g_n \\ \hline\end{array} \xrightarrow[\mathrm{Estimated \atop message}]{\hat W}</math>

Here ''X'' represents the space of messages transmitted, and ''Y'' the space of messages received during a unit time over our channel. Let {{math|''p''(''y''{{pipe}}''x'')}} be the [[conditional probability]] distribution function of ''Y'' given ''X''. We will consider {{math|''p''(''y''{{pipe}}''x'')}} to be an inherent fixed property of our communications channel (representing the nature of the ''[[Signal noise|noise]]'' of our channel). Then the joint distribution of ''X'' and ''Y'' is completely determined by our channel and by our choice of {{math|''f''(''x'')}}, the marginal distribution of messages we choose to send over the channel. Under these constraints, we would like to maximize the rate of information, or the ''[[Signal (electrical engineering)|signal]]'', we can communicate over the channel. The appropriate measure for this is the mutual information, and this maximum mutual information is called the {{em|channel capacity}} and is given by:

:<math> C = \max_{f} I(X;Y).\! </math>

This capacity has the following property related to communicating at information rate ''R'' (where ''R'' is usually bits per symbol). For any information rate ''R'' < ''C'' and coding error ''ε'' > 0, for large enough ''N'', there exists a code of length ''N'' and rate ≥ ''R'' and a decoding algorithm, such that the maximal probability of block error is ≤ ''ε''; that is, it is always possible to transmit with arbitrarily small block error. In addition, for any rate ''R'' > ''C'', it is impossible to transmit with arbitrarily small block error.

''[[Channel code|Channel coding]]'' is concerned with finding such nearly optimal codes that can be used to transmit data over a noisy channel with a small coding error at a rate near the channel capacity.

====Capacity of particular channel models====
* A continuous-time analog communications channel subject to [[Gaussian noise]]—see [[Shannon–Hartley theorem]].
* A [[binary symmetric channel]] (BSC) with crossover probability ''p'' is a binary input, binary output channel that flips the input bit with probability ''p''. The BSC has a capacity of {{math|1 − ''H''<sub>b</sub>(''p'')}} bits per channel use, where {{math|''H''<sub>b</sub>}} is the [[binary entropy function]] taken with the base-2 logarithm:
::[[File:Binary symmetric channel.svg]]
* A [[binary erasure channel]] (BEC) with erasure probability ''p'' is a binary input, ternary output channel. The possible channel outputs are 0, 1, and a third symbol 'e' called an erasure. The erasure represents complete loss of information about an input bit. The capacity of the BEC is {{nowrap|1 − ''p''}} bits per channel use.
::[[File:Binary erasure channel.svg]]
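These closed-form capacities are straightforward to evaluate; a minimal sketch (the parameter values below are arbitrary, chosen only for illustration):

<syntaxhighlight lang="python">
# Illustrative sketch: closed-form capacities of the BSC and BEC.
from math import log2

def binary_entropy(p):
    """H_b(p): entropy in bits of a Bernoulli(p) random variable."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def bsc_capacity(p):
    """Capacity of a binary symmetric channel with crossover probability p."""
    return 1.0 - binary_entropy(p)

def bec_capacity(p):
    """Capacity of a binary erasure channel with erasure probability p."""
    return 1.0 - p

print(bsc_capacity(0.11))   # about 0.50 bits per channel use
print(bec_capacity(0.25))   # 0.75 bits per channel use
</syntaxhighlight>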
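For channels without such closed-form expressions, the maximization of the mutual information over input distributions can be carried out numerically, for example with the [[Blahut–Arimoto algorithm]]. The following minimal sketch (function names and iteration count are illustrative choices, not taken from any particular library) recovers the BSC and BEC values above:

<syntaxhighlight lang="python">
# Illustrative sketch: channel capacity C = max_f I(X;Y) for a discrete
# memoryless channel given as a matrix W[x, y] = p(y|x), via Blahut-Arimoto.
import numpy as np

def mutual_information_bits(r, W):
    """I(X;Y) in bits for input distribution r and channel matrix W[x, y] = p(y|x)."""
    q_y = r @ W                                   # output distribution p(y)
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(W > 0, W / q_y, 1.0)     # p(y|x)/p(y); 1 where p(y|x)=0 so the log term vanishes
    return float(np.sum(r[:, None] * W * np.log2(ratio)))

def blahut_arimoto(W, iters=1000):
    """Return (capacity in bits per channel use, optimizing input distribution)."""
    W = np.asarray(W, dtype=float)
    r = np.full(W.shape[0], 1.0 / W.shape[0])     # start from the uniform input distribution
    for _ in range(iters):
        q_y = r @ W                               # p(y) under the current input distribution
        with np.errstate(divide="ignore", invalid="ignore"):
            log_ratio = np.log(W / q_y)           # log[p(y|x)/p(y)], per (x, y)
        log_ratio = np.where(W > 0, log_ratio, 0.0)
        # Multiplicative update: r(x) proportional to r(x) * exp(D(p(.|x) || p(y))).
        r = r * np.exp(np.sum(W * log_ratio, axis=1))
        r /= r.sum()
    return mutual_information_bits(r, W), r

# Binary symmetric channel with crossover probability p = 0.11:
p = 0.11
W_bsc = np.array([[1 - p, p],
                  [p, 1 - p]])
C, r_opt = blahut_arimoto(W_bsc)
print(C)       # about 1 - H_b(0.11), i.e. about 0.50
print(r_opt)   # about [0.5, 0.5]

# Binary erasure channel with erasure probability p = 0.25 (outputs 0, e, 1):
W_bec = np.array([[0.75, 0.25, 0.0],
                  [0.0, 0.25, 0.75]])
print(blahut_arimoto(W_bec)[0])   # about 1 - 0.25 = 0.75
</syntaxhighlight>

Because both example channels are symmetric, the uniform input distribution is already optimal, and the iteration simply confirms the closed-form values.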
====Channels with memory and directed information====
In practice many channels have memory. Namely, at time <math>i</math> the channel is given by the conditional probability <math>P(y_i|x_i,x_{i-1},x_{i-2},...,x_1,y_{i-1},y_{i-2},...,y_1)</math>. It is often more convenient to use the notation <math>x^i=(x_i,x_{i-1},x_{i-2},...,x_1)</math>, so that the channel becomes <math>P(y_i|x^i,y^{i-1})</math>. In such a case, the capacity is given by the [[mutual information]] rate when no feedback is available, and by the [[directed information]] rate whether or not feedback is available<ref name=massey/><ref>{{cite journal |last1=Permuter |first1=Haim Henry |last2=Weissman |first2=Tsachy |last3=Goldsmith |first3=Andrea J. |title=Finite State Channels With Time-Invariant Deterministic Feedback |journal=IEEE Transactions on Information Theory |date=February 2009 |volume=55 |issue=2 |pages=644–662 |doi=10.1109/TIT.2008.2009849|arxiv=cs/0608070 |s2cid=13178 }}</ref> (if there is no feedback the directed information equals the mutual information).

===Fungible information===
'''Fungible information''' is the [[information]] for which the means of [[encoding]] is not important.<ref>{{cite journal|last=Bartlett|first=Stephen D. |author2=Rudolph, Terry|author3-link=Robert Spekkens|author3=Spekkens, Robert W.|title=Reference frames, superselection rules, and quantum information|journal=Reviews of Modern Physics|volume=79|issue=2|date=April–June 2007|pages=555–606|doi=10.1103/RevModPhys.79.555|bibcode=2007RvMP...79..555B|arxiv = quant-ph/0610030 }}</ref> Classical information theorists and computer scientists are mainly concerned with information of this sort. It is sometimes referred to as speakable information.<ref>{{cite book | last = Peres | first = A.| title = Quantum Theory: Reconsideration of Foundations | publisher = Växjö University Press, Växjö, Sweden| year = 2002b|editor = A. Khrennikov|page=283 |author2=P. F. Scudo}}</ref>