== Noisy-channel coding theorem ==
Shannon's [[noisy-channel coding theorem]] gives a result about the rate of information that can be transmitted through a communication channel with arbitrarily low error. We study the particular case of <math>\text{BSC}_p</math>.

The noise <math>e</math> that characterizes <math>\text{BSC}_{p}</math> is a [[random variable]] consisting of <math>n</math> independent random bits (<math>n</math> is defined below), where each random bit is a <math>1</math> with probability <math>p</math> and a <math>0</math> with probability <math>1-p</math>. We indicate this by writing "<math>e \in \text{BSC}_{p}</math>".

{{math_theorem|For all <math>p < \tfrac{1}{2}</math>, all <math>0 < \epsilon < \tfrac{1}{2} - p</math>, all sufficiently large <math>n</math> (depending on <math>p</math> and <math>\epsilon</math>), and all <math>k \leq \lfloor (1 - H(p + \epsilon))n \rfloor</math>, there exist a constant <math>\delta > 0</math> (depending on <math>p</math> and <math>\epsilon</math>) and a pair of encoding and decoding functions <math>E: \{0,1\}^k \to \{0,1\}^n</math> and <math>D: \{0,1\}^n \to \{0,1\}^{k}</math>, respectively, such that every message <math>m\in\{0,1\}^{k}</math> has the following property:
::<math>\Pr_{e \in \text{BSC}_p}[D(E(m) + e) \neq m] \leq 2^{-{\delta}n}.</math>}}

In other words, if a message is picked from <math>\{0,1\}^k</math>, encoded with a suitable encoding function <math>E</math>, and sent across the noisy <math>\text{BSC}_{p}</math>, then the original message can be recovered by decoding with very high probability, provided that <math>k</math>, and in effect the rate of the code, is bounded by the quantity stated in the theorem; the decoding error probability is exponentially small in <math>n</math>.

===Proof===
The theorem can be proved directly with a [[probabilistic method]]. Consider an encoding function <math>E: \{0,1\}^k \to \{0,1\}^n</math> that is selected at random. This means that for each message <math>m \in \{0,1\}^k</math>, the value <math>E(m) \in \{0,1\}^n</math> is selected at random (with equal probabilities). For a given encoding function <math>E</math>, the decoding function <math>D:\{0,1\}^n \to \{0,1\}^k</math> is specified as follows: given any received word <math>y \in \{0,1\}^n</math>, we find the message <math>m\in\{0,1\}^{k}</math> such that the [[Hamming distance]] <math>\Delta(y, E(m))</math> is as small as possible (with ties broken arbitrarily). (<math>D</math> is called a [[decoding methods#maximum likelihood decoding|maximum likelihood decoding]] function.) The proof continues by showing that at least one such choice <math>(E,D)</math> satisfies the conclusion of the theorem, by averaging over the probabilities.

Suppose <math>p</math> and <math>\epsilon</math> are fixed. First we show that, for a fixed message <math>m \in \{0,1\}^{k}</math> and <math>E</math> chosen randomly, the probability of failure over <math>\text{BSC}_p</math> noise is exponentially small in <math>n</math>. At this point the argument only covers a fixed message <math>m</math>; next we extend the result to hold for all messages <math>m</math> simultaneously. We achieve this by eliminating half of the codewords from the code, arguing that the bound on the decoding error probability holds for at least half of the codewords. This latter step is called expurgation, which gives the total process the name ''random coding with expurgation''.
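A minimal simulation sketch of this random-coding experiment, with assumed illustrative parameters <math>n = 15</math>, <math>k = 3</math>, <math>p = 0.1</math> (exhaustive maximum-likelihood decoding over all <math>2^k</math> codewords is only feasible for such small sizes):

<syntaxhighlight lang="python">
import itertools
import random

def bsc_noise(n, p):
    """Sample an n-bit noise vector e from BSC_p: each bit is 1 with probability p."""
    return [1 if random.random() < p else 0 for _ in range(n)]

def hamming(a, b):
    """Hamming distance between two bit vectors."""
    return sum(x != y for x, y in zip(a, b))

# Assumed small parameters for illustration; the theorem concerns the regime of large n.
n, k, p = 15, 3, 0.1

# Random encoding function E: each of the 2^k messages gets an independent,
# uniformly random codeword in {0,1}^n.
messages = list(itertools.product([0, 1], repeat=k))
E = {m: tuple(random.randint(0, 1) for _ in range(n)) for m in messages}

def D(y):
    """Maximum-likelihood decoding: return the message whose codeword is closest to y."""
    return min(messages, key=lambda m: hamming(E[m], y))

# Empirically estimate the decoding error probability for one fixed message m.
m = messages[0]
trials = 10_000
errors = 0
for _ in range(trials):
    e = bsc_noise(n, p)
    y = tuple(c ^ b for c, b in zip(E[m], e))  # received word y = E(m) + e (mod 2)
    if D(y) != m:
        errors += 1
print(f"empirical Pr[D(E(m)+e) != m] ~ {errors / trials:.4f}")
</syntaxhighlight>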
:{| class="toccolours collapsible collapsed" width="80%" style="text-align:left"
!Continuation of proof (sketch)
|-
|Fix <math>p</math> and <math>\epsilon</math>. Given a fixed message <math>m \in \{0,1\}^{k}</math>, we need to estimate the [[expected value]], over the random choice of <math>E</math>, of the [[probability]] that decoding the received word does not give back <math>m</math>. That is to say, we need to estimate:
:<math>\mathbb{E}_{E} \left [\Pr_{e \in \text{BSC}_p}[D(E(m) + e) \neq m] \right ].</math>
Let <math>y = E(m) + e</math> be the received word. In order for the decoded message <math>D(y)</math> not to be equal to the message <math>m</math>, one of the following events must occur:
* <math>y</math> does not lie within the [[Hamming ball]] of radius <math>(p+\epsilon)n</math> centered at <math>E(m)</math>. This condition is mainly used to make the calculations easier.
* There is another message <math>m' \in \{0,1\}^k</math> such that <math>\Delta(y, E(m')) \leqslant \Delta(y, E(m))</math>. In other words, the errors due to noise take the transmitted codeword closer to another encoded message.
We can apply the [[Chernoff bound]] to bound the probability of the first event; we get:
:<math>\Pr_{e \in \text{BSC}_p} [\Delta(y, E(m)) > (p+\epsilon)n] \leqslant 2^{-{\epsilon^2}n}.</math>
This is exponentially small for large <math>n</math> (recall that <math>\epsilon</math> is fixed).

For the second event, we note that the probability that <math>E(m') \in B(y,(p+\epsilon)n)</math> is <math>\text{Vol}(B(y,(p+\epsilon)n))/2^n</math>, where <math>B(x, r)</math> is the Hamming ball of radius <math>r</math> centered at the vector <math>x</math> and <math>\text{Vol}(B(x, r))</math> is its volume. Using the entropy approximation for the number of words in the Hamming ball, we have <math>\text{Vol}(B(y,(p+\epsilon)n)) \approx 2^{H(p+\epsilon)n}</math>. Hence the above probability amounts to <math>2^{H(p+\epsilon)n}/2^n = 2^{H(p+\epsilon)n-n}</math>. Now using the [[union bound]], we can upper bound the probability that such an <math>m' \in \{0,1\}^k</math> exists by <math>2^{k + H(p+\epsilon)n - n}</math>, which is <math>2^{-\Omega(n)}</math> by the choice of <math>k</math>, as desired.
|}

:{| class="toccolours collapsible collapsed" width="80%" style="text-align:left"
!Continuation of proof (detailed)
|-
|From the above analysis, we calculate the probability of the event that decoding the transmitted codeword plus the channel noise does not return the original message sent. We shall introduce some symbols here. Let <math>p(y|E(m))</math> denote the probability of receiving the word <math>y</math> given that the codeword <math>E(m)</math> was sent. Let <math>B_0</math> denote <math>B(E(m),(p+\epsilon)n).</math>
:<math>\begin{align}
\Pr_{e \in \text{BSC}_p}[D(E(m) + e) \neq m] &= \sum_{y \in \{0,1\}^{n}} p(y|E(m))\cdot 1_{D(y)\neq m} \\
&\leqslant \sum_{y \notin B_0} p(y|E(m)) \cdot 1_{D(y)\neq m} + \sum_{y \in B_0} p(y|E(m))\cdot 1_{D(y)\neq m} \\
&\leqslant 2^{-{\epsilon^2}n} + \sum_{y \in B_0} p(y|E(m)) \cdot 1_{D(y)\neq m}
\end{align}</math>
We get the last inequality from our analysis using the [[Chernoff bound]] above. Now taking expectations on both sides, we have
:<math>\begin{align}
\mathbb{E}_E \left [\Pr_{e \in \text{BSC}_p} [D(E(m) + e) \neq m] \right ] &\leqslant 2^{-{\epsilon^2}n} + \sum_{y \in B_0} p(y|E(m)) \, \mathbb{E}[1_{D(y)\neq m}] \\
&\leqslant 2^{-{\epsilon^2}n} + \sum_{y \in B_0} p(y|E(m)) \, 2^{k +H(p + \epsilon)n-n} && \mathbb{E}[1_{D(y)\neq m}] \leqslant 2^{k +H(p + \epsilon)n-n} \text{ (see above)} \\
&\leqslant 2^{-{\epsilon^2}n} + 2^{k +H(p + \epsilon)n-n} && \sum_{y \in B_0} p(y|E(m)) \leqslant 1 \\
&\leqslant 2^{-\delta n}
\end{align}</math>
by appropriately choosing the value of <math>\delta</math>.
Since the above bound holds for '''each''' message, we have
:<math>\mathbb{E}_m \left [\mathbb{E}_E \left [\Pr_{e \in \text{BSC}_p} \left [D(E(m) + e) \neq m \right ] \right ] \right ] \leqslant 2^{-\delta n}.</math>
Now we can exchange the order of the expectation over the message and the expectation over the choice of the encoding function <math>E</math>. Hence:
:<math>\mathbb{E}_E \left [\mathbb{E}_m \left [\Pr_{e \in \text{BSC}_p} \left [D(E(m) + e) \neq m \right ] \right ] \right ] \leqslant 2^{-\delta n}.</math>
Hence, by the probabilistic method, there exist some encoding function <math>E^{*}</math> and a corresponding decoding function <math>D^{*}</math> such that
:<math>\mathbb{E}_m \left [\Pr_{e \in \text{BSC}_p} \left [D^{*}(E^{*}(m) + e)\neq m \right ] \right ] \leqslant 2^{-\delta n}.</math>
At this point, the bound holds only on average over a uniformly random message <math>m</math>; we still need to make sure that it holds for '''all''' messages <math>m</math> '''simultaneously'''. For that, let us sort the <math>2^k</math> messages by their decoding error probabilities. By applying [[Markov's inequality]], the decoding error probability of each of the first <math>2^{k-1}</math> messages is at most <math>2\cdot 2^{-\delta n}</math>, since at most half of the messages can have an error probability exceeding twice the average. Thus, in order to ensure that the above bound holds for '''every''' message <math>m</math>, we can simply trim off the last <math>2^{k-1}</math> messages from the sorted order. This gives another encoding function <math>E'</math> with a corresponding decoding function <math>D'</math> whose decoding error probability is at most <math>2^{-\delta n + 1}</math>, with essentially the same rate. Taking <math>\delta'</math> to be equal to <math>\delta - \tfrac{1}{n}</math>, we bound the decoding error probability by <math>2^{-\delta'n}</math>. This expurgation process completes the proof.
|}
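As a concrete numerical illustration of the rate bound <math>k \leq \lfloor (1 - H(p + \epsilon))n \rfloor</math>, the following minimal sketch evaluates the bound for assumed illustrative values <math>p = 0.1</math>, <math>\epsilon = 0.05</math> and <math>n = 1000</math>:

<syntaxhighlight lang="python">
from math import log2, floor

def H(q):
    """Binary entropy function (in bits), for 0 < q < 1."""
    return -q * log2(q) - (1 - q) * log2(1 - q)

# Assumed illustrative values satisfying 0 < eps < 1/2 - p.
p, eps, n = 0.1, 0.05, 1000

k_max = floor((1 - H(p + eps)) * n)
print(f"H(p + eps) = {H(p + eps):.4f}")       # about 0.6098
print(f"largest admissible k = {k_max}")      # 390, i.e. rate about 0.39
print(f"capacity 1 - H(p) = {1 - H(p):.4f}")  # about 0.5310
</syntaxhighlight>

For these values the theorem guarantees codes of rate roughly <math>0.39</math>, below the channel capacity <math>1 - H(0.1) \approx 0.53</math>, with exponentially small decoding error probability.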