== Additivity of channel capacity ==
Channel capacity is additive over independent channels.<ref>{{cite book |last1=Cover |first1=Thomas M. |last2=Thomas |first2=Joy A. |title=Elements of Information Theory |publisher=Wiley-Interscience |edition=Second |date=2006 |pages=206–207 |chapter=Chapter 7: Channel Capacity |isbn=978-0-471-24195-9}}</ref> This means that using two independent channels in a combined manner provides the same theoretical capacity as using them independently. More formally, let <math>p_{1}</math> and <math>p_{2}</math> be two independent channels modelled as above: <math>p_{1}</math> has an input alphabet <math>\mathcal{X}_{1}</math> and an output alphabet <math>\mathcal{Y}_{1}</math>, and similarly for <math>p_{2}</math>. We define the product channel <math>p_{1}\times p_2</math> by
<math display="block">\forall (x_{1}, x_{2}) \in \mathcal{X}_{1}\times\mathcal{X}_{2},\; (y_{1}, y_{2}) \in \mathcal{Y}_{1}\times\mathcal{Y}_{2},\quad (p_{1}\times p_{2})((y_{1}, y_{2}) | (x_{1},x_{2}))=p_{1}(y_{1}|x_{1})p_{2}(y_{2}|x_{2}).</math>

The theorem states:
<math display="block"> C(p_{1}\times p_{2}) = C(p_{1}) + C(p_{2})</math>

{{Proof|
We first show that <math> C(p_{1}\times p_{2}) \geq C(p_{1}) + C(p_{2}) </math>.

Let <math>X_1</math> and <math>X_2</math> be two independent random variables. Let <math>Y_1</math> be a random variable corresponding to the output of <math>X_1</math> through the channel <math>p_{1}</math>, and <math>Y_2</math> the output of <math>X_2</math> through <math>p_2</math>. By definition, <math>C(p_{1}\times p_{2}) = \sup_{p_{X_{1},X_{2}}}(I(X_{1},X_{2} : Y_{1},Y_{2}))</math>.

Since <math>X_1</math> and <math>X_2</math> are independent, as well as <math>p_1</math> and <math>p_2</math>, <math>(X_1,Y_1)</math> is independent of <math>(X_2,Y_2)</math>. We can therefore apply the following property of [[mutual information]]: <math>I(X_1,X_2 : Y_1, Y_2) = I(X_1:Y_1) + I(X_2:Y_2)</math>.

It thus suffices to find a distribution <math>p_{X_1,X_2}</math> such that <math>I(X_1,X_2 : Y_1,Y_2) \geq I(X_1 : Y_1) + I(X_2 : Y_2)</math>. Taking <math>\pi_1</math> and <math>\pi_2</math> to be two probability distributions for <math>X_1</math> and <math>X_2</math> achieving <math>C(p_1)</math> and <math>C(p_2)</math> suffices:
:<math>C(p_{1}\times p_{2}) \geq I(X_1, X_2 : Y_1, Y_2) = I(X_1:Y_1) + I(X_2:Y_2) = C(p_1) + C(p_2),</math>
i.e. <math>C(p_{1}\times p_{2}) \geq C(p_1) + C(p_2)</math>.

Now let us show that <math> C(p_{1}\times p_{2}) \leq C(p_{1}) + C(p_{2}) </math>.

Let <math>\pi_{12}</math> be some distribution for the channel <math>p_{1}\times p_{2}</math> defining <math>(X_1, X_2)</math> and the corresponding output <math>(Y_1, Y_2)</math>. Let <math>\mathcal{X}_1</math> be the alphabet of <math>X_1</math>, <math>\mathcal{Y}_1</math> that of <math>Y_1</math>, and analogously <math>\mathcal{X}_2</math> and <math>\mathcal{Y}_2</math>.

By definition of mutual information, we have
<math>\begin{align} I(X_1, X_2 : Y_1, Y_2) &= H(Y_1, Y_2) - H(Y_1, Y_2 | X_1, X_2)\\ &\leq H(Y_1) + H(Y_2) - H(Y_1, Y_2 | X_1, X_2) \end{align}</math>

Let us rewrite the last term, a conditional [[Entropy (information theory)|entropy]]:
<math>H(Y_1,Y_2|X_1,X_2) = \sum_{(x_1, x_2) \in \mathcal{X}_1\times \mathcal{X}_2}\mathbb{P}(X_{1}, X_{2} = x_{1}, x_{2})H(Y_{1}, Y_{2} | X_{1}, X_{2} = x_{1}, x_{2})</math>

By definition of the product channel, <math>\mathbb{P}(Y_{1},Y_{2}=y_{1},y_{2}|X_{1},X_{2}=x_{1},x_{2})=\mathbb{P}(Y_{1}=y_{1}|X_{1}=x_{1})\mathbb{P}(Y_{2}=y_{2}|X_{2}=x_{2})</math>.
For a given pair <math>(x_1, x_2)</math>, we can rewrite <math>H(Y_1,Y_2|X_1,X_2=x_1,x_2)</math> as:
<math>\begin{align} H(Y_1, Y_2 | X_1, X_2 = x_1,x_2) &= -\sum_{(y_1, y_2) \in \mathcal{Y}_1\times \mathcal{Y}_2}\mathbb{P}(Y_1, Y_2 = y_1, y_2 | X_1, X_2 = x_1, x_2)\log(\mathbb{P}(Y_1, Y_2 = y_1, y_2 | X_1, X_2 = x_1, x_2)) \\ &= -\sum_{(y_1, y_2) \in \mathcal{Y}_1\times \mathcal{Y}_2}\mathbb{P}(Y_1, Y_2 = y_1, y_2 | X_1, X_2 = x_1, x_2)[\log(\mathbb{P}(Y_1 = y_1 | X_1 = x_1)) + \log(\mathbb{P}(Y_2 = y_2 | X_2 = x_2))] \\ &= H(Y_{1}|X_{1}=x_1)+H(Y_{2}|X_{2}=x_2), \end{align}</math>
where the last equality follows by summing out <math>y_2</math> in the first term and <math>y_1</math> in the second.

By summing this equality over all <math>(x_1, x_2)</math>, weighted by <math>\mathbb{P}(X_1, X_2 = x_1, x_2)</math>, we obtain <math>H(Y_1,Y_2|X_1,X_2)=H(Y_1|X_1)+H(Y_2|X_2)</math>.

We can now give an upper bound on the mutual information:
<math>\begin{align} I(X_{1},X_{2}:Y_{1},Y_{2})&\leq H(Y_{1})+H(Y_{2})-H(Y_{1}|X_{1})-H(Y_{2}|X_{2})\\ &=I(X_{1}:Y_{1})+I(X_{2}:Y_{2}) \end{align}</math>

This relation is preserved at the supremum. Therefore
:<math>C(p_{1}\times p_{2}) \leq C(p_1)+C(p_2)</math>

Combining the two inequalities we have proved, we obtain the statement of the theorem:
:<math>C(p_{1}\times p_{2})=C(p_{1})+C(p_{2})</math>
}}
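For instance, two independent [[binary symmetric channel]]s with crossover probabilities <math>\varepsilon_1</math> and <math>\varepsilon_2</math> have capacities <math>1 - H_\text{b}(\varepsilon_1)</math> and <math>1 - H_\text{b}(\varepsilon_2)</math>, where <math>H_\text{b}</math> is the [[binary entropy function]]; by additivity, their product channel has capacity <math>2 - H_\text{b}(\varepsilon_1) - H_\text{b}(\varepsilon_2)</math>.

Additivity can also be checked numerically. The sketch below (an illustration, not part of the cited treatment) estimates capacities with the [[Blahut–Arimoto algorithm]]; the function name <code>blahut_arimoto</code> and the use of NumPy are choices made only for this example.

<syntaxhighlight lang="python">
import numpy as np

def blahut_arimoto(W, iters=1000):
    """Estimate the capacity (in bits) of a discrete memoryless channel.

    W[x, y] = p(y | x); each row of W must sum to 1.
    """
    n_x = W.shape[0]
    r = np.full(n_x, 1.0 / n_x)            # input distribution, start uniform
    for _ in range(iters):
        q = r[:, None] * W                  # proportional to p(x, y)
        q /= q.sum(axis=0, keepdims=True)   # posterior q(x | y)
        # update r(x) proportionally to prod_y q(x|y)^W[x, y]
        log_r = np.sum(W * np.log(q + 1e-300), axis=1)
        r = np.exp(log_r - log_r.max())
        r /= r.sum()
    p_y = r @ W                             # output distribution
    # mutual information I(X; Y) for the final input distribution, in bits
    ratio = np.where(W > 0, W / p_y, 1.0)
    return float(np.sum(r[:, None] * W * np.log2(ratio)))

# Two binary symmetric channels with crossover probabilities 0.1 and 0.2.
W1 = np.array([[0.9, 0.1], [0.1, 0.9]])
W2 = np.array([[0.8, 0.2], [0.2, 0.8]])
# The product channel's transition matrix is the Kronecker product.
W12 = np.kron(W1, W2)

print(blahut_arimoto(W1) + blahut_arimoto(W2))  # ~ 0.809 bits
print(blahut_arimoto(W12))                      # ~ 0.809 bits, matching the sum
</syntaxhighlight>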