==Shannon's code: predefined word lengths==
{{Main|Shannon coding}}

===Shannon's algorithm===
Shannon's method starts by deciding on the lengths of all the codewords, then picks a prefix code with those word lengths.

Given a source with probabilities <math>p_1, p_2, \dots, p_n</math>, the desired codeword lengths are <math>l_i = \lceil -\log_2 p_i \rceil</math>. Here, <math>\lceil x \rceil</math> is the [[Floor and ceiling functions|ceiling function]], meaning the smallest integer greater than or equal to <math>x</math>.

Once the codeword lengths have been determined, we must choose the codewords themselves. One method is to pick codewords in order from most probable to least probable symbols, picking each codeword to be the lexicographically first word of the correct length that maintains the prefix-free property.

A second method makes use of cumulative probabilities. First, the probabilities are written in decreasing order <math>p_1 \geq p_2 \geq \cdots \geq p_n</math>. Then, the cumulative probabilities are defined as

:<math>c_1 = 0, \qquad c_i = \sum_{j=1}^{i-1} p_j \quad \text{for } i \geq 2,</math>

so <math>c_1 = 0, c_2 = p_1, c_3 = p_1 + p_2</math> and so on. The codeword for symbol <math>i</math> is chosen to be the first <math>l_i</math> binary digits in the [[binary number|binary expansion]] of <math>c_i</math>.

===Example===
This example shows the construction of a Shannon–Fano code for a small alphabet. There are 5 different source symbols. Suppose 39 total symbols have been observed with the following frequencies, from which we can estimate the symbol probabilities.

:{| class="wikitable" style="text-align: center;"
! Symbol !! A !! B !! C !! D !! E
|-
! Count
| 15 || 7 || 6 || 6 || 5
|-
! Probabilities
| 0.385 || 0.179 || 0.154 || 0.154 || 0.128
|}

This source has [[Entropy (information theory)|entropy]] <math>H(X) = 2.186</math> bits.

For the Shannon–Fano code, we need to calculate the desired word lengths <math>l_i = \lceil -\log_2 p_i \rceil</math>.

:{| class="wikitable" style="text-align: center;"
! Symbol !! A !! B !! C !! D !! E
|-
! Probabilities
| 0.385 || 0.179 || 0.154 || 0.154 || 0.128
|-
! <math>-\log_2 p_i</math>
| 1.379 || 2.480 || 2.700 || 2.700 || 2.963
|-
! Word lengths <math>\lceil -\log_2 p_i \rceil</math>
| 2 || 3 || 3 || 3 || 3
|}

We can pick codewords in order, choosing the lexicographically first word of the correct length that maintains the prefix-free property. Clearly A gets the codeword 00. To maintain the prefix-free property, B's codeword may not start with 00, so the lexicographically first available word of length 3 is 010. Continuing like this, we get the following code:

:{| class="wikitable" style="text-align: center;"
! Symbol !! A !! B !! C !! D !! E
|-
! Probabilities
| 0.385 || 0.179 || 0.154 || 0.154 || 0.128
|-
! Word lengths <math>\lceil -\log_2 p_i \rceil</math>
| 2 || 3 || 3 || 3 || 3
|-
! Codewords
| 00 || 010 || 011 || 100 || 101
|}

Alternatively, we can use the cumulative probability method.

:{| class="wikitable" style="text-align: center;"
! Symbol !! A !! B !! C !! D !! E
|-
! Probabilities
| 0.385 || 0.179 || 0.154 || 0.154 || 0.128
|-
! Cumulative probabilities
| 0.000 || 0.385 || 0.564 || 0.718 || 0.872
|-
! ...in binary
| 0.00000 || 0.01100 || 0.10010 || 0.10110 || 0.11011
|-
! Word lengths <math>\lceil -\log_2 p_i \rceil</math>
| 2 || 3 || 3 || 3 || 3
|-
! Codewords
| 00 || 011 || 100 || 101 || 110
|}

Note that although the codewords under the two methods are different, the word lengths are the same.
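The cumulative-probability construction translates directly into a short program. The following Python sketch is purely illustrative (the function name and the use of floating-point arithmetic are choices made here, not part of Shannon's presentation); it reproduces the codewords in the last table above.

<syntaxhighlight lang="python">
from math import ceil, log2

def shannon_code(probabilities):
    """Cumulative-probability method; probabilities must be in decreasing order.

    Returns one binary codeword string per symbol.
    """
    codewords = []
    cumulative = 0.0
    for p in probabilities:
        length = ceil(-log2(p))        # desired word length l_i
        # Take the first l_i binary digits of the expansion of c_i.
        word, frac = "", cumulative
        for _ in range(length):
            frac *= 2
            bit = int(frac)
            word += str(bit)
            frac -= bit
        codewords.append(word)
        cumulative += p                # c_{i+1} = c_i + p_i
    return codewords

# Example above: counts 15, 7, 6, 6, 5 out of 39 observed symbols.
counts = {"A": 15, "B": 7, "C": 6, "D": 6, "E": 5}
probabilities = [c / sum(counts.values()) for c in counts.values()]
print(dict(zip(counts, shannon_code(probabilities))))
# {'A': '00', 'B': '011', 'C': '100', 'D': '101', 'E': '110'}
</syntaxhighlight>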
We have lengths of 2 bits for A, and 3 bits for B, C, D and E, giving an average length of

:<math display="block">\frac{2\,\text{bits}\cdot(15) + 3\,\text{bits} \cdot (7+6+6+5)}{39\, \text{symbols}} \approx 2.62\,\text{bits per symbol,}</math>

which is within one bit of the entropy.

===Expected word length===
For Shannon's method, the word lengths satisfy

:<math>l_i = \lceil -\log_2 p_i \rceil \leq -\log_2 p_i + 1.</math>

Hence the expected word length satisfies

:<math display="block">\mathbb E L = \sum_{i=1}^n p_i l_i \leq \sum_{i=1}^n p_i (-\log_2 p_i + 1) = -\sum_{i=1}^n p_i \log_2 p_i + \sum_{i=1}^n p_i = H(X) + 1.</math>

Here, <math>H(X) = - \textstyle\sum_{i=1}^n p_i \log_2 p_i</math> is the [[Entropy (information theory)|entropy]], and [[Shannon's source coding theorem]] says that any code must have an average length of at least <math>H(X)</math>. Hence we see that the Shannon–Fano code is always within one bit of the optimal expected word length.
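The bound can be checked numerically on the example above. This short Python sketch (again purely illustrative) computes the entropy and the expected word length from the observed counts:

<syntaxhighlight lang="python">
from math import ceil, log2

counts = [15, 7, 6, 6, 5]                          # example frequencies
probabilities = [c / sum(counts) for c in counts]

entropy = -sum(p * log2(p) for p in probabilities)                  # H(X)
expected_length = sum(p * ceil(-log2(p)) for p in probabilities)    # E[L] with Shannon's lengths

print(f"H(X) = {entropy:.3f} bits, E[L] = {expected_length:.3f} bits")
# H(X) = 2.186 bits, E[L] = 2.615 bits (within one bit of the entropy)
</syntaxhighlight>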