===Entropy of a sequence===
There are a number of entropy-related concepts that mathematically quantify the information content of a sequence or message:
* the '''[[self-information]]''' of an individual message or symbol taken from a given probability distribution (the message or sequence seen as an individual event),
* the '''[[joint entropy]]''' of the symbols forming the message or sequence (seen as a set of events),
* the '''[[entropy rate]]''' of a [[stochastic process]] (the message or sequence seen as a succession of events).

(The "rate of self-information" can also be defined for a particular sequence of messages or symbols generated by a given stochastic process: this will always be equal to the entropy rate in the case of a [[stationary process]].) Other [[quantities of information]] are also used to compare or relate different sources of information.

It is important not to confuse the above concepts. Often it is only clear from context which one is meant. For example, when someone says that the "entropy" of the English language is about 1 bit per character, they are actually modeling the English language as a stochastic process and talking about its entropy ''rate''. Shannon himself used the term in this way. If very large blocks are used, the estimate of the per-character entropy rate may become artificially low because the probability distribution of the sequence is not known exactly; it is only an estimate.

If one considers the text of every book ever published as a sequence, with each symbol being the text of a complete book, and if there are {{math|''N''}} published books, each published only once, the estimate of the probability of each book is {{math|1/''N''}}, and the entropy (in bits) is {{math|−log<sub>2</sub>(1/''N'') {{=}} log<sub>2</sub>(''N'')}}. As a practical code, this corresponds to assigning each book a [[ISBN|unique identifier]] and using it in place of the text of the book whenever one wants to refer to the book. This is enormously useful for talking about books, but it is not so useful for characterizing the information content of an individual book, or of language in general: it is not possible to reconstruct the book from its identifier without knowing the probability distribution, that is, the complete text of all the books. The key idea is that the complexity of the probabilistic model must be considered.

[[Kolmogorov complexity]] is a theoretical generalization of this idea that allows the consideration of the information content of a sequence independent of any particular probability model; it considers the shortest [[Computer program|program]] for a [[universal computer]] that outputs the sequence. A code that achieves the entropy rate of a sequence for a given model, plus the codebook (i.e. the probabilistic model), is one such program, but it may not be the shortest.

The Fibonacci sequence is 1, 1, 2, 3, 5, 8, 13, .... Treating the sequence as a message and each number as a symbol, there are almost as many symbols as there are characters in the message, giving an entropy of approximately {{math|log<sub>2</sub>(''n'')}}. The first 128 symbols of the Fibonacci sequence have an entropy of approximately 7 bits/symbol, but the sequence can be expressed using a formula [{{math|F(''n'') {{=}} F(''n''−1) + F(''n''−2)}} for {{math|''n'' {{=}} 3, 4, 5, ...}}, {{math|F(1) {{=}} 1}}, {{math|F(2) {{=}} 1}}] and this formula has a much lower entropy and applies to any length of the Fibonacci sequence.
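The per-symbol figures quoted above can be checked numerically. The following is a minimal sketch under the same modeling assumptions as the text: each symbol's probability is estimated by its relative frequency in the sequence, and entropy is the expected value of the self-information {{math|−log<sub>2</sub>(''p'')}}. The helper name <code>empirical_entropy</code> and the choice of {{math|''N'' {{=}} 1000}} books are illustrative, not taken from the article or its sources.

<syntaxhighlight lang="python">
from collections import Counter
from math import log2

def empirical_entropy(symbols):
    """Estimate entropy in bits/symbol from relative symbol frequencies."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * log2(c / total) for c in counts.values())

# First 128 Fibonacci numbers, each number treated as one symbol of the "message".
fib = [1, 1]
while len(fib) < 128:
    fib.append(fib[-1] + fib[-2])

print(empirical_entropy(fib))   # ~6.98 bits/symbol, close to log2(128) = 7

# Book example: N distinct books, each published once, so each identifier has
# probability 1/N and the entropy is log2(N).
N = 1000                        # illustrative value; any N gives log2(N)
books = range(N)                # stand-ins for the N distinct book texts
print(empirical_entropy(books), log2(N))   # both ~9.97 bits
</syntaxhighlight>

Note that the ≈7 bits/symbol figure reflects only this frequency-based model of the first 128 symbols; as the article points out, the short recurrence generating the sequence shows that a better model (or, in the Kolmogorov sense, a shorter program) assigns it far less information.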