===Entropy of an information source===
Based on the [[probability mass function]] of each source symbol to be communicated, the Shannon [[Entropy (information theory)|entropy]] {{math|''H''}}, in units of bits (per symbol), is given by

:<math>H = - \sum_{i} p_i \log_2 (p_i)</math>

where {{math|''p<sub>i</sub>''}} is the probability of occurrence of the {{math|''i''}}-th possible value of the source symbol. This equation gives the entropy in the units of "bits" (per symbol) because it uses a logarithm of base 2, and this base-2 measure of entropy has sometimes been called the [[Shannon (unit)|shannon]] in his honor. Entropy is also commonly computed using the natural logarithm (base {{mvar|[[E (mathematical constant)|e]]}}, where {{mvar|e}} is Euler's number), which produces a measurement of entropy in nats per symbol and sometimes simplifies the analysis by avoiding the need to include extra constants in the formulas. Other bases are also possible, but less commonly used. For example, a logarithm of base {{nowrap|1=2<sup>8</sup> = 256}} will produce a measurement in [[byte]]s per symbol, and a logarithm of base 10 will produce a measurement in decimal digits (or [[Hartley (unit)|hartleys]]) per symbol.

Intuitively, the entropy {{math|''H<sub>X</sub>''}} of a discrete random variable {{math|''X''}} is a measure of the amount of ''uncertainty'' associated with the value of {{math|''X''}} when only its distribution is known.

The entropy of a source that emits a sequence of {{math|''N''}} symbols that are [[independent and identically distributed]] (iid) is {{math|''N'' ⋅ ''H''}} bits (per message of {{math|''N''}} symbols). If the source data symbols are identically distributed but not independent, the entropy of a message of length {{math|''N''}} will be less than {{math|''N'' ⋅ ''H''}}.

[[File:Binary entropy plot.svg|thumbnail|right|200px|The entropy of a [[Bernoulli trial]] as a function of success probability, often called the {{em|[[binary entropy function]]}}, {{math|''H''<sub>b</sub>(''p'')}}. The entropy is maximized at 1 bit per trial when the two possible outcomes are equally probable, as in an unbiased coin toss.]]

If one transmits 1000 bits (0s and 1s), and the value of each of these bits is known to the receiver (has a specific value with certainty) ahead of transmission, it is clear that no information is transmitted. If, however, each bit is independently equally likely to be 0 or 1, 1000 shannons of information (more often called bits) have been transmitted. Between these two extremes, information can be quantified as follows. If <math>\mathbb{X}</math> is the set of all messages {{math|{{mset|''x''<sub>1</sub>, ..., ''x''<sub>''n''</sub>}}}} that {{math|''X''}} could be, and {{math|''p''(''x'')}} is the probability of some <math>x \in \mathbb X</math>, then the entropy, {{math|''H''}}, of {{math|''X''}} is defined:{{sfn|Reza|1994}}

:<math> H(X) = \mathbb{E}_{X} [I(x)] = -\sum_{x \in \mathbb{X}} p(x) \log p(x).</math>

(Here, {{math|''I''(''x'')}} is the [[self-information]], which is the entropy contribution of an individual message, and <math>\mathbb{E}_X</math> is the [[expected value]].) A property of entropy is that it is maximized when all the messages in the message space are equiprobable {{math|1=''p''(''x'') = 1/''n''}}; i.e., most unpredictable, in which case {{math|1=''H''(''X'') = log ''n''}}.
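As a minimal illustration (the function name and the example distributions below are chosen here for demonstration, not taken from the article's sources), the definition of {{math|''H''}} can be evaluated directly with the Python standard library:

<syntaxhighlight lang="python">
from math import log2

def shannon_entropy(probabilities):
    """Shannon entropy H = -sum_i p_i * log2(p_i), in bits per symbol.

    Outcomes with probability 0 contribute nothing, by the convention 0 log 0 = 0.
    """
    return -sum(p * log2(p) for p in probabilities if p > 0)

# A fair coin: two equiprobable outcomes give exactly 1 bit per symbol.
print(shannon_entropy([0.5, 0.5]))   # 1.0

# A biased coin (p = 0.9) is more predictable, so its entropy is lower.
print(shannon_entropy([0.9, 0.1]))   # about 0.469 bits

# Four equiprobable symbols: H = log2(4) = 2 bits, the maximum for n = 4.
print(shannon_entropy([0.25] * 4))   # 2.0
</syntaxhighlight>

The last example reflects the maximization property stated above: with {{math|''n''}} equiprobable messages, the entropy equals {{math|log<sub>2</sub> ''n''}} bits.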
The special case of information entropy for a random variable with two outcomes is the binary entropy function, usually taken to the logarithmic base 2, thus having the [[Shannon (unit)|shannon]] (Sh) as unit:

:<math>H_{\mathrm{b}}(p) = - p \log_2 p - (1-p)\log_2 (1-p).</math>
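A short sketch of the binary entropy function (again illustrative; the endpoint handling follows the convention {{math|0 log 0 {{=}} 0}}):

<syntaxhighlight lang="python">
from math import log2

def binary_entropy(p):
    """Binary entropy H_b(p) = -p log2(p) - (1 - p) log2(1 - p), in shannons (bits)."""
    if p in (0.0, 1.0):   # limiting value: 0 log 0 is taken as 0
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

print(binary_entropy(0.5))    # 1.0 -- maximized for an unbiased coin toss
print(binary_entropy(0.11))   # about 0.5 -- a heavily biased coin carries less information per toss
</syntaxhighlight>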