==Further properties==
The Shannon entropy satisfies the following properties, for some of which it is useful to interpret entropy as the expected amount of information learned (or uncertainty eliminated) by revealing the value of a random variable {{math|''X''}}:
* Adding or removing an event with probability zero does not contribute to the entropy: <math display="block">\Eta_{n+1}(p_1,\ldots,p_n,0) = \Eta_n(p_1,\ldots,p_n).</math>
* The maximal entropy of a random variable with ''n'' possible outcomes is {{math|log<sub>''b''</sub>(''n'')}}: it is attained by the uniform probability distribution. That is, uncertainty is maximal when all possible events are equiprobable:<ref name="cover1991" />{{rp|p=29}} <math display="block">\Eta(p_1,\dots,p_n) \leq \log_b n.</math>
* The entropy, or the amount of information revealed by evaluating {{math|(''X'',''Y'')}} (that is, evaluating {{math|''X''}} and {{math|''Y''}} simultaneously), is equal to the information revealed by conducting two consecutive experiments: first evaluating the value of {{math|''Y''}}, then revealing the value of {{math|''X''}} given the value of {{math|''Y''}}. This may be written as:<ref name=cover1991/>{{rp|p=16}} <math display="block"> \Eta(X,Y)=\Eta(X|Y)+\Eta(Y)=\Eta(Y|X)+\Eta(X).</math>
* If <math>Y=f(X)</math> where <math>f</math> is a function, then <math>\Eta(f(X)|X) = 0</math>. Applying the previous formula to <math>\Eta(X,f(X))</math> yields <math display="block"> \Eta(X)+\Eta(f(X)|X)=\Eta(f(X))+\Eta(X|f(X)),</math> so <math>\Eta(f(X)) \le \Eta(X)</math>; that is, the entropy of a variable can never increase when the variable is passed through a function.
* If {{math|''X''}} and {{math|''Y''}} are two independent random variables, then knowing the value of {{math|''Y''}} does not influence our knowledge of the value of {{math|''X''}} (since the two do not influence each other by independence): <math display="block"> \Eta(X|Y)=\Eta(X).</math>
* More generally, for any random variables {{math|''X''}} and {{math|''Y''}}, we have<ref name=cover1991/>{{rp|p=29}} <math display="block"> \Eta(X|Y)\leq \Eta(X).</math>
* The entropy of two simultaneous events is no more than the sum of the entropies of each individual event, i.e., <math> \Eta(X,Y)\leq \Eta(X)+\Eta(Y)</math>, with equality if and only if the two events are independent.<ref name=cover1991/>{{rp|p=28}} (The chain rule and these two inequalities are illustrated numerically in the sketch following this list.)
* The entropy <math>\Eta(p)</math> is [[Concave function|concave]] in the probability mass function <math>p</math>, i.e.<ref name=cover1991/>{{rp|p=30}} <math display="block">\Eta(\lambda p_1 + (1-\lambda) p_2) \ge \lambda \Eta(p_1) + (1-\lambda) \Eta(p_2)</math> for all probability mass functions <math>p_1,p_2</math> and <math> 0 \le \lambda \le 1</math>.<ref name=cover1991 />{{rp|p=32}}
** Accordingly, the [[negative entropy]] (negentropy) function is convex, and its [[convex conjugate]] is [[LogSumExp]].
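The following Python sketch is an informal illustration, not drawn from the cited reference: it checks the chain rule {{math|Η(''X'',''Y'') {{=}} Η(''X''{{!}}''Y'') + Η(''Y'')}}, the conditioning inequality {{math|Η(''X''{{!}}''Y'') ≤ Η(''X'')}}, and subadditivity for one small, arbitrarily chosen joint distribution of two binary random variables. The example distribution, the variable names, and the helper function <code>entropy</code> are assumptions made only for this sketch.

<syntaxhighlight lang="python">
import math

def entropy(dist, b=2):
    """Shannon entropy of a pmf given as {outcome: probability}, in base b."""
    return -sum(p * math.log(p, b) for p in dist.values() if p > 0)

# Illustrative joint distribution of two binary random variables X and Y
# (arbitrary example values; any joint pmf summing to 1 would do).
p_xy = {(0, 0): 0.30, (0, 1): 0.20,
        (1, 0): 0.10, (1, 1): 0.40}

# Marginal distributions of X and Y.
p_x, p_y = {}, {}
for (x, y), p in p_xy.items():
    p_x[x] = p_x.get(x, 0.0) + p
    p_y[y] = p_y.get(y, 0.0) + p

# Conditional entropy H(X|Y) = sum_y p(y) * H(X | Y=y), computed from the definition.
h_x_given_y = 0.0
for y, py in p_y.items():
    cond = {x: p_xy[(x, y)] / py for x in p_x if (x, y) in p_xy}
    h_x_given_y += py * entropy(cond)

h_xy, h_x, h_y = entropy(p_xy), entropy(p_x), entropy(p_y)

# Chain rule: H(X,Y) = H(X|Y) + H(Y).
assert abs(h_xy - (h_x_given_y + h_y)) < 1e-12

# Conditioning does not increase entropy: H(X|Y) <= H(X).
assert h_x_given_y <= h_x + 1e-12

# Subadditivity: H(X,Y) <= H(X) + H(Y).
assert h_xy <= h_x + h_y + 1e-12

print(f"H(X,Y)={h_xy:.4f}  H(X)={h_x:.4f}  H(Y)={h_y:.4f}  H(X|Y)={h_x_given_y:.4f}")
</syntaxhighlight>

For this particular distribution the script prints {{math|Η(''X'',''Y'') ≈ 1.846}}, {{math|Η(''X'') {{=}} 1}}, {{math|Η(''Y'') ≈ 0.971}} and {{math|Η(''X''{{!}}''Y'') ≈ 0.876}} bits; the assertions hold for any valid joint pmf, since they are numerical instances of the identities and inequalities listed above.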