== Properties ==
{{Expand section|date=October 2018}}

=== Monotonically decreasing function of probability ===
For a given [[probability space]], the measurement of rarer [[event (probability theory)|event]]s is intuitively more "surprising", and yields more information content, than the measurement of more common values. Thus, self-information is a [[Monotonic function|strictly decreasing monotonic function]] of the probability, sometimes called an "antitonic" function.

While standard probabilities are represented by real numbers in the interval <math>[0, 1]</math>, self-informations are represented by [[extended real number]]s in the interval <math>[0, \infty]</math>. In particular, we have the following, for any choice of logarithmic base:
* If a particular event has a 100% probability of occurring, then its self-information is <math>-\log(1) = 0</math>: its occurrence is "perfectly non-surprising" and yields no information.
* If a particular event has a 0% probability of occurring, then its self-information is <math>-\log(0) = \infty</math>: its occurrence is "infinitely surprising".

From this, we can get a few general properties:
* Intuitively, more information is gained from observing an unexpected event; it is "surprising".
** For example, if there is a [[wikt:one in a million|one-in-a-million]] chance of Alice winning the [[lottery]], her friend Bob will gain significantly more information from learning that she [[Winning the lottery|won]] than from learning that she lost on a given day. (See also ''[[Lottery mathematics]]''.)
* This establishes an implicit relationship between the self-information of a [[random variable]] and its [[variance]].

=== Relationship to log-odds ===
The Shannon information is closely related to the [[log-odds]]. In particular, given some event <math>x</math>, suppose that <math>p(x)</math> is the probability of <math>x</math> occurring, and that <math>p(\lnot x) = 1-p(x)</math> is the probability of <math>x</math> not occurring. Then we have the following definition of the log-odds:
<math display="block">\text{log-odds}(x) = \log\left(\frac{p(x)}{p(\lnot x)}\right)</math>

This can be expressed as a difference of two Shannon informations:
<math display="block">\text{log-odds}(x) = \mathrm{I}(\lnot x) - \mathrm{I}(x)</math>

In other words, the log-odds can be interpreted as the level of surprise when the event ''doesn't'' happen, minus the level of surprise when the event ''does'' happen.
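The following minimal Python sketch (illustrative only, not part of the formal treatment; the helper names <code>self_information</code> and <code>log_odds</code> are ad hoc, and a base-2 logarithm is assumed, so values are in bits) evaluates these quantities numerically:

<syntaxhighlight lang="python">
import math

def self_information(p: float, base: float = 2) -> float:
    """Self-information -log_b(p) of an event with probability p (bits for base 2)."""
    if p == 0:
        return math.inf          # a zero-probability event is "infinitely surprising"
    return -math.log(p, base)

def log_odds(p: float, base: float = 2) -> float:
    """log-odds(x) = I(not x) - I(x) = log_b(p / (1 - p))."""
    return self_information(1 - p, base) - self_information(p, base)

print(self_information(1.0))    # -0.0 -> zero bits: a certain event is perfectly non-surprising
print(self_information(0.0))    # inf: an impossible event would be infinitely surprising
print(self_information(1e-6))   # ~19.93 bits: the one-in-a-million lottery win
print(log_odds(0.5))            # 0.0: even odds, equal surprise either way
</syntaxhighlight>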
=== Additivity of independent events ===
The information content of two [[independent events]] is the sum of each event's information content. This property is known as [[Additive map|additivity]] in mathematics, and [[sigma additivity]] in particular in [[Measure (mathematics)|measure]] and probability theory.

Consider two [[independent random variables]] <math display="inline">X,\, Y</math> with [[probability mass function]]s <math>p_X(x)</math> and <math>p_Y(y)</math> respectively. The [[joint probability mass function]] is
<math display="block"> p_{X, Y}\!\left(x, y\right) = \Pr(X = x,\, Y = y) = p_X\!(x)\,p_Y\!(y) </math>
because <math display="inline">X</math> and <math display="inline">Y</math> are [[Independence (probability theory)|independent]]. The information content of the [[Outcome (probability)|outcome]] <math> (X, Y) = (x, y)</math> is
<math display="block"> \begin{align} \operatorname{I}_{X,Y}(x, y) &= -\log_2\left[p_{X,Y}(x, y)\right] = -\log_2 \left[p_X\!(x)p_Y\!(y)\right] \\[5pt] &= -\log_2 \left[p_X{(x)}\right] -\log_2 \left[p_Y{(y)}\right] \\[5pt] &= \operatorname{I}_X(x) + \operatorname{I}_Y(y) \end{align} </math>
See ''{{Section link||Two independent, identically distributed dice|nopage=y}}'' below for an example.

The corresponding property for [[likelihood]]s is that the [[log-likelihood]] of independent events is the sum of the log-likelihoods of each event. Interpreting log-likelihood as "support" or negative surprisal (the degree to which an event supports a given model: a model is supported by an event to the extent that the event is unsurprising, given the model), this states that independent events add support: the information that the two events together provide for statistical inference is the sum of their independent information.
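As a numerical check of this additivity (again an illustrative Python sketch; the probabilities of 1/6 are hypothetical example values, anticipating the two-dice example below):

<syntaxhighlight lang="python">
import math

def self_information(p: float) -> float:
    """Self-information -log2(p), in bits (shannons)."""
    return -math.log2(p)

# Example marginal probabilities for two independent outcomes, e.g. two fair dice.
p_x = 1 / 6      # Pr(X = x)
p_y = 1 / 6      # Pr(Y = y)

# Independence: the joint probability is the product of the marginals,
# so the surprisal of the joint outcome is the sum of the individual surprisals.
i_joint = self_information(p_x * p_y)
i_sum = self_information(p_x) + self_information(p_y)

print(i_joint, i_sum)                # both ~5.17 bits
print(math.isclose(i_joint, i_sum))  # True
</syntaxhighlight>

Because the base-2 logarithm turns the product of independent probabilities into a sum, the two computations agree up to floating-point rounding.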