==Derivation==
By definition, information is transferred from an originating entity possessing the information to a receiving entity only when the receiver had not known the information [[A priori knowledge|a priori]]. If the receiving entity had previously known the content of a message with certainty before receiving the message, the amount of information of the message received is zero. Only when the advance knowledge of the content of the message by the receiver is less than 100% certain does the message actually convey information.

For example, quoting a character (the Hippy Dippy Weatherman) of comedian [[George Carlin]]:

<blockquote>''Weather forecast for tonight: dark.'' ''Continued dark overnight, with widely scattered light by morning.''<ref>{{Cite web|title=A quote by George Carlin |url=https://www.goodreads.com/quotes/94336-weather-forecast-for-tonight-dark-continued-dark-overnight-with-widely |access-date=2021-04-01 |website=www.goodreads.com}}</ref></blockquote>

Assuming that one does not reside near the [[Polar regions of Earth|polar regions]], the amount of information conveyed in that forecast is zero because it is known, in advance of receiving the forecast, that darkness always comes with the night.

Accordingly, the amount of self-information contained in a message informing of the occurrence of an [[event (probability theory)|event]] <math>\omega_n</math> depends only on the probability of that event:
<math display="block">\operatorname I(\omega_n) = f(\operatorname P(\omega_n))</math>
for some function <math>f(\cdot)</math> to be determined below. If <math>\operatorname P(\omega_n) = 1</math>, then <math>\operatorname I(\omega_n) = 0</math>. If <math>\operatorname P(\omega_n) < 1</math>, then <math>\operatorname I(\omega_n) > 0</math>.

Further, by definition, the [[Measure (mathematics)|measure]] of self-information is nonnegative and additive. If a message informing of event <math>C</math> is the '''intersection''' of two [[statistical independence|independent]] events <math>A</math> and <math>B</math>, then the information of event <math>C</math> occurring is that of the compound message of both independent events <math>A</math> and <math>B</math> occurring. The quantity of information of the compound message <math>C</math> would be expected to equal the '''sum''' of the amounts of information of the individual component messages <math>A</math> and <math>B</math> respectively:
<math display="block">\operatorname I(C) = \operatorname I(A \cap B) = \operatorname I(A) + \operatorname I(B).</math>

Because of the independence of events <math>A</math> and <math>B</math>, the probability of event <math>C</math> is
<math display="block">\operatorname P(C) = \operatorname P(A \cap B) = \operatorname P(A) \cdot \operatorname P(B).</math>

Applying the function <math>f(\cdot)</math> to both relations yields
<math display="block">\begin{align}
\operatorname I(C) & = \operatorname I(A) + \operatorname I(B) \\
f(\operatorname P(C)) & = f(\operatorname P(A)) + f(\operatorname P(B)) \\
 & = f\big(\operatorname P(A) \cdot \operatorname P(B)\big) \\
\end{align}</math>

By [[Cauchy's functional equation]], the only monotone functions <math>f(\cdot)</math> satisfying
<math display="block">f(x \cdot y) = f(x) + f(y)</math>
are the [[logarithm]] functions <math>\log_b(x)</math>.
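As a quick numerical check of this functional equation (with arguments chosen purely for illustration), the base-2 logarithm turns the product <math>8 \cdot 4</math> into a sum:
<math display="block">\log_2(8 \cdot 4) = \log_2 32 = 5 = 3 + 2 = \log_2 8 + \log_2 4.</math>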
The only operational difference between logarithms of different bases is that of different scaling constants, so we may assume
<math display="block">f(x) = K \log(x)</math>
where <math>\log</math> is the [[natural logarithm]]. Since the probabilities of events are always between 0 and 1, and the information associated with these events must be nonnegative, this requires that <math>K<0</math>.

Taking these properties into account, the self-information <math>\operatorname I(\omega_n)</math> associated with outcome <math>\omega_n</math> of probability <math>\operatorname P(\omega_n)</math> is defined as:
<math display="block">\operatorname I(\omega_n) = -\log(\operatorname P(\omega_n)) = \log \left(\frac{1}{\operatorname P(\omega_n)} \right)</math>

The smaller the probability of event <math>\omega_n</math>, the larger the quantity of self-information associated with the message that the event indeed occurred. If the above logarithm is base 2, the unit of <math>\operatorname I(\omega_n)</math> is the [[Shannon (unit)|shannon]]; this is the most common convention. When the [[natural logarithm]] of base <math>e</math> is used, the unit is the [[Nat (unit)|nat]]. For the base-10 logarithm, the unit of information is the [[Hartley (unit)|hartley]].

As a quick illustration, the information content associated with an outcome of 4 heads (or any specific outcome) in 4 consecutive tosses of a fair coin would be 4 shannons (probability 1/16), and the information content associated with getting a result other than the one specified would be ~0.09 shannons (probability 15/16). See above for detailed examples.
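A minimal sketch of the formula in Python (the function name <code>self_information</code> is illustrative, not a standard library routine) reproduces the coin-toss figures above and shows how the choice of logarithm base sets the unit:

<syntaxhighlight lang="python">
from math import e, log

def self_information(p: float, base: float = 2) -> float:
    """Self-information -log_base(p) of an outcome with probability p.

    Base 2 gives shannons, base e gives nats, base 10 gives hartleys.
    """
    return -log(p, base)

# A specific sequence of 4 tosses of a fair coin has probability 1/16:
print(self_information(1 / 16))      # 4.0 shannons
# Any result other than the specified one (probability 15/16):
print(self_information(15 / 16))     # ~0.093 shannons
# The same 1/16 outcome measured in other units:
print(self_information(1 / 16, e))   # ~2.77 nats
print(self_information(1 / 16, 10))  # ~1.20 hartleys
</syntaxhighlight>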