Information content § Definition
== Definition ==
[[Claude Shannon]]'s definition of self-information was chosen to meet several axioms:
# An event with probability 100% is perfectly unsurprising and yields no information.
# The less probable an event is, the more surprising it is and the more information it yields.
# If two independent events are measured separately, the total amount of information is the sum of the self-informations of the individual events.

The detailed derivation is below, but it can be shown that there is a unique function of probability that meets these three axioms, up to a multiplicative scaling factor. Broadly, given a real number <math>b>1</math> and an [[Event (probability theory)|event]] <math>x</math> with [[probability]] <math>P</math>, the information content is defined as follows:
<math display="block">\mathrm{I}(x) := - \log_b{\left[\Pr{\left(x\right)}\right]} = -\log_b{\left(P\right)}.</math>

The base ''b'' corresponds to the scaling factor above. Different choices of ''b'' correspond to different units of information: when {{nowrap|1=''b'' = 2}}, the unit is the [[Shannon (unit)|shannon]] (symbol Sh), often called a 'bit'; when {{nowrap|1=''b'' = [[Euler's number|e]]}}, the unit is the [[Nat (unit)|natural unit of information]] (symbol nat); and when {{nowrap|1=''b'' = 10}}, the unit is the [[Hartley (unit)|hartley]] (symbol Hart).

Formally, given a discrete random variable <math>X</math> with [[probability mass function]] <math>p_{X}{\left(x\right)}</math>, the self-information of measuring <math>X</math> as [[Outcome (probability)|outcome]] <math>x</math> is defined as<ref name=":0">{{Cite book|title=Quantum Computing Explained|last=McMahon|first=David M.|publisher=Wiley-Interscience|year=2008|isbn=9780470181386 |location=Hoboken, NJ|oclc=608622533}}</ref>
<math display="block">\operatorname I_X(x) := - \log{\left[p_{X}{\left(x\right)}\right]} = \log{\left(\frac{1}{p_{X}{\left(x\right)}}\right)}.</math>

The use of the notation <math>I_X(x)</math> for self-information above is not universal. Since the notation <math>I(X;Y)</math> is also often used for the related quantity of [[mutual information]], many authors use a lowercase <math>h_X(x)</math> for self-entropy instead, mirroring the use of the capital <math>H(X)</math> for the entropy.
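The definition above can be illustrated with a short numerical sketch (not part of the article). The function `self_information` below is a hypothetical helper, not a standard library routine; it evaluates <math>\operatorname I(x) = \log_b(1/p)</math> and checks the three axioms for a few concrete probabilities:

```python
import math

def self_information(p, base=2):
    """Self-information I(x) = log_b(1/p) of an event with probability p.

    base=2 gives shannons (bits), base=e gives nats, base=10 gives hartleys.
    """
    if not 0 < p <= 1:
        raise ValueError("probability must be in (0, 1]")
    return math.log(1.0 / p, base)

# A fair coin flip (p = 1/2) carries exactly 1 shannon of information.
print(self_information(0.5))             # 1.0 Sh
print(self_information(0.5, math.e))     # same event in nats (~0.693)
print(self_information(0.5, 10))         # same event in hartleys (~0.301)

# Axiom 1: a certain event (p = 1) yields no information.
print(self_information(1.0))             # 0.0

# Axiom 3: for independent events, self-informations add.
# P(two independent fair flips both heads) = 1/4, and 1 Sh + 1 Sh = 2 Sh.
print(self_information(0.5) + self_information(0.5))  # 2.0
print(self_information(0.25))                         # 2.0
```

Changing `base` only rescales the result by a constant factor (<math>\log_b a</math>), which is exactly the multiplicative freedom the uniqueness argument leaves open.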