
==Definition==
The entropy {{math|&Eta;}} (Greek capital letter [[eta]]), named after [[H-theorem|Boltzmann's Η-theorem]], was defined by Shannon for a [[discrete random variable]] <math display="inline">X</math>, which takes values in the set <math>\mathcal{X}</math> and is distributed according to <math>p: \mathcal{X} \to [0, 1]</math> such that <math>p(x) := \mathbb{P}[X = x]</math>:

<math display="block">\Eta(X) = \mathbb{E}[\operatorname{I}(X)] = \mathbb{E}[-\log p(X)].</math>

Here <math>\mathbb{E}</math> is the [[expected value|expected value operator]], and {{math|I}} is the [[information content]] of {{math|''X''}}.<ref>{{cite book|author=Borda, Monica|title=Fundamentals in Information Theory and Coding|publisher=Springer|year=2011|isbn=978-3-642-20346-6|url=https://books.google.com/books?id=Lyte2yl1SPAC&pg=PA11}}</ref>{{rp|p=11}}<ref>{{cite book|author1=Han, Te Sun |author2=Kobayashi, Kingo |title=Mathematics of Information and Coding|publisher=American Mathematical Society|year=2002|isbn=978-0-8218-4256-0|url=https://books.google.com/books?id=VpRESN24Zj0C&pg=PA19}}</ref>{{rp|pp=19–20}}
<math>\operatorname{I}(X)</math> is itself a random variable.
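Because <math>\operatorname{I}(X) = -\log p(X)</math> is a random variable, its expectation can be approximated by averaging it over samples of <math>X</math>. The following Python sketch is illustrative only: the distribution <code>p</code> and the function names are hypothetical, and it compares a seeded Monte Carlo average of the information content against the exact expectation.

```python
import math
import random

# Hypothetical example distribution; any pmf with positive entries works.
p = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

def information_content(x, base=2.0):
    """I(x) = -log_b p(x), the surprise of observing outcome x."""
    return -math.log(p[x], base)

def exact_entropy(base=2.0):
    """H(X) = E[I(X)], computed as the probability-weighted sum of I(x)."""
    return sum(px * information_content(x, base) for x, px in p.items())

def monte_carlo_entropy(n, seed=0, base=2.0):
    """Estimate E[I(X)] as the sample mean of I over n independent draws of X."""
    rng = random.Random(seed)
    outcomes = list(p)
    samples = rng.choices(outcomes, weights=[p[x] for x in outcomes], k=n)
    return sum(information_content(x, base) for x in samples) / n

print(exact_entropy())               # ≈ 1.75 bits
print(monte_carlo_entropy(100_000))  # close to 1.75; varies with the seed
```

The estimate fluctuates from seed to seed, but by the law of large numbers the sample mean of <math>\operatorname{I}(X)</math> converges to <math>\Eta(X)</math> as the number of draws grows.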

Explicitly, the entropy can be written as:
<math display="block">\Eta(X) = -\sum_{x \in \mathcal{X}} p(x)\log_b p(x) ,</math>
where {{math|''b''}} is the [[base of a logarithm|base of the logarithm]] used. Common values of {{math|''b''}} are 2, [[e (mathematical constant)|Euler's number {{math|''e''}}]], and 10, and the corresponding units of entropy are [[bit]]s for {{math|''b'' {{=}} 2}}, [[Nat (unit)|nats]] for {{math|''b'' {{=}} ''e''}}, and [[ban (unit)|ban]]s for {{math|''b'' {{=}} 10}}.<ref>Schneider, T.D, [http://alum.mit.edu/www/toms/paper/primer/primer.pdf Information theory primer with an appendix on logarithms]{{Dead link|date=August 2023 |bot=InternetArchiveBot |fix-attempted=yes }}, National Cancer Institute, 14 April 2007.</ref>
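A minimal Python sketch of the explicit sum, assuming every listed probability is strictly positive (with a zero entry, <code>math.log</code> would raise <code>ValueError</code>). The function name <code>entropy</code> is illustrative, and the fair coin shows how the choice of base only changes the unit.

```python
import math

def entropy(probs, base=2.0):
    """H(X) = -sum_x p(x) log_b p(x), assuming every p(x) > 0."""
    if not math.isclose(sum(probs), 1.0):
        raise ValueError("probabilities must sum to 1")
    return -sum(px * math.log(px, base) for px in probs)

fair_coin = [0.5, 0.5]
print(entropy(fair_coin, 2))       # ≈ 1 bit
print(entropy(fair_coin, math.e))  # ≈ 0.693 nats (ln 2)
print(entropy(fair_coin, 10))      # ≈ 0.301 bans (log10 2)
```

Since <math>\log_b p = \ln p / \ln b</math>, changing the base only rescales the result: an entropy in nats divided by <math>\ln 2</math> equals the same entropy in bits.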

In the case of <math>p(x) = 0</math> for some <math>x \in \mathcal{X}</math>, the value of the corresponding summand {{math|0 log<sub>''b''</sub>(0)}} is taken to be {{math|0}}, which is consistent with the [[limit of a function|limit]]:<ref name="cover1991">{{cite book |author1=Thomas M. Cover |title=Elements of Information Theory |author2=Joy A. Thomas |date=1991 |publisher=Wiley |isbn=978-0-471-24195-9 |location=Hoboken, New Jersey}}</ref>{{rp|p=13}}
<math display="block">\lim_{p \to 0^+} p \log (p) = 0.</math>
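The limit can be checked numerically, and the convention is usually implemented by skipping zero-probability terms. This Python sketch is one way to do that; the function name is illustrative.

```python
import math

# p * log(p) shrinks toward 0 as p -> 0+, even though log(p) -> -infinity.
for p in (1e-1, 1e-3, 1e-6, 1e-12):
    print(p, p * math.log(p))

def entropy(probs, base=2.0):
    """Shannon entropy using the convention 0 * log(0) = 0."""
    # Terms with p(x) == 0 are skipped, matching the limit above;
    # calling math.log(0) directly would raise ValueError instead.
    return -sum(px * math.log(px, base) for px in probs if px > 0)

# Adding an impossible outcome leaves the entropy unchanged.
print(entropy([0.5, 0.5]), entropy([0.5, 0.5, 0.0]))
```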

One may also define the [[conditional entropy]] of <math>X</math> given <math>Y</math>, for two variables taking values in the sets <math>\mathcal{X}</math> and <math>\mathcal{Y}</math> respectively, as:<ref name=cover1991/>{{rp|p=16}}
<math display="block"> \Eta(X|Y)=-\sum_{(x,y) \in \mathcal{X} \times \mathcal{Y}} p_{X,Y}(x,y)\log\frac{p_{X,Y}(x,y)}{p_Y(y)} ,</math>
where <math>p_{X,Y}(x,y) := \mathbb{P}[X=x,Y=y]</math> and <math>p_Y(y) := \mathbb{P}[Y = y]</math>. This quantity can be understood as the randomness remaining in the random variable <math>X</math> once the value of the random variable <math>Y</math> is known.
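A short Python sketch of the conditional entropy formula, assuming the joint distribution is given as a dictionary from <code>(x, y)</code> pairs to probabilities; the function name and both example distributions are hypothetical. It contrasts a <math>Y</math> that determines <math>X</math> with a <math>Y</math> that is independent of it.

```python
import math
from collections import defaultdict

def conditional_entropy(joint, base=2.0):
    """H(X|Y) = -sum_{(x,y)} p(x,y) log_b( p(x,y) / p_Y(y) ).

    `joint` maps (x, y) pairs to p_{X,Y}(x, y). Zero-probability pairs are
    skipped (0 log 0 = 0). The sign is folded into the log as
    log(p_Y(y) / p(x,y)), which is mathematically the same quantity.
    """
    # Marginal p_Y(y) = sum over x of p_{X,Y}(x, y).
    p_y = defaultdict(float)
    for (_, y), pxy in joint.items():
        p_y[y] += pxy
    return sum(
        pxy * math.log(p_y[y] / pxy, base)
        for (_, y), pxy in joint.items()
        if pxy > 0
    )

# X is a fair bit. In `copy`, Y = X; in `indep`, Y is independent of X.
copy = {(0, 0): 0.5, (1, 1): 0.5}
indep = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}
print(conditional_entropy(copy))   # 0.0: knowing Y leaves no randomness in X
print(conditional_entropy(indep))  # 1.0: Y reveals nothing about X
```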

=== Measure theory ===

Entropy can be formally defined in the language of [[measure theory]] as follows:<ref>{{nlab|id=entropy|title=Entropy}}</ref> Let <math>(X, \Sigma, \mu)</math> be a [[probability space]]. Let <math>A \in \Sigma</math> be an [[event (probability theory)|event]]. The [[surprisal]] of <math>A</math> is
<math display="block"> \sigma_\mu(A) = -\ln \mu(A) .</math>

The ''expected'' surprisal of <math>A</math> is
<math display="block"> h_\mu(A) = \mu(A) \sigma_\mu(A) .</math>
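On a finite probability space the measure is determined by the mass of each point, so both quantities can be computed directly. This Python sketch assumes a hypothetical fair six-sided die and uses exact fractions for <math>\mu</math>; results are in nats, matching the natural logarithm above.

```python
import math
from fractions import Fraction

# Hypothetical finite probability space: a fair die, mu({w}) = 1/6,
# and mu(A) is the sum of the point masses in A.
omega = range(1, 7)
mass = {w: Fraction(1, 6) for w in omega}

def mu(event):
    return sum((mass[w] for w in event), Fraction(0))

def surprisal(event):
    """sigma_mu(A) = -ln mu(A), in nats; infinite for a null event."""
    m = mu(event)
    return math.inf if m == 0 else -math.log(float(m))

def expected_surprisal(event):
    """h_mu(A) = mu(A) * sigma_mu(A), taking 0 * inf = 0 for null events."""
    m = mu(event)
    return 0.0 if m == 0 else float(m) * surprisal(event)

even = {2, 4, 6}
print(surprisal(even))           # ln 2 ≈ 0.693 nats
print(expected_surprisal(even))  # (1/2) ln 2 ≈ 0.347 nats
```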

A <math>\mu</math>-almost [[partition of a set|partition]] is a [[set family]] <math>P \subseteq \mathcal{P}(X)</math> such that <math>\mu(\mathop{\cup} P) = 1</math> and <math>\mu(A \cap B) = 0</math> for all distinct <math>A, B \in P</math>. (This is a relaxation of the usual conditions for a partition.) The entropy of <math>P</math> is
<math display="block">  \Eta_\mu(P) = \sum_{A \in P} h_\mu(A) .</math>
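The following Python sketch checks the two almost-partition conditions and then sums the expected surprisals, again on a hypothetical fair-die space with exact fractions. The helper names are illustrative; comparing a coarse and a fine partition shows that the finer one has the larger entropy.

```python
import math
from fractions import Fraction

# Hypothetical finite space {1,...,6} with a fair-die measure.
mass = {w: Fraction(1, 6) for w in range(1, 7)}

def mu(event):
    return sum((mass[w] for w in event), Fraction(0))

def is_almost_partition(blocks):
    """mu(union of P) = 1 and mu(A ∩ B) = 0 for all distinct blocks A, B."""
    union = set().union(*blocks)
    pairwise_null = all(
        mu(a & b) == 0
        for i, a in enumerate(blocks)
        for b in blocks[i + 1:]
    )
    return mu(union) == 1 and pairwise_null

def h(event):
    """Expected surprisal mu(A) * (-ln mu(A)), with h = 0 for null events."""
    m = float(mu(event))
    return 0.0 if m == 0 else -m * math.log(m)

def partition_entropy(blocks):
    """H_mu(P) = sum of h_mu(A) over the blocks A of P, in nats."""
    if not is_almost_partition(blocks):
        raise ValueError("not a mu-almost partition")
    return sum(h(a) for a in blocks)

parity = [{1, 3, 5}, {2, 4, 6}]
singletons = [{w} for w in range(1, 7)]
print(partition_entropy(parity))      # ln 2 ≈ 0.693 nats
print(partition_entropy(singletons))  # ln 6 ≈ 1.792 nats
```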

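Let <math>M \subseteq \Sigma</math> be a [[sigma-algebra]] on <math>X</math>. The entropy of <math>M</math> is the supremum of <math>\Eta_\mu(P)</math> over all <math>\mu</math>-almost partitions <math>P \subseteq M</math>: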
<math display="block"> \Eta_\mu(M) = \sup_{P \subseteq M} \Eta_\mu(P) .</math>
Finally, the entropy of the probability space is <math>\Eta_\mu(\Sigma)</math>, that is, the entropy with respect to <math>\mu</math> of the sigma-algebra of ''all'' measurable subsets of <math>X</math>.
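On a finite space, a sigma-algebra is generated by its atoms. Assuming every atom has positive measure, each <math>\mu</math>-almost partition inside <math>M</math> (apart from empty blocks, which contribute 0) is a grouping of atoms into disjoint unions. Because <math>-(a+b)\ln(a+b) \le -a\ln a - b\ln b</math>, merging blocks never increases entropy, so the supremum is attained by the partition into atoms. The Python sketch below checks this by brute force over every grouping; the die measure and the chosen atoms are hypothetical.

```python
import math
from fractions import Fraction

# Hypothetical fair die on {1,...,6}; M is the sub-sigma-algebra generated
# by the atoms {1,2}, {3}, {4,5,6}, each assumed to have positive measure.
mass = {w: Fraction(1, 6) for w in range(1, 7)}
atoms = [frozenset({1, 2}), frozenset({3}), frozenset({4, 5, 6})]

def mu(event):
    return sum((mass[w] for w in event), Fraction(0))

def h(event):
    """Expected surprisal mu(A) * (-ln mu(A)), with h = 0 for null events."""
    m = float(mu(event))
    return 0.0 if m == 0 else -m * math.log(m)

def set_partitions(items):
    """Yield every partition of the list `items` as a list of lists."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        # ... or into a new block of its own.
        yield [[first]] + smaller

def sigma_algebra_entropy(atoms):
    """Supremum of H_mu(P) over partitions P of M, i.e. groupings of atoms."""
    best = 0.0
    for grouping in set_partitions(atoms):
        blocks = [frozenset().union(*group) for group in grouping]
        best = max(best, sum(h(b) for b in blocks))
    return best

print(sigma_algebra_entropy(atoms))  # ≈ 1.011 nats, the atom partition's entropy
```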

A 2025 preprint on layered dynamical systems proposes a notion of symbolic conditional entropy, extending classical entropy measures to more abstract informational structures.<ref>{{cite web |last=Alpay |first=F. |year=2025 |title=Symbolic Conditional Entropy in Layered Dynamical Systems |publisher=Zenodo |url=https://doi.org/10.5281/zenodo.15354902 |doi=10.5281/zenodo.15354902 |access-date=7 May 2025}}</ref>