Entropy (information theory)
{{short description|Expected amount of information needed to specify the output of a stochastic data source}}
{{other uses|Entropy (disambiguation)}}
{{More citations needed|date=February 2019}}
{{Use dmy dates|date=October 2023}}
{{Use American English|date=December 2024}}
{{Information theory}}

In [[information theory]], the '''entropy''' of a [[random variable]] quantifies the average level of uncertainty or information associated with the variable's possible outcomes. It measures the expected amount of information needed to describe the state of the variable, given the distribution of probabilities over all possible states. Given a discrete random variable <math>X</math>, which takes values <math>x</math> in the set <math>\mathcal{X}</math> and is distributed according to <math>p\colon \mathcal{X}\to[0, 1]</math>, the entropy is
<math display="block">\Eta(X) := -\sum_{x \in \mathcal{X}} p(x) \log p(x),</math>
where <math>\Sigma</math> denotes the sum over the variable's possible values.<ref group=Note name=Note01/> The choice of base for <math>\log</math>, the [[logarithm]], varies for different applications. Base 2 gives the unit of [[bit]]s (or "[[shannon (unit)|shannon]]s"), base [[Euler's number|''e'']] gives "natural units" [[nat (unit)|nat]], and base 10 gives units of "dits", "bans", or "[[Hartley (unit)|hartleys]]". An equivalent definition of entropy is the [[expected value]] of the [[self-information]] of a variable.<ref name="pathriaBook">{{cite book|last1=Pathria|first1=R. K.|url=https://books.google.com/books?id=KdbJJAXQ-RsC|title=Statistical Mechanics|last2=Beale|first2=Paul|date=2011|publisher=Academic Press|isbn=978-0123821881|edition=Third|page=51}}</ref>

[[File:Entropy flip 2 coins.jpg|thumb|300px|Two bits of entropy: In the case of two fair coin tosses, the information entropy in bits is the base-2 logarithm of the number of possible outcomes{{px2}}{{mdash}}{{hsp}}with two coins there are four possible outcomes, and two bits of entropy.
Generally, information entropy is the average amount of information conveyed by an event, when considering all possible outcomes.]]

The concept of information entropy was introduced by [[Claude Shannon]] in his 1948 paper "[[A Mathematical Theory of Communication]]",<ref name="shannonPaper1">{{cite journal|last=Shannon|first=Claude E.|author-link=Claude Shannon|date=July 1948|title=A Mathematical Theory of Communication|journal=[[Bell System Technical Journal]]|volume=27|issue=3|pages=379–423|doi=10.1002/j.1538-7305.1948.tb01338.x|hdl-access=free|title-link=A Mathematical Theory of Communication|hdl=10338.dmlcz/101429}} ([https://web.archive.org/web/20120615000000*/https://www.alcatel-lucent.com/bstj/vol27-1948/articles/bstj27-3-379.pdf PDF], archived from [http://www.alcatel-lucent.com/bstj/vol27-1948/articles/bstj27-3-379.pdf here] {{Webarchive|url=https://web.archive.org/web/20140620153353/http://www3.alcatel-lucent.com/bstj/vol27-1948/articles/bstj27-3-379.pdf |date=20 June 2014 }})</ref><ref name="shannonPaper2">{{cite journal|last=Shannon|first=Claude E.|author-link=Claude Shannon|date=October 1948|title=A Mathematical Theory of Communication|journal=[[Bell System Technical Journal]]|volume=27|issue=4|pages=623–656|doi=10.1002/j.1538-7305.1948.tb00917.x|hdl-access=free|title-link=A Mathematical Theory of Communication|hdl=11858/00-001M-0000-002C-4317-B}} ([https://web.archive.org/web/20120615000000*/https://www.alcatel-lucent.com/bstj/vol27-1948/articles/bstj27-4-623.pdf PDF], archived from [http://www.alcatel-lucent.com/bstj/vol27-1948/articles/bstj27-4-623.pdf here] {{Webarchive|url=https://web.archive.org/web/20130510074504/http://www.alcatel-lucent.com/bstj/vol27-1948/articles/bstj27-4-623.pdf |date=10 May 2013 }})</ref> and is also referred to as '''Shannon entropy'''. Shannon's theory defines a [[data communication]] system composed of three elements: a source of data, a [[communication channel]], and a receiver. The "fundamental problem of communication" – as expressed by Shannon – is for the receiver to be able to identify what data was generated by the source, based on the signal it receives through the channel.<ref name="shannonPaper1" /><ref name="shannonPaper2" /> Shannon considered various ways to encode, compress, and transmit messages from a data source, and proved in his [[source coding theorem]] that the entropy represents an absolute mathematical limit on how well data from the source can be [[lossless]]ly compressed onto a perfectly noiseless channel. Shannon strengthened this result considerably for noisy channels in his [[noisy-channel coding theorem]].

Entropy in information theory is directly analogous to the [[Entropy (statistical thermodynamics)|entropy]] in [[statistical thermodynamics]]. The analogy results when the values of the random variable designate energies of microstates, so Gibbs's formula for the entropy is formally identical to Shannon's formula. Entropy has relevance to other areas of mathematics such as [[combinatorics]] and [[machine learning]]. The definition can be derived from a set of [[axiom]]s establishing that entropy should be a measure of how informative the average outcome of a variable is. For a continuous random variable, [[differential entropy]] is analogous to entropy. The definition <math>\mathbb{E}[-\log p(X)]</math> generalizes the above.
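The defining sum is straightforward to evaluate numerically. The following is a minimal sketch in Python (the helper name <code>shannon_entropy</code> is illustrative, not a standard library function): it computes <math>-\sum_{x} p(x) \log p(x)</math> for a finite probability mass function, with the logarithm base selecting the unit, and in base 2 it reproduces the two bits of entropy for two fair coin tosses shown in the figure above.

<syntaxhighlight lang="python">
import math

def shannon_entropy(probabilities, base=2):
    # H(X) = -sum over x of p(x) * log p(x); outcomes with p(x) = 0 contribute nothing.
    # Base 2 gives bits ("shannons"), base e gives nats, base 10 gives hartleys.
    return -sum(p * math.log(p, base) for p in probabilities if p > 0)

# Two fair coin tosses: four equally likely outcomes -> 2 bits of entropy.
print(shannon_entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0

# A fair coin is maximally uncertain; a biased coin has lower entropy.
print(shannon_entropy([0.5, 0.5]))  # 1.0
print(shannon_entropy([0.9, 0.1]))  # ~0.469
</syntaxhighlight>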