Entropy (information theory)
==Entropy for continuous random variables==
===Differential entropy===
{{Main|Differential entropy}}
The Shannon entropy is restricted to random variables taking discrete values. The corresponding formula for a continuous random variable with [[probability density function]] {{math|''f''(''x'')}} with finite or infinite support <math>\mathbb X</math> on the real line is defined by analogy, using the above form of the entropy as an expectation:<ref name=cover1991/>{{rp|p=224}}
<math display="block">\Eta(X) = \mathbb{E}[-\log f(X)] = -\int_\mathbb X f(x) \log f(x)\, \mathrm{d}x.</math>
This is the differential entropy (or continuous entropy). A precursor of the continuous entropy {{math|''h''[''f'']}} is the expression for the functional {{math|''Η''}} in the [[H-theorem]] of Boltzmann.

Although the analogy between both functions is suggestive, the following question must be asked: is the differential entropy a valid extension of the Shannon discrete entropy? Differential entropy lacks a number of properties that the Shannon discrete entropy has (it can even be negative), and corrections have been suggested, notably [[limiting density of discrete points]].

To answer this question, a connection must be established between the two functions, so as to obtain a generally finite measure as the [[bin size]] goes to zero. In the discrete case, the bin size is the (implicit) width of each of the {{math|''n''}} (finite or infinite) bins whose probabilities are denoted by {{math|''p''<sub>''n''</sub>}}. When generalizing to the continuous domain, the width must be made explicit.

To do this, start with a continuous function {{math|''f''}} discretized into bins of size <math>\Delta</math>.
<!-- Figure: Discretizing the function $ f$ into bins of width $ \Delta$ \includegraphics[width=\textwidth]{function-with-bins.eps} --><!-- The original article this figure came from is at http://planetmath.org/shannonsentropy but it is broken there too -->
By the mean-value theorem there exists a value {{math|''x''<sub>''i''</sub>}} in each bin such that
<math display="block">f(x_i) \Delta = \int_{i\Delta}^{(i+1)\Delta} f(x)\, dx,</math>
so the integral of the function {{math|''f''}} can be approximated (in the Riemann sense) by
<math display="block">\int_{-\infty}^{\infty} f(x)\, dx = \lim_{\Delta \to 0} \sum_{i = -\infty}^{\infty} f(x_i) \Delta ,</math>
where this limit and "bin size goes to zero" are equivalent.

We will denote
<math display="block">\Eta^{\Delta} := - \sum_{i=-\infty}^{\infty} f(x_i) \Delta \log \left( f(x_i) \Delta \right)</math>
and, expanding the logarithm, we have
<math display="block">\Eta^{\Delta} = - \sum_{i=-\infty}^{\infty} f(x_i) \Delta \log (f(x_i)) -\sum_{i=-\infty}^{\infty} f(x_i) \Delta \log (\Delta).</math>
As {{math|Δ → 0}}, we have
<math display="block">\begin{align} \sum_{i=-\infty}^{\infty} f(x_i) \Delta &\to \int_{-\infty}^{\infty} f(x)\, dx = 1 \\ \sum_{i=-\infty}^{\infty} f(x_i) \Delta \log (f(x_i)) &\to \int_{-\infty}^{\infty} f(x) \log f(x)\, dx. \end{align}</math>
Note that {{math|log(Δ) → −∞}} as {{math|Δ → 0}}, so a special definition of the differential or continuous entropy is required:
<math display="block">h[f] = \lim_{\Delta \to 0} \left(\Eta^{\Delta} + \log \Delta\right) = -\int_{-\infty}^{\infty} f(x) \log f(x)\,dx,</math>
which is, as said before, referred to as the differential entropy. This means that the differential entropy ''is not'' a limit of the Shannon entropy for {{math|''n'' → ∞}}. Rather, it differs from the limit of the Shannon entropy by an infinite offset (see also the article on [[information dimension]]).
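The limit above can be checked numerically. The following is a minimal sketch, not part of the cited derivation: it assumes a standard normal density, uses bin midpoints as a stand-in for the mean-value points {{math|''x''<sub>''i''</sub>}}, and truncates the sum to a finite interval. The function names and the <code>half_width</code> parameter are illustrative choices. For the standard normal, the differential entropy in nats is <math>\tfrac{1}{2}\log(2\pi e) \approx 1.4189</math>.

```python
import math


def normal_pdf(x, sigma=1.0):
    """Density of a zero-mean normal distribution with standard deviation sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def discretized_entropy(pdf, delta, half_width=12.0):
    """Approximate H^Delta = -sum_i f(x_i) Delta log(f(x_i) Delta), in nats.

    Assumptions of this sketch: x_i is taken as the bin midpoint, and the
    infinite sum is truncated to the interval [-half_width, half_width].
    """
    n_bins = int(round(2.0 * half_width / delta))
    total = 0.0
    for i in range(n_bins):
        x_mid = -half_width + (i + 0.5) * delta
        p = pdf(x_mid) * delta  # approximate probability mass of bin i
        if p > 0.0:
            total -= p * math.log(p)
    return total


# Closed-form differential entropy of the standard normal, in nats.
exact = 0.5 * math.log(2.0 * math.pi * math.e)

for delta in (0.5, 0.1, 0.01):
    h_delta = discretized_entropy(normal_pdf, delta)
    # h_delta grows without bound as delta shrinks;
    # h_delta + log(delta) stays close to the differential entropy.
    print(delta, h_delta, h_delta + math.log(delta), exact)
```

In this sketch the discretized entropy itself keeps growing as the bins shrink, while the corrected quantity <math>\Eta^{\Delta} + \log \Delta</math> stays near the closed-form value, which is the "infinite offset" described above.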
===Limiting density of discrete points===
{{Main|Limiting density of discrete points}}
As a result, unlike the Shannon entropy, the differential entropy is ''not'' in general a good measure of uncertainty or information. For example, the differential entropy can be negative; it is also not invariant under continuous co-ordinate transformations. This problem may be illustrated by a change of units when {{math|''x''}} is a dimensioned variable. The density {{math|''f''(''x'')}} then has units of {{math|1/''x''}}. The argument of the logarithm must be dimensionless; otherwise it is improper, and so the differential entropy as given above is improper. If {{math|''Δ''}} is some "standard" value of {{math|''x''}} (i.e. "bin size") and therefore has the same units, then a modified differential entropy may be written in proper form as:
<math display="block">\Eta=-\int_{-\infty}^\infty f(x) \log(f(x)\,\Delta)\,dx ,</math>
and the result will be the same for any choice of units for {{math|''x''}}. In fact, the limit of discrete entropy as <math> N \rightarrow \infty </math> would also include a term of <math> \log(N)</math>, which would in general be infinite. This is expected: continuous variables would typically have infinite entropy when discretized. The [[limiting density of discrete points]] is really a measure of how much easier a distribution is to describe than a distribution that is uniform over its quantization scheme.

===Relative entropy===
{{main|Generalized relative entropy}}
Another useful measure of entropy that works equally well in the discrete and the continuous case is the '''relative entropy''' of a distribution. It is defined as the [[Kullback–Leibler divergence]] from the distribution to a reference measure {{math|''m''}} as follows. Assume that a probability distribution {{math|''p''}} is [[absolutely continuous]] with respect to a measure {{math|''m''}}, i.e. is of the form {{math|''p''(''dx'') {{=}} ''f''(''x'')''m''(''dx'')}} for some non-negative {{math|''m''}}-integrable function {{math|''f''}} with {{math|''m''}}-integral 1. Then the relative entropy can be defined as
<math display="block">D_{\mathrm{KL}}(p \| m ) = \int \log (f(x)) p(dx) = \int f(x)\log (f(x)) m(dx) .</math>
In this form the relative entropy generalizes (up to change in sign) both the discrete entropy, where the measure {{math|''m''}} is the [[counting measure]], and the differential entropy, where the measure {{math|''m''}} is the [[Lebesgue measure]]. If the measure {{math|''m''}} is itself a probability distribution, the relative entropy is non-negative, and zero if {{math|''p'' {{=}} ''m''}} as measures. It is defined for any measure space, and is therefore coordinate independent and invariant under co-ordinate reparameterizations, provided the transformation of the measure {{math|''m''}} is properly taken into account. The relative entropy, and (implicitly) entropy and differential entropy, do depend on the "reference" measure {{math|''m''}}.
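The contrast between differential entropy and relative entropy under a change of units can be illustrated numerically. The sketch below rests on assumptions not found in the text above: both {{math|''p''}} and the reference measure {{math|''m''}} are taken to be Gaussian probability distributions, so that {{math|''f''}} is the ratio of their densities, and the integrals are approximated with a simple midpoint rule over a truncated interval. All function names and parameter values are hypothetical.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def integrate(func, lo, hi, n=20000):
    """Midpoint-rule approximation of the integral of func over [lo, hi]."""
    step = (hi - lo) / n
    return sum(func(lo + (k + 0.5) * step) for k in range(n)) * step


def differential_entropy(mu, sigma, span=10.0):
    """Approximate -integral of f log f (nats) for a Gaussian density f."""
    def integrand(x):
        fx = gaussian_pdf(x, mu, sigma)
        return -fx * math.log(fx) if fx > 0.0 else 0.0
    return integrate(integrand, mu - span * sigma, mu + span * sigma)


def relative_entropy(mu_p, sigma_p, mu_m, sigma_m, span=10.0):
    """Approximate D_KL(p || m) = integral of p log(p / m) for Gaussian p and m."""
    def integrand(x):
        px = gaussian_pdf(x, mu_p, sigma_p)
        mx = gaussian_pdf(x, mu_m, sigma_m)
        return px * math.log(px / mx) if px > 0.0 and mx > 0.0 else 0.0
    return integrate(integrand, mu_p - span * sigma_p, mu_p + span * sigma_p)


# Illustrative change of units: the same quantity in metres and in centimetres.
scale = 100.0

h_m = differential_entropy(0.0, 1.0)
h_cm = differential_entropy(0.0, scale * 1.0)
print("differential entropy shift:", h_cm - h_m, "log(scale):", math.log(scale))

d_m = relative_entropy(0.0, 1.0, 1.0, 2.0)
d_cm = relative_entropy(0.0, scale * 1.0, scale * 1.0, scale * 2.0)
print("relative entropy (m, cm):", d_m, d_cm)

# A narrow density gives a negative differential entropy.
print("narrow Gaussian:", differential_entropy(0.0, 0.1))
```

Under these assumptions the differential entropy shifts by the logarithm of the scale factor when the units change, whereas the relative entropy is unchanged, because the reference measure is rescaled along with {{math|''p''}}.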