
=== Maximum entropy ===
Of all probability distributions over the reals with a specified finite mean {{tmath|\mu}} and finite variance {{tmath|\sigma^2}}, the normal distribution <math display=inline>N(\mu,\sigma^2)</math> is the one with [[Maximum entropy probability distribution|maximum entropy]].{{sfnp|Cover|Thomas|2006|p=254}} To see this, let {{tmath|X}} be a [[continuous random variable]] with [[probability density]] {{tmath|f(x)}}. The entropy of {{tmath|X}} is defined as<ref>{{cite book|last1=Williams|first1=David|title=Weighing the odds : a course in probability and statistics|url=https://archive.org/details/weighingoddscour00will|url-access=limited|date=2001|publisher=Cambridge Univ. Press|location=Cambridge [u.a.]|isbn=978-0-521-00618-7|pages=[https://archive.org/details/weighingoddscour00will/page/n219 197]–199|edition=Reprinted.}}</ref><ref>{{cite book|author1=José M. Bernardo |author2=Adrian F. M. Smith|title=Bayesian theory|url=https://archive.org/details/bayesiantheory00bern_963|url-access=limited|date=2000|publisher=Wiley|location=Chichester [u.a.]|isbn=978-0-471-49464-5|pages=[https://archive.org/details/bayesiantheory00bern_963/page/n224 209], 366|edition=Reprint}}</ref><ref>O'Hagan, A. (1994) ''Kendall's Advanced Theory of statistics, Vol 2B, Bayesian Inference'', Edward Arnold. {{isbn|0-340-52922-9}} (Section 5.40)</ref>
<math display=block>
H(X) = - \int_{-\infty}^\infty f(x)\ln f(x)\, dx\,,
</math>

where <math display=inline>f(x)\ln f(x)</math> is understood to be zero whenever {{tmath|1=f(x)=0}}. This functional can be maximized, subject to the constraints that the distribution is properly normalized and has the specified mean and variance, by using [[variational calculus]]. A Lagrangian function with three [[Lagrange multipliers]] is defined:

<math display=block>
L=-\int_{-\infty}^\infty f(x)\ln f(x)\,dx-\lambda_0\left(1-\int_{-\infty}^\infty f(x)\,dx\right)-\lambda_1\left(\mu-\int_{-\infty}^\infty f(x)x\,dx\right)-\lambda_2\left(\sigma^2-\int_{-\infty}^\infty f(x)(x-\mu)^2\,dx\right)\,.
</math>

At maximum entropy, a small variation <math display=inline>\delta f(x)</math> about <math display=inline>f(x)</math> produces a first-order variation <math display=inline>\delta L</math> in {{tmath|L}} that is equal to 0:

<math display=block>
0=\delta L=\int_{-\infty}^\infty \delta f(x)\left(-\ln f(x) -1+\lambda_0+\lambda_1 x+\lambda_2(x-\mu)^2\right)\,dx\,.
</math>

Since this must hold for any small {{tmath|\delta f(x)}}, the factor multiplying {{tmath|\delta f(x)}} must be zero, and solving for {{tmath|f(x)}} yields:

<math display=block>f(x)=\exp\left(-1+\lambda_0+\lambda_1 x+\lambda_2(x-\mu)^2\right)\,.</math>

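The constraints that {{tmath|f(x)}} is properly normalized and has the specified mean and variance are satisfied if and only if the multipliers are chosen as <math display=inline>\lambda_0 = 1-\tfrac{1}{2}\ln(2\pi\sigma^2)</math>, <math display=inline>\lambda_1 = 0</math>, and <math display=inline>\lambda_2 = -\tfrac{1}{2\sigma^2}</math>, so that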
<math display=block>
f(x)=\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,.
</math>
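The result can be checked numerically. The following is a minimal sketch, not part of the cited derivation: it assumes the multiplier values <math display=inline>\lambda_0 = 1-\tfrac{1}{2}\ln(2\pi\sigma^2)</math>, <math display=inline>\lambda_1 = 0</math>, <math display=inline>\lambda_2 = -\tfrac{1}{2\sigma^2}</math>, plugs them into the exponential form of {{tmath|f(x)}}, and integrates with a simple trapezoidal rule. The function names <code>maxent_density</code> and <code>integrate</code>, the example values of {{tmath|\mu}} and {{tmath|\sigma}}, and the truncation of the integration range at twelve standard deviations are illustrative choices.

```python
import math


def maxent_density(x, mu, sigma):
    """Evaluate exp(-1 + l0 + l1*x + l2*(x - mu)**2) for the assumed multipliers."""
    l0 = 1.0 - 0.5 * math.log(2.0 * math.pi * sigma**2)  # fixes normalization
    l1 = 0.0                                             # keeps the mean at mu
    l2 = -1.0 / (2.0 * sigma**2)                         # fixes the variance
    return math.exp(-1.0 + l0 + l1 * x + l2 * (x - mu) ** 2)


def integrate(g, a, b, n=20000):
    """Composite trapezoidal rule on [a, b] with n subintervals."""
    h = (b - a) / n
    total = 0.5 * (g(a) + g(b))
    for i in range(1, n):
        total += g(a + i * h)
    return total * h


# Illustrative parameter values (assumptions, not taken from the text).
mu, sigma = 1.5, 2.0
# The density is negligible beyond 12 standard deviations from the mean.
a, b = mu - 12.0 * sigma, mu + 12.0 * sigma

mass = integrate(lambda x: maxent_density(x, mu, sigma), a, b)
mean = integrate(lambda x: x * maxent_density(x, mu, sigma), a, b)
var = integrate(lambda x: (x - mu) ** 2 * maxent_density(x, mu, sigma), a, b)

print(mass, mean, var)  # should be close to 1, mu, and sigma**2
```

The three integrals correspond, in order, to the normalization, mean, and variance constraints in the Lagrangian.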
The entropy of a normal distribution <math display=inline>X \sim N(\mu,\sigma^2)</math> is equal to
<math display=block>
H(X)=\tfrac{1}{2}\left(1+\ln(2\pi\sigma^2)\right)\,,
</math>
which is independent of the mean {{tmath|\mu}}.
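
As an illustration of the maximum-entropy property, the closed-form entropy above can be compared with the differential entropies of two other distributions that have the same variance. The sketch below assumes the standard closed forms for a uniform distribution (width {{tmath|w}}, variance {{tmath|w^2/12}}, entropy {{tmath|\ln w}}) and a Laplace distribution (scale {{tmath|b}}, variance {{tmath|2b^2}}, entropy {{tmath|1+\ln 2b}}). It also integrates <math display=inline>-f\ln f</math> numerically for two different means. All function names and the value of {{tmath|\sigma}} are illustrative.

```python
import math


def normal_entropy(sigma):
    """Closed-form entropy of N(mu, sigma^2); it does not depend on mu."""
    return 0.5 * (1.0 + math.log(2.0 * math.pi * sigma**2))


def uniform_entropy(sigma):
    """Entropy of a uniform distribution with variance sigma^2 (width sigma*sqrt(12))."""
    return math.log(sigma * math.sqrt(12.0))


def laplace_entropy(sigma):
    """Entropy of a Laplace distribution with variance sigma^2 (scale sigma/sqrt(2))."""
    b = sigma / math.sqrt(2.0)
    return 1.0 + math.log(2.0 * b)


def numeric_normal_entropy(mu, sigma, n=20000):
    """Trapezoidal estimate of -integral f ln f for the normal density."""
    a, b = mu - 12.0 * sigma, mu + 12.0 * sigma  # tails beyond this are negligible
    h = (b - a) / n

    def integrand(x):
        f = math.exp(-((x - mu) ** 2) / (2.0 * sigma**2)) / math.sqrt(
            2.0 * math.pi * sigma**2
        )
        # f ln f is taken to be zero where f vanishes.
        return -f * math.log(f) if f > 0.0 else 0.0

    total = 0.5 * (integrand(a) + integrand(b))
    for i in range(1, n):
        total += integrand(a + i * h)
    return total * h


sigma = 2.0  # illustrative value
h_closed = normal_entropy(sigma)
h_num_0 = numeric_normal_entropy(0.0, sigma)
h_num_5 = numeric_normal_entropy(5.0, sigma)

print(h_closed, h_num_0, h_num_5)
print(laplace_entropy(sigma), uniform_entropy(sigma))
```

With equal variances, the normal entropy should come out larger than both the Laplace and uniform values, and the two numerical estimates should agree with each other and with the closed form regardless of the mean.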