===Continuous case===
For [[continuous distribution]]s, the Shannon entropy cannot be used, as it is only defined for discrete probability spaces. Instead, [[E. T. Jaynes|Edwin Jaynes]] (1963, 1968, 2003) gave the following formula, which is closely related to the [[relative entropy]] (see also [[differential entropy]]):

:<math>H_c=-\int p(x)\log\frac{p(x)}{q(x)}\,dx</math>

where ''q''(''x''), which Jaynes called the "invariant measure", is proportional to the [[limiting density of discrete points]]. For now, we shall assume that ''q'' is known; we will discuss it further after the solution equations are given.
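As a rough numerical illustration of this formula, the sketch below approximates ''H<sub>c</sub>'' with a midpoint rule. The function name <code>continuous_entropy</code>, the interval (0, 1), and the two example densities are assumptions made only for this illustration; they do not come from Jaynes. When ''q'' is itself a normalized density, ''H<sub>c</sub>'' is the negative of the Kullback–Leibler divergence, so it is zero for ''p'' = ''q'' and negative otherwise.

```python
import math

def continuous_entropy(p, q, a, b, n=10_000):
    """Approximate H_c = -integral of p(x) log(p(x)/q(x)) over (a, b), midpoint rule."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h
        px, qx = p(x), q(x)
        if px > 0.0:  # the integrand p*log(p/q) tends to 0 as p -> 0
            total += px * math.log(px / qx) * h
    return -total

# Hypothetical example densities on (0, 1).
def uniform(x):
    return 1.0

def triangular(x):
    return 2.0 * x

h_same = continuous_entropy(uniform, uniform, 0.0, 1.0)     # p equals q
h_tri = continuous_entropy(triangular, uniform, 0.0, 1.0)   # p differs from q
```

For these choices the integral can also be done by hand, giving ''H<sub>c</sub>'' = 1/2 − log 2 for the triangular density, which the numerical value should approach as <code>n</code> grows.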

The relative entropy is usually defined as the [[Kullback–Leibler divergence]] of ''p'' from ''q'' (although it is sometimes, confusingly, defined as the negative of this). The principle of inference that minimizes this divergence, due to Kullback, is known as the [[Kullback–Leibler divergence#Principle of minimum discrimination information|Principle of Minimum Discrimination Information]].

Suppose we have some testable information ''I'' about a quantity ''x'' that takes values in some [[interval (mathematics)|interval]] of the [[real numbers]] (all integrals below are over this interval). We assume this information has the form of ''m'' constraints on the expectations of the functions ''f<sub>k</sub>''; that is, we require our probability density function to satisfy the inequality (or purely equality) moment constraints:

:<math>\int p(x)f_k(x)\,dx \geq F_k \qquad k = 1, \dotsc,m.</math>

where the <math> F_k </math> are observables. We also require the probability density to integrate to one, which may be viewed as a primitive constraint on the constant function equal to 1, with an observable equal to 1, giving the constraint

:<math>\int p(x)\,dx = 1.</math>

The probability density function with maximum ''H<sub>c</sub>'' subject to these constraints is:<ref name="BK11"/>

:<math>p(x) = \frac{1}{Z(\lambda_1,\dotsc, \lambda_m)} q(x)\exp\left[\lambda_1 f_1(x) + \dotsb + \lambda_m f_m(x)\right]</math>

with the [[partition function (mathematics)|partition function]] determined by

:<math> Z(\lambda_1,\dotsc, \lambda_m) = \int q(x)\exp\left[\lambda_1 f_1(x) + \dotsb + \lambda_m f_m(x)\right]\,dx.</math>
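The form of this solution follows from a standard Lagrange multiplier argument, sketched here for the case of equality constraints. The multiplier <math>\lambda_0</math> for the normalization constraint is notation introduced only for this sketch. Setting to zero the functional derivative with respect to ''p''(''x'') of

:<math>\mathcal{L}[p] = -\int p(x)\log\frac{p(x)}{q(x)}\,dx + \lambda_0\left(\int p(x)\,dx - 1\right) + \sum_{k=1}^m \lambda_k\left(\int p(x)f_k(x)\,dx - F_k\right)</math>

gives

:<math>-\log\frac{p(x)}{q(x)} - 1 + \lambda_0 + \sum_{k=1}^m \lambda_k f_k(x) = 0, \qquad\text{so}\qquad p(x) = e^{\lambda_0 - 1}\, q(x)\exp\left[\sum_{k=1}^m \lambda_k f_k(x)\right].</math>

The normalization constraint then fixes <math>e^{\lambda_0 - 1} = 1/Z(\lambda_1,\dotsc,\lambda_m)</math>.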

As in the discrete case, when all moment constraints are equalities, the values of the <math>\lambda_k</math> parameters are determined by the system of nonlinear equations:

:<math>F_k = \frac{\partial}{\partial \lambda_k} \log Z(\lambda_1,\dotsc, \lambda_m).</math>
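These equations hold because differentiating log ''Z'' with respect to <math>\lambda_k</math> yields the expectation of ''f<sub>k</sub>'' under ''p''. The following minimal sketch solves the system for a single equality constraint, assuming (purely for illustration) the interval (0, 1), a constant ''q'', ''f''(''x'') = ''x'', and the target value ''F'' = 0.3. The helper names <code>integrate</code>, <code>expected_f</code> and <code>solve_lambda</code> are hypothetical, and bisection is just one simple root-finding choice, not a method prescribed by the sources.

```python
import math

def integrate(g, a, b, n=2000):
    """Midpoint-rule approximation of the integral of g over (a, b)."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def expected_f(lam, f, q, a, b):
    """E[f] under p(x) = q(x) exp(lam * f(x)) / Z(lam); equals d log Z / d lam."""
    z = integrate(lambda x: q(x) * math.exp(lam * f(x)), a, b)
    num = integrate(lambda x: f(x) * q(x) * math.exp(lam * f(x)), a, b)
    return num / z

def solve_lambda(target, f, q, a, b, lo=-50.0, hi=50.0, tol=1e-10):
    """Bisection on lam. E[f] is nondecreasing in lam because its
    derivative with respect to lam is the variance of f."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if expected_f(mid, f, q, a, b) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Assumed example: x in (0, 1), q constant, constraint E[x] = 0.3.
lam = solve_lambda(0.3, lambda x: x, lambda x: 1.0, 0.0, 1.0)
mean = expected_f(lam, lambda x: x, lambda x: 1.0, 0.0, 1.0)
```

Because the target mean lies below 1/2, the mean of the unconstrained solution, the resulting multiplier is negative and the density decays exponentially across the interval. With several constraints the same idea applies, but a multivariate solver would be needed in place of bisection.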

With inequality moment constraints, the Lagrange multipliers are determined by solving a [[convex optimization]] program.<ref name="BK11"/>
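For a single inequality constraint the convex program reduces to a simple case distinction, which the sketch below illustrates. It is not the general algorithm of the cited reference. It assumes the interval (0, 1), a constant ''q'', ''f''(''x'') = ''x'', and a bound ''F'' strictly between 0 and 1. With the sign convention used above, the multiplier of a "≥" constraint is nonnegative: it is zero if the unconstrained solution already satisfies the bound, and otherwise the constraint holds with equality. The function names are hypothetical.

```python
import math

def mean_given_lambda(lam):
    """E[x] under p(x) proportional to exp(lam * x) on (0, 1); equals d log Z / d lam."""
    if abs(lam) < 1e-6:
        return 0.5 + lam / 12.0  # series expansion near lam = 0
    return -1.0 / math.expm1(-lam) - 1.0 / lam

def multiplier_for_lower_bound(F, iterations=100):
    """Multiplier for the single constraint E[x] >= F, assuming 0 < F < 1."""
    if not 0.0 < F < 1.0:
        raise ValueError("F must lie strictly between 0 and 1")
    if mean_given_lambda(0.0) >= F:
        return 0.0  # constraint inactive: p proportional to q already satisfies it
    lo, hi = 0.0, 1.0
    while mean_given_lambda(hi) < F:  # grow the bracket until it contains the root
        hi *= 2.0
    for _ in range(iterations):  # bisection on the now-active equality constraint
        mid = 0.5 * (lo + hi)
        if mean_given_lambda(mid) < F:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

lam_slack = multiplier_for_lower_bound(0.3)   # bound already met, multiplier is zero
lam_active = multiplier_for_lower_bound(0.7)  # bound binds, multiplier is positive
```

With several simultaneous inequality constraints, which of them are active is not known in advance, which is why a general convex solver is used in practice.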

The invariant measure function ''q''(''x'') is best understood by supposing that ''x'' is known to take values only in the [[bounded interval]] (''a'', ''b''), and that no other information is given. Then the maximum entropy probability density function is

:<math> p(x) = A \cdot q(x), \qquad a < x < b</math>

where ''A'' is a normalization constant. The invariant measure function is thus the prior density function encoding 'lack of relevant information'. It cannot be determined by the principle of maximum entropy, and must be determined by some other logical method, such as the [[principle of transformation groups]] or [[Marginalization (probability)|marginalization theory]].
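As a worked illustration, assume that 0 < ''a'' < ''b'' and that an invariance argument has yielded ''q''(''x'') = 1/''x'', the form commonly associated with a positive scale parameter. This choice of ''q'' is an assumption made only for this example. Normalization then gives

:<math>A = \left(\int_a^b \frac{dx}{x}\right)^{-1} = \frac{1}{\log(b/a)}, \qquad p(x) = \frac{1}{x\log(b/a)}, \qquad a < x < b,</math>

whereas a constant ''q'' would give the uniform density <math>p(x) = 1/(b-a)</math>.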