===Discrete case===
We have some testable information ''I'' about a quantity ''x'' taking values in {''x<sub>1</sub>'', ''x<sub>2</sub>'',..., ''x<sub>n</sub>''}. We assume this information has the form of ''m'' constraints on the expectations of the functions ''f<sub>k</sub>''; that is, we require our probability distribution to satisfy the moment inequality/equality constraints:

:<math>\sum_{i=1}^n \Pr(x_i)f_k(x_i) \geq F_k \qquad k = 1, \ldots,m,</math>

where the <math> F_k </math> are observables. We also require the probabilities to sum to one, which may be viewed as a primitive constraint on the constant function equal to 1 with an observable equal to 1, giving the constraint

:<math>\sum_{i=1}^n \Pr(x_i) = 1.</math>

The probability distribution with maximum information entropy subject to these inequality/equality constraints is of the form:<ref name="BK08"/>

:<math>\Pr(x_i) = \frac{1}{Z(\lambda_1,\ldots, \lambda_m)} \exp\left[\lambda_1 f_1(x_i) + \cdots + \lambda_m f_m(x_i)\right],</math>

for some <math>\lambda_1,\ldots,\lambda_m</math>. It is sometimes called the [[Gibbs distribution]]. The normalization constant is determined by:

:<math> Z(\lambda_1,\ldots, \lambda_m) = \sum_{i=1}^n \exp\left[\lambda_1 f_1(x_i) + \cdots + \lambda_m f_m(x_i)\right],</math>

and is conventionally called the [[partition function (mathematics)|partition function]]. (The [[Pitman–Koopman theorem]] states that the necessary and sufficient condition for a sampling distribution to admit [[sufficiency (statistics)|sufficient statistics]] of bounded dimension is that it have the general form of a maximum entropy distribution.)
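
As a minimal illustration of the Gibbs form and its partition function, the following Python sketch evaluates <math>\Pr(x_i)</math> and <math>Z</math> for given multipliers. The function name <code>gibbs_distribution</code> and the example (a six-sided die with the single function ''f''<sub>1</sub>(''x'') = ''x'') are assumptions chosen for illustration and do not come from the cited source.

```python
import math

def gibbs_distribution(values, features, lambdas):
    """Return (probabilities, Z) for Pr(x_i) proportional to
    exp(lambda_1 f_1(x_i) + ... + lambda_m f_m(x_i))."""
    # Exponent lambda_1 f_1(x_i) + ... + lambda_m f_m(x_i) for each value x_i.
    exponents = [
        sum(lam * f(x) for lam, f in zip(lambdas, features))
        for x in values
    ]
    # Subtract the largest exponent before exponentiating to avoid overflow.
    shift = max(exponents)
    weights = [math.exp(e - shift) for e in exponents]
    total = sum(weights)
    probabilities = [w / total for w in weights]
    # Undo the shift so that Z matches the definition of the partition function.
    partition = math.exp(shift) * total
    return probabilities, partition

# Hypothetical example: die faces with the single function f_1(x) = x.
faces = [1, 2, 3, 4, 5, 6]
features = [lambda x: x]

# All multipliers zero: every exponent is 0, so the distribution is uniform.
uniform_probs, uniform_z = gibbs_distribution(faces, features, [0.0])

# A positive multiplier shifts probability toward larger faces.
tilted_probs, tilted_z = gibbs_distribution(faces, features, [0.5])
```

With all multipliers equal to zero the distribution reduces to the uniform one and <math>Z = n</math>, which is the unconstrained maximum entropy solution.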

The λ<sub>k</sub> parameters are Lagrange multipliers. In the case of equality constraints their values are determined from the solution of the nonlinear equations

:<math>F_k = \frac{\partial}{\partial \lambda_k} \log Z(\lambda_1,\ldots, \lambda_m).</math>
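
This relation can be checked by differentiating the logarithm of the partition function and using the definitions of <math>Z</math> and <math>\Pr(x_i)</math> given above; the right-hand side is then the expectation of <math>f_k</math>, which an equality constraint sets equal to <math>F_k</math>:

:<math>\frac{\partial}{\partial \lambda_k} \log Z = \frac{1}{Z} \sum_{i=1}^n f_k(x_i) \exp\left[\lambda_1 f_1(x_i) + \cdots + \lambda_m f_m(x_i)\right] = \sum_{i=1}^n \Pr(x_i) f_k(x_i) = F_k.</math>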

In the case of inequality constraints, the Lagrange multipliers are determined from the solution of a [[convex optimization]] program with linear constraints.<ref name="BK08"/>
In both cases, there is in general no [[closed form solution]], and computing the Lagrange multipliers usually requires [[Numerical analysis|numerical methods]].
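
The numerical step can be sketched for the simplest case of a single equality constraint. The Python example below is an illustration under stated assumptions, not a general solver: it uses bisection, which works here because the expectation of ''f'' under the Gibbs distribution is strictly increasing in the multiplier (its derivative is the variance of ''f''). The function names, the search interval, and the die example with a prescribed mean of 4.5 are all hypothetical choices.

```python
import math

def expected_value(values, f, lam):
    """Expectation of f under Pr(x_i) proportional to exp(lam * f(x_i))."""
    exponents = [lam * f(x) for x in values]
    # Subtract the largest exponent before exponentiating to avoid overflow.
    shift = max(exponents)
    weights = [math.exp(e - shift) for e in exponents]
    total = sum(weights)
    return sum(w * f(x) for w, x in zip(weights, values)) / total

def solve_multiplier(values, f, target, lo=-50.0, hi=50.0, tol=1e-12):
    """Find lam such that E[f] = target, using bisection on [lo, hi].

    The interval [lo, hi] is an assumed search range; E[f] is strictly
    increasing in lam, so a bracketed root is unique.
    """
    if not (expected_value(values, f, lo) < target < expected_value(values, f, hi)):
        raise ValueError("target is not bracketed by the search interval")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if expected_value(values, f, mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical example: a die whose mean is constrained to equal 4.5.
faces = [1, 2, 3, 4, 5, 6]
lam = solve_multiplier(faces, lambda x: x, target=4.5)

# Recover the maximum entropy probabilities from the multiplier.
weights = [math.exp(lam * x) for x in faces]
probs = [w / sum(weights) for w in weights]
mean = sum(p * x for p, x in zip(probs, faces))
```

For this example the multiplier comes out at approximately 0.371, and the resulting probabilities increase with the face value. With several constraints the same equations are usually handled by a multivariate root-finding or convex optimization routine rather than bisection.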