Editing Markov random field (section)

== Exponential family ==
Any positive Markov random field can be written as exponential family in canonical form with feature functions <math>f_k</math> such that the full-joint distribution can be written as

:<math> P(X=x) = \frac{1}{Z} \exp \left( \sum_{k} w_k^{\top} f_k (x_{ \{ k \}}) \right)</math>
where the notation
:<math> w_k^{\top} f_k (x_{ \{ k \}}) = \sum_{i=1}^{N_k} w_{k,i} \cdot f_{k,i}(x_{\{k\}})</math>
is simply a [[dot product]] over field configurations, and ''Z'' is the [[partition function (mathematics)|partition function]]:

:<math> Z = \sum_{x \in \mathcal{X}} \exp \left(\sum_{k} w_k^{\top} f_k(x_{ \{ k \} })\right).</math>

Here, <math>\mathcal{X}</math> denotes the set of all possible assignments of values to all the network's random variables. Usually, the feature functions <math>f_{k,i}</math> are defined such that they are [[indicator function|indicators]] of the clique's configuration, ''i.e.'' <math>f_{k,i}(x_{\{k\}}) = 1</math> if <math>x_{\{k\}}</math> corresponds to the ''i''-th possible configuration of the ''k''-th clique and 0 otherwise. This model is equivalent to the clique factorization model given above, if <math>N_k=|\operatorname{dom}(C_k)|</math> is the cardinality of the clique, and the weight of a feature <math>f_{k,i}</math> corresponds to the logarithm of the corresponding clique factor, ''i.e.'' <math>w_{k,i} = \log \varphi(c_{k,i})</math>, where <math>c_{k,i}</math> is the ''i''-th possible configuration of the ''k''-th clique, ''i.e.'' the ''i''-th value in the domain of the clique <math>C_k</math>.

The probability ''P'' is often called the Gibbs measure.  This expression of a Markov field as a logistic model is only possible if all clique factors are non-zero, ''i.e.'' if none of the elements of <math>\mathcal{X}</math> are assigned a probability of 0.  This allows  techniques from matrix algebra to be applied, ''e.g.'' that the [[trace (linear algebra)|trace]] of a matrix is log of the [[determinant]], with the matrix representation of a graph arising from the graph's [[incidence matrix]].

The importance of the partition function ''Z'' is that many concepts from [[statistical mechanics]], such as [[entropy]], directly generalize to the case of Markov networks, and an ''intuitive'' understanding can thereby be gained.  In addition, the partition function allows [[variational method]]s to be applied to the solution of the problem: one can attach a driving force to one or more of the random variables, and explore the reaction of the network in response to this [[perturbation theory|perturbation]].  Thus, for example, one may add a driving term ''J''<sub>''v''</sub>, for each vertex ''v'' of the graph, to the partition function to get:

:<math> Z[J] = \sum_{x \in \mathcal{X}} \exp \left(\sum_{k} w_k^{\top} f_k(x_{ \{ k \} }) + \sum_v J_v x_v\right)</math>

Formally differentiating with respect to ''J''<sub>''v''</sub> gives the [[expectation value]] of the random variable ''X''<sub>''v''</sub> associated with the vertex ''v'':

:<math>E[X_v] = \frac{1}{Z} \left.\frac{\partial Z[J]}{\partial J_v}\right|_{J_v=0}.</math>

[[Correlation function]]s are computed likewise; the two-point correlation is:

:<math>C[X_u, X_v] = \frac{1}{Z} \left.\frac{\partial^2 Z[J]}{\partial J_u \,\partial J_v}\right|_{J_u=0, J_v=0}.</math>

Unfortunately, though the likelihood of a logistic Markov network is convex, evaluating the likelihood or gradient of the likelihood of a model requires inference in the model, which is generally computationally infeasible (see [[#Inference|'Inference']] below).