Editing Markov random field

{{Short description|Set of random variables}}
[[File:markov random field example.png|thumb|alt=An example of a Markov random field.|An example of a Markov random field. Each edge represents dependency. In this example: A depends on B and D. B depends on A and D. D depends on A, B, and E. E depends on D and C. C depends on E.]]

In the domain of [[physics]] and [[probability]], a '''Markov random field''' ('''MRF'''), '''Markov network''' or '''undirected [[graphical model]]''' is a set of [[random variable]]s having a [[Markov property]] described by an [[undirected graph]]. In other words, a [[random field]] is said to be a [[Andrey Markov|Markov]] random field if it satisfies Markov properties. The concept originates from the [[Spin glass#Sherrington–Kirkpatrick model|Sherrington–Kirkpatrick model]].<ref>{{citation|title=Solvable Model of a Spin-Glass|number=35|year=1975|author1= Sherrington, David|author2=Kirkpatrick, Scott|journal=Physical Review Letters|volume=35|pages=1792–1796|doi=10.1103/PhysRevLett.35.1792|bibcode=1975PhRvL..35.1792S}}</ref>

A Markov network or MRF is similar to a [[Bayesian network]] in its representation of dependencies; the differences being that Bayesian networks are [[directed acyclic graph|directed and acyclic]], whereas Markov networks are undirected and may be cyclic. Thus, a Markov network can represent certain dependencies that a Bayesian network cannot (such as cyclic dependencies {{Explain|date=July 2018}}); on the other hand, it can't represent certain dependencies that a Bayesian network can (such as induced dependencies {{Explain|date=July 2018}}). The underlying graph of a Markov random field may be finite or infinite.

When the [[joint probability distribution|joint probability density]] of the random variables is strictly positive, it is also referred to as a '''Gibbs random field''', because, according to the [[Hammersley–Clifford theorem]], it can then be represented by a [[Gibbs measure]] for an appropriate (locally defined) energy function. The prototypical Markov random field is the [[Ising model]]; indeed, the Markov random field was introduced as the general setting for the Ising model.<ref name="Kindermann-Snell80">{{cite book
 |first1=Ross
 |last1=Kindermann
 |first2=J. Laurie
 |last2=Snell
 |url=http://www.cmap.polytechnique.fr/~rama/ehess/mrfbook.pdf
 |title=Markov Random Fields and Their Applications
 |year=1980
 |publisher=American Mathematical Society
 |isbn=978-0-8218-5001-5
 |mr=0620955
 |access-date=2012-04-09
 |archive-date=2017-08-10
 |archive-url=https://web.archive.org/web/20170810092327/http://www.cmap.polytechnique.fr/%7Erama/ehess/mrfbook.pdf
 |url-status=dead
 }}</ref> In the domain of [[artificial intelligence]], a Markov random field is used to model various low- to mid-level tasks in [[image processing]] and [[computer vision]].<ref>{{cite book
 |first1=S. Z. |last1=Li 
 |title=Markov Random Field Modeling in Image Analysis
 |year=2009
 |publisher=Springer
 |url=https://books.google.com/books?id=rDsObhDkCIAC
|isbn=9781848002791 
 }}</ref>

== Definition ==
Given an undirected graph <math>G=(V,E)</math>, a set of random variables <math>X = (X_v)_{v\in V}</math> indexed by <math>V</math>&nbsp; form a Markov random field with respect to <math>G</math>&nbsp; if they satisfy the local Markov properties:

:Pairwise Markov property: Any two non-adjacent variables are [[conditional independence|conditionally independent]] given all other variables:

::<math>X_u \perp\!\!\!\perp X_v \mid X_{V \smallsetminus \{u,v\}} </math>

:Local Markov property: A variable is conditionally independent of all other variables given its neighbors:

::<math>X_v \perp\!\!\!\perp X_{V\smallsetminus \operatorname{N}[v]} \mid X_{\operatorname{N}(v)}</math>
:where <math display="inline">\operatorname{N}(v)</math> is the set of neighbors of <math>v</math>, and <math>\operatorname{N}[v] = v \cup \operatorname{N}(v)</math> is the [[Neighborhood (graph theory)|closed neighbourhood]] of <math>v</math>.

:Global Markov property: Any two subsets of variables are conditionally independent given a separating subset:

::<math>X_A \perp\!\!\!\perp X_B \mid X_S</math>
:where every path from a node in <math>A</math> to a node in <math>B</math> passes through <math>S</math>.

The Global Markov property is stronger than the Local Markov property, which in turn is stronger than the Pairwise one.<ref>{{cite book |last1=Lauritzen |first1=Steffen |title=Graphical models |date=1996 |publisher=Clarendon Press |location=Oxford |isbn=978-0198522195 |page=33}}</ref> However, the above three Markov properties are equivalent for positive distributions<ref>{{cite book|title=Probabilistic Graphical Models|last1=Koller|last2=Friedman|publisher=MIT Press|date=2009|isbn=9780262013192|first1=Daphne|first2=Nir|page=114-122}}</ref> (those that assign only nonzero probabilities to the associated variables).

The relation between the three Markov properties is particularly clear in the following formulation:

* Pairwise: For any <math>i, j \in V</math> not equal or adjacent, <math>X_i \perp\!\!\!\perp X_j | X_{V \smallsetminus \{i, j\}}</math>.
* Local: For any <math>i\in V</math> and <math>J\subset V</math> not containing or adjacent to <math>i</math>, <math>X_i \perp\!\!\!\perp X_J | X_{V \smallsetminus (\{i\}\cup J)}</math>.
* Global: For any <math>I, J\subset V</math> not intersecting or adjacent, <math>X_I \perp\!\!\!\perp X_J | X_{V \smallsetminus (I\cup J)}</math>.

== Clique factorization ==
As the Markov property of an arbitrary probability distribution can be difficult to establish, a commonly used class of Markov random fields are those that can be factorized according to the [[Clique (graph theory)|clique]]s of the graph.

Given a set of random variables <math>X = (X_v)_{v\in V}</math>, let <math>P(X=x)</math> be the [[Probability density function|probability]] of a particular field configuration <math>x</math> in&nbsp;<math>X</math>—that is, <math>P(X=x)</math> is the probability of finding that the random variables <math>X</math> take on the particular value <math>x</math>.  Because <math>X</math> is a set, the probability of <math>x</math> should be understood to be taken with respect to a ''joint distribution'' of the <math>X_v</math>.

If this joint density can be factorized over the cliques of <math>G</math> as

:<math>P(X=x) = \prod_{C \in \operatorname{cl}(G)} \varphi_C (x_C) </math>

then <math>X</math> forms a Markov random field with respect to <math>G</math>.  Here, <math>\operatorname{cl}(G)</math> is the set of cliques of <math>G</math>.   The definition is equivalent if only maximal cliques are used. The functions <math>\varphi_C</math> are sometimes referred to as ''factor potentials'' or ''clique potentials''. Note, however, conflicting terminology is in use: the word ''potential'' is often applied to the logarithm of <math>\varphi_C</math>.  This is because, in [[statistical mechanics]], <math>\log(\varphi_C)</math> has a direct interpretation as the [[potential energy]] of a [[Configuration space (physics)|configuration]]&nbsp;<math>x_C</math>.

Some MRF's do not factorize: a simple example can be constructed on a cycle of 4 nodes with some infinite energies, i.e. configurations of zero probabilities,<ref>{{cite journal
 |first=John |last=Moussouris
 |title=Gibbs and Markov random systems with constraints 
 |journal=Journal of Statistical Physics
 |volume=10 |issue=1 |pages=11–33 |year=1974
 |doi=10.1007/BF01011714 |mr=0432132
|hdl=10338.dmlcz/135184
 |bibcode=1974JSP....10...11M
 |s2cid=121299906
 |hdl-access=free}}</ref> even if one, more appropriately, allows the infinite energies to act on the complete graph on <math>V</math>.<ref>{{cite journal
 | last1 = Gandolfi | first1 = Alberto
 | last2 = Lenarda | first2 = Pietro
 |title= A note on Gibbs and Markov Random Fields with constraints and their moments
 |journal=Mathematics and Mechanics of Complex Systems
 |volume=4 |issue=3–4 |pages=407–422 |year=2016 | doi=10.2140/memocs.2016.4.407
 |doi-access=free }}</ref>

MRF's factorize if at least one of the following conditions is fulfilled:
* the density is strictly positive (by the [[Hammersley–Clifford theorem]])
* the graph is [[Chordal graph|chordal]] (by equivalence to a [[Bayesian network]])

When such a factorization does exist, it is possible to construct a [[factor graph]] for the network.

== Exponential family ==
Any positive Markov random field can be written as exponential family in canonical form with feature functions <math>f_k</math> such that the full-joint distribution can be written as

:<math> P(X=x) = \frac{1}{Z} \exp \left( \sum_{k} w_k^{\top} f_k (x_{ \{ k \}}) \right)</math>
where the notation
:<math> w_k^{\top} f_k (x_{ \{ k \}}) = \sum_{i=1}^{N_k} w_{k,i} \cdot f_{k,i}(x_{\{k\}})</math>
is simply a [[dot product]] over field configurations, and ''Z'' is the [[partition function (mathematics)|partition function]]:

:<math> Z = \sum_{x \in \mathcal{X}} \exp \left(\sum_{k} w_k^{\top} f_k(x_{ \{ k \} })\right).</math>

Here, <math>\mathcal{X}</math> denotes the set of all possible assignments of values to all the network's random variables. Usually, the feature functions <math>f_{k,i}</math> are defined such that they are [[indicator function|indicators]] of the clique's configuration, ''i.e.'' <math>f_{k,i}(x_{\{k\}}) = 1</math> if <math>x_{\{k\}}</math> corresponds to the ''i''-th possible configuration of the ''k''-th clique and 0 otherwise. This model is equivalent to the clique factorization model given above, if <math>N_k=|\operatorname{dom}(C_k)|</math> is the cardinality of the clique, and the weight of a feature <math>f_{k,i}</math> corresponds to the logarithm of the corresponding clique factor, ''i.e.'' <math>w_{k,i} = \log \varphi(c_{k,i})</math>, where <math>c_{k,i}</math> is the ''i''-th possible configuration of the ''k''-th clique, ''i.e.'' the ''i''-th value in the domain of the clique <math>C_k</math>.

The probability ''P'' is often called the Gibbs measure.  This expression of a Markov field as a logistic model is only possible if all clique factors are non-zero, ''i.e.'' if none of the elements of <math>\mathcal{X}</math> are assigned a probability of 0.  This allows  techniques from matrix algebra to be applied, ''e.g.'' that the [[trace (linear algebra)|trace]] of a matrix is log of the [[determinant]], with the matrix representation of a graph arising from the graph's [[incidence matrix]].

The importance of the partition function ''Z'' is that many concepts from [[statistical mechanics]], such as [[entropy]], directly generalize to the case of Markov networks, and an ''intuitive'' understanding can thereby be gained.  In addition, the partition function allows [[variational method]]s to be applied to the solution of the problem: one can attach a driving force to one or more of the random variables, and explore the reaction of the network in response to this [[perturbation theory|perturbation]].  Thus, for example, one may add a driving term ''J''<sub>''v''</sub>, for each vertex ''v'' of the graph, to the partition function to get:

:<math> Z[J] = \sum_{x \in \mathcal{X}} \exp \left(\sum_{k} w_k^{\top} f_k(x_{ \{ k \} }) + \sum_v J_v x_v\right)</math>

Formally differentiating with respect to ''J''<sub>''v''</sub> gives the [[expectation value]] of the random variable ''X''<sub>''v''</sub> associated with the vertex ''v'':

:<math>E[X_v] = \frac{1}{Z} \left.\frac{\partial Z[J]}{\partial J_v}\right|_{J_v=0}.</math>

[[Correlation function]]s are computed likewise; the two-point correlation is:

:<math>C[X_u, X_v] = \frac{1}{Z} \left.\frac{\partial^2 Z[J]}{\partial J_u \,\partial J_v}\right|_{J_u=0, J_v=0}.</math>

Unfortunately, though the likelihood of a logistic Markov network is convex, evaluating the likelihood or gradient of the likelihood of a model requires inference in the model, which is generally computationally infeasible (see [[#Inference|'Inference']] below).

== Examples ==

=== Gaussian ===
A [[multivariate normal distribution]] forms a Markov random field with respect to a graph <math>G=(V,E)</math> if the missing edges correspond to zeros on the [[precision matrix]] (the inverse [[covariance matrix]]):

:<math>X=(X_v)_{v\in V} \sim \mathcal N (\boldsymbol \mu, \Sigma)
</math>
such that
:<math>(\Sigma^{-1})_{uv} =0 \quad \text{iff} \quad \{u,v\} \notin E .</math><ref>{{cite book
 |first1=Håvard |last1=Rue |first2=Leonhard |last2=Held
 |title=Gaussian Markov random fields: theory and applications
 |publisher=CRC Press |year=2005
 |isbn=978-1-58488-432-3
}}</ref>

== Inference ==
As in a [[Bayesian network]], one may calculate the [[conditional distribution]] of a set of nodes <math> V' = \{ v_1 ,\ldots, v_i \} </math> given values to another set of nodes <math> W' = \{ w_1 ,\ldots, w_j \} </math> in the Markov random field by summing over all possible assignments to <math>u \notin V',W'</math>; this is called [[exact inference]]. However, exact inference is a [[Sharp-P-complete|#P-complete]] problem, and thus computationally intractable in the general case. Approximation techniques such as [[Markov chain Monte Carlo]] and loopy [[belief propagation]] are often more feasible in practice.  Some particular subclasses of MRFs, such as trees (see [[Chow–Liu tree]]), have polynomial-time inference algorithms; discovering such subclasses is an active research topic.  There are also subclasses of MRFs that permit efficient [[Maximum a posteriori|MAP]], or most likely assignment, inference; examples of these include associative networks.<ref>{{citation
 | last1 = Taskar | first1 = Benjamin
 | last2 = Chatalbashev | first2 = Vassil
 | last3 = Koller | first3 = Daphne | author3-link = Daphne Koller
 | editor-last = Brodley | editor-first = Carla E. | editor-link = Carla Brodley
 | contribution = Learning associative Markov networks
 | doi = 10.1145/1015330.1015444
 | publisher = [[Association for Computing Machinery]]
 | series = ACM International Conference Proceeding Series
 | title = Proceedings of the Twenty-First International Conference on Machine Learning (ICML 2004), Banff, Alberta, Canada, July 4-8, 2004
 | volume = 69
 | pages = 102
 | year = 2004| title-link = International Conference on Machine Learning
 | isbn = 978-1581138283
 | citeseerx = 10.1.1.157.329
 | s2cid = 11312524
 }}.</ref><ref>{{citation
 | last1 = Duchi | first1 = John C.
 | last2 = Tarlow | first2 = Daniel
 | last3 = Elidan | first3 = Gal
 | last4 = Koller | first4 = Daphne | author4-link = Daphne Koller
 | editor1-last = Schölkopf | editor1-first = Bernhard
 | editor2-last = Platt | editor2-first = John C.
 | editor3-last = Hoffman | editor3-first = Thomas
 | contribution = Using Combinatorial Optimization within Max-Product Belief Propagation
 | contribution-url = http://papers.nips.cc/paper/3117-using-combinatorial-optimization-within-max-product-belief-propagation
 | pages = 369–376
 | publisher = [[MIT Press]]
 | series = [[Conference on Neural Information Processing Systems|Advances in Neural Information Processing Systems]] | volume = 19
 | title = Proceedings of the Twentieth Annual Conference on Neural Information Processing Systems, Vancouver, British Columbia, Canada, December 4-7, 2006
 | year = 2006}}.</ref> Another interesting sub-class is the one of decomposable models (when the graph is [[Chordal graph|chordal]]): having a closed-form for the [[Maximum likelihood estimate|MLE]], it is possible to discover a consistent structure for hundreds of variables.<ref name="Petitjean">{{cite conference|last1=Petitjean|first1=F.|last2=Webb|first2=G.I.|last3=Nicholson|first3=A.E.|author-link3=Ann Nicholson|year=2013|title=Scaling log-linear analysis to high-dimensional data|url=http://www.tiny-clues.eu/Research/Petitjean2013-ICDM.pdf|conference=International Conference on Data Mining|location=Dallas, TX, USA|publisher=IEEE}}</ref>

== Conditional random fields ==
{{Main|Conditional random field}}
One notable variant of a Markov random field is a '''[[conditional random field]]''', in which each random variable may also be conditioned upon a set of global observations <math>o</math>. In this model, each function <math>\varphi_k</math> is a mapping from all assignments to both the [[Clique (graph theory)|clique]] ''k'' and the observations <math>o</math> to the nonnegative real numbers. This form of the Markov network may be more appropriate for producing [[discriminative model|discriminative classifiers]], which do not model the distribution over the observations. CRFs were proposed by [[John D. Lafferty]], [[Andrew McCallum]] and [[Fernando C.N. Pereira]] in 2001.<ref name=ICML03classic>{{cite web |url=http://icml.cc/2013/?page_id=21 |title=Two classic paper prizes for papers that appeared at ICML 2013 |date=2013 |website=ICML |access-date=15 December 2014}}</ref>

== Varied applications ==
Markov random fields find application in a variety of fields, ranging from [[computer graphics (computer science)|computer graphics]] to computer vision,<ref>{{Cite book |last1=Banf |first1=Michael |last2=Blanz |first2=Volker |chapter=Man made structure detection and verification of object recognition in images for the visually impaired |date=2013-06-06 |title=Proceedings of the 6th International Conference on Computer Vision / Computer Graphics Collaboration Techniques and Applications |chapter-url=https://dl.acm.org/doi/10.1145/2466715.2466732 |series=MIRAGE '13 |location=New York, NY, USA |publisher=Association for Computing Machinery |pages=1–8 |doi=10.1145/2466715.2466732 |isbn=978-1-4503-2023-8}}</ref> [[machine learning]] or [[computational biology]],<ref name="Kindermann-Snell80"/><ref>{{Cite journal|last1=Banf|first1=Michael|last2=Rhee|first2=Seung Y.|date=2017-02-01|title=Enhancing gene regulatory network inference through data integration with markov random fields|journal=Scientific Reports|volume=7|issue=1|pages=41174|doi=10.1038/srep41174|pmid=28145456|pmc=5286517|issn=2045-2322|bibcode=2017NatSR...741174B}}</ref> and [[information retrieval]].<ref>{{Cite conference | first1= Donald | last1 = Metzler | first2 = W.Bruce |last2=Croft| title=A Markov random field model for term dependencies | year = 2005 | conference = Proceedings of the 28th ACM SIGIR Conference| pages = 472–479 | publisher=ACM | location= Salvador, Brazil | doi=10.1145/1076034.1076115}}</ref> MRFs are used in image processing to generate textures as they can be used to generate flexible and stochastic image models. In image modelling, the task is to find a suitable intensity distribution of a given image, where suitability depends on the kind of task and MRFs are flexible enough to be used for image and texture synthesis, [[image compression]] and restoration, [[image segmentation]], 3D image inference from 2D images, [[image registration]], [[texture synthesis]], [[super-resolution]], [[Computer stereo vision|stereo matching]] and [[information retrieval]]. They can be used to solve various computer vision problems which can be posed as energy minimization problems or problems where different regions have to be distinguished using a set of discriminating features, within a Markov random field framework, to predict the category of the region.<ref>{{Cite journal |title = Automatic Identification of Window Regions on Indoor Point Clouds Using LiDAR and Cameras|last = Zhang & Zakhor|first = Richard & Avideh|date = 2014|journal = VIP Lab Publications|citeseerx = 10.1.1.649.303}}</ref>  Markov random fields were a generalization over the  Ising model and have, since then, been used widely in combinatorial optimizations and networks.

== See also ==
{{Div col|colwidth=20em}}
* [[Constraint composite graph]]
* [[Graphical model]]
* [[Dependency network (graphical model)]]
* [[Hammersley–Clifford theorem]]
* [[Hopfield network]]
* [[Interacting particle system]]
* [[Ising model]]
* [[Log-linear analysis]]
* [[Markov chain]]
* [[Markov logic network]]
* [[Maximum entropy method]]
* [[Stochastic cellular automaton]]
{{Div col end}}

==References==
{{reflist}}

{{Stochastic processes}}

[[Category:Graphical models]]
[[Category:Markov networks| ]]