Open main menu
Home
Random
Recent changes
Special pages
Community portal
Preferences
About Wikipedia
Disclaimers
Incubator escapee wiki
Search
User menu
Talk
Dark mode
Contributions
Create account
Log in
Editing
Markov random field
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
{{Short description|Set of random variables}} [[File:markov random field example.png|thumb|alt=An example of a Markov random field.|An example of a Markov random field. Each edge represents dependency. In this example: A depends on B and D. B depends on A and D. D depends on A, B, and E. E depends on D and C. C depends on E.]] In the domain of [[physics]] and [[probability]], a '''Markov random field''' ('''MRF'''), '''Markov network''' or '''undirected [[graphical model]]''' is a set of [[random variable]]s having a [[Markov property]] described by an [[undirected graph]]. In other words, a [[random field]] is said to be a [[Andrey Markov|Markov]] random field if it satisfies Markov properties. The concept originates from the [[Spin glass#Sherrington–Kirkpatrick model|Sherrington–Kirkpatrick model]].<ref>{{citation|title=Solvable Model of a Spin-Glass|number=35|year=1975|author1= Sherrington, David|author2=Kirkpatrick, Scott|journal=Physical Review Letters|volume=35|pages=1792–1796|doi=10.1103/PhysRevLett.35.1792|bibcode=1975PhRvL..35.1792S}}</ref> A Markov network or MRF is similar to a [[Bayesian network]] in its representation of dependencies; the differences being that Bayesian networks are [[directed acyclic graph|directed and acyclic]], whereas Markov networks are undirected and may be cyclic. Thus, a Markov network can represent certain dependencies that a Bayesian network cannot (such as cyclic dependencies {{Explain|date=July 2018}}); on the other hand, it can't represent certain dependencies that a Bayesian network can (such as induced dependencies {{Explain|date=July 2018}}). The underlying graph of a Markov random field may be finite or infinite. When the [[joint probability distribution|joint probability density]] of the random variables is strictly positive, it is also referred to as a '''Gibbs random field''', because, according to the [[Hammersley–Clifford theorem]], it can then be represented by a [[Gibbs measure]] for an appropriate (locally defined) energy function. The prototypical Markov random field is the [[Ising model]]; indeed, the Markov random field was introduced as the general setting for the Ising model.<ref name="Kindermann-Snell80">{{cite book |first1=Ross |last1=Kindermann |first2=J. Laurie |last2=Snell |url=http://www.cmap.polytechnique.fr/~rama/ehess/mrfbook.pdf |title=Markov Random Fields and Their Applications |year=1980 |publisher=American Mathematical Society |isbn=978-0-8218-5001-5 |mr=0620955 |access-date=2012-04-09 |archive-date=2017-08-10 |archive-url=https://web.archive.org/web/20170810092327/http://www.cmap.polytechnique.fr/%7Erama/ehess/mrfbook.pdf |url-status=dead }}</ref> In the domain of [[artificial intelligence]], a Markov random field is used to model various low- to mid-level tasks in [[image processing]] and [[computer vision]].<ref>{{cite book |first1=S. Z. |last1=Li |title=Markov Random Field Modeling in Image Analysis |year=2009 |publisher=Springer |url=https://books.google.com/books?id=rDsObhDkCIAC |isbn=9781848002791 }}</ref> == Definition == Given an undirected graph <math>G=(V,E)</math>, a set of random variables <math>X = (X_v)_{v\in V}</math> indexed by <math>V</math> form a Markov random field with respect to <math>G</math> if they satisfy the local Markov properties: :Pairwise Markov property: Any two non-adjacent variables are [[conditional independence|conditionally independent]] given all other variables: ::<math>X_u \perp\!\!\!\perp X_v \mid X_{V \smallsetminus \{u,v\}} </math> :Local Markov property: A variable is conditionally independent of all other variables given its neighbors: ::<math>X_v \perp\!\!\!\perp X_{V\smallsetminus \operatorname{N}[v]} \mid X_{\operatorname{N}(v)}</math> :where <math display="inline">\operatorname{N}(v)</math> is the set of neighbors of <math>v</math>, and <math>\operatorname{N}[v] = v \cup \operatorname{N}(v)</math> is the [[Neighborhood (graph theory)|closed neighbourhood]] of <math>v</math>. :Global Markov property: Any two subsets of variables are conditionally independent given a separating subset: ::<math>X_A \perp\!\!\!\perp X_B \mid X_S</math> :where every path from a node in <math>A</math> to a node in <math>B</math> passes through <math>S</math>. The Global Markov property is stronger than the Local Markov property, which in turn is stronger than the Pairwise one.<ref>{{cite book |last1=Lauritzen |first1=Steffen |title=Graphical models |date=1996 |publisher=Clarendon Press |location=Oxford |isbn=978-0198522195 |page=33}}</ref> However, the above three Markov properties are equivalent for positive distributions<ref>{{cite book|title=Probabilistic Graphical Models|last1=Koller|last2=Friedman|publisher=MIT Press|date=2009|isbn=9780262013192|first1=Daphne|first2=Nir|page=114-122}}</ref> (those that assign only nonzero probabilities to the associated variables). The relation between the three Markov properties is particularly clear in the following formulation: * Pairwise: For any <math>i, j \in V</math> not equal or adjacent, <math>X_i \perp\!\!\!\perp X_j | X_{V \smallsetminus \{i, j\}}</math>. * Local: For any <math>i\in V</math> and <math>J\subset V</math> not containing or adjacent to <math>i</math>, <math>X_i \perp\!\!\!\perp X_J | X_{V \smallsetminus (\{i\}\cup J)}</math>. * Global: For any <math>I, J\subset V</math> not intersecting or adjacent, <math>X_I \perp\!\!\!\perp X_J | X_{V \smallsetminus (I\cup J)}</math>. == Clique factorization == As the Markov property of an arbitrary probability distribution can be difficult to establish, a commonly used class of Markov random fields are those that can be factorized according to the [[Clique (graph theory)|clique]]s of the graph. Given a set of random variables <math>X = (X_v)_{v\in V}</math>, let <math>P(X=x)</math> be the [[Probability density function|probability]] of a particular field configuration <math>x</math> in <math>X</math>—that is, <math>P(X=x)</math> is the probability of finding that the random variables <math>X</math> take on the particular value <math>x</math>. Because <math>X</math> is a set, the probability of <math>x</math> should be understood to be taken with respect to a ''joint distribution'' of the <math>X_v</math>. If this joint density can be factorized over the cliques of <math>G</math> as :<math>P(X=x) = \prod_{C \in \operatorname{cl}(G)} \varphi_C (x_C) </math> then <math>X</math> forms a Markov random field with respect to <math>G</math>. Here, <math>\operatorname{cl}(G)</math> is the set of cliques of <math>G</math>. The definition is equivalent if only maximal cliques are used. The functions <math>\varphi_C</math> are sometimes referred to as ''factor potentials'' or ''clique potentials''. Note, however, conflicting terminology is in use: the word ''potential'' is often applied to the logarithm of <math>\varphi_C</math>. This is because, in [[statistical mechanics]], <math>\log(\varphi_C)</math> has a direct interpretation as the [[potential energy]] of a [[Configuration space (physics)|configuration]] <math>x_C</math>. Some MRF's do not factorize: a simple example can be constructed on a cycle of 4 nodes with some infinite energies, i.e. configurations of zero probabilities,<ref>{{cite journal |first=John |last=Moussouris |title=Gibbs and Markov random systems with constraints |journal=Journal of Statistical Physics |volume=10 |issue=1 |pages=11–33 |year=1974 |doi=10.1007/BF01011714 |mr=0432132 |hdl=10338.dmlcz/135184 |bibcode=1974JSP....10...11M |s2cid=121299906 |hdl-access=free}}</ref> even if one, more appropriately, allows the infinite energies to act on the complete graph on <math>V</math>.<ref>{{cite journal | last1 = Gandolfi | first1 = Alberto | last2 = Lenarda | first2 = Pietro |title= A note on Gibbs and Markov Random Fields with constraints and their moments |journal=Mathematics and Mechanics of Complex Systems |volume=4 |issue=3–4 |pages=407–422 |year=2016 | doi=10.2140/memocs.2016.4.407 |doi-access=free }}</ref> MRF's factorize if at least one of the following conditions is fulfilled: * the density is strictly positive (by the [[Hammersley–Clifford theorem]]) * the graph is [[Chordal graph|chordal]] (by equivalence to a [[Bayesian network]]) When such a factorization does exist, it is possible to construct a [[factor graph]] for the network. == Exponential family == Any positive Markov random field can be written as exponential family in canonical form with feature functions <math>f_k</math> such that the full-joint distribution can be written as :<math> P(X=x) = \frac{1}{Z} \exp \left( \sum_{k} w_k^{\top} f_k (x_{ \{ k \}}) \right)</math> where the notation :<math> w_k^{\top} f_k (x_{ \{ k \}}) = \sum_{i=1}^{N_k} w_{k,i} \cdot f_{k,i}(x_{\{k\}})</math> is simply a [[dot product]] over field configurations, and ''Z'' is the [[partition function (mathematics)|partition function]]: :<math> Z = \sum_{x \in \mathcal{X}} \exp \left(\sum_{k} w_k^{\top} f_k(x_{ \{ k \} })\right).</math> Here, <math>\mathcal{X}</math> denotes the set of all possible assignments of values to all the network's random variables. Usually, the feature functions <math>f_{k,i}</math> are defined such that they are [[indicator function|indicators]] of the clique's configuration, ''i.e.'' <math>f_{k,i}(x_{\{k\}}) = 1</math> if <math>x_{\{k\}}</math> corresponds to the ''i''-th possible configuration of the ''k''-th clique and 0 otherwise. This model is equivalent to the clique factorization model given above, if <math>N_k=|\operatorname{dom}(C_k)|</math> is the cardinality of the clique, and the weight of a feature <math>f_{k,i}</math> corresponds to the logarithm of the corresponding clique factor, ''i.e.'' <math>w_{k,i} = \log \varphi(c_{k,i})</math>, where <math>c_{k,i}</math> is the ''i''-th possible configuration of the ''k''-th clique, ''i.e.'' the ''i''-th value in the domain of the clique <math>C_k</math>. The probability ''P'' is often called the Gibbs measure. This expression of a Markov field as a logistic model is only possible if all clique factors are non-zero, ''i.e.'' if none of the elements of <math>\mathcal{X}</math> are assigned a probability of 0. This allows techniques from matrix algebra to be applied, ''e.g.'' that the [[trace (linear algebra)|trace]] of a matrix is log of the [[determinant]], with the matrix representation of a graph arising from the graph's [[incidence matrix]]. The importance of the partition function ''Z'' is that many concepts from [[statistical mechanics]], such as [[entropy]], directly generalize to the case of Markov networks, and an ''intuitive'' understanding can thereby be gained. In addition, the partition function allows [[variational method]]s to be applied to the solution of the problem: one can attach a driving force to one or more of the random variables, and explore the reaction of the network in response to this [[perturbation theory|perturbation]]. Thus, for example, one may add a driving term ''J''<sub>''v''</sub>, for each vertex ''v'' of the graph, to the partition function to get: :<math> Z[J] = \sum_{x \in \mathcal{X}} \exp \left(\sum_{k} w_k^{\top} f_k(x_{ \{ k \} }) + \sum_v J_v x_v\right)</math> Formally differentiating with respect to ''J''<sub>''v''</sub> gives the [[expectation value]] of the random variable ''X''<sub>''v''</sub> associated with the vertex ''v'': :<math>E[X_v] = \frac{1}{Z} \left.\frac{\partial Z[J]}{\partial J_v}\right|_{J_v=0}.</math> [[Correlation function]]s are computed likewise; the two-point correlation is: :<math>C[X_u, X_v] = \frac{1}{Z} \left.\frac{\partial^2 Z[J]}{\partial J_u \,\partial J_v}\right|_{J_u=0, J_v=0}.</math> Unfortunately, though the likelihood of a logistic Markov network is convex, evaluating the likelihood or gradient of the likelihood of a model requires inference in the model, which is generally computationally infeasible (see [[#Inference|'Inference']] below). == Examples == === Gaussian === A [[multivariate normal distribution]] forms a Markov random field with respect to a graph <math>G=(V,E)</math> if the missing edges correspond to zeros on the [[precision matrix]] (the inverse [[covariance matrix]]): :<math>X=(X_v)_{v\in V} \sim \mathcal N (\boldsymbol \mu, \Sigma) </math> such that :<math>(\Sigma^{-1})_{uv} =0 \quad \text{iff} \quad \{u,v\} \notin E .</math><ref>{{cite book |first1=Håvard |last1=Rue |first2=Leonhard |last2=Held |title=Gaussian Markov random fields: theory and applications |publisher=CRC Press |year=2005 |isbn=978-1-58488-432-3 }}</ref> == Inference == As in a [[Bayesian network]], one may calculate the [[conditional distribution]] of a set of nodes <math> V' = \{ v_1 ,\ldots, v_i \} </math> given values to another set of nodes <math> W' = \{ w_1 ,\ldots, w_j \} </math> in the Markov random field by summing over all possible assignments to <math>u \notin V',W'</math>; this is called [[exact inference]]. However, exact inference is a [[Sharp-P-complete|#P-complete]] problem, and thus computationally intractable in the general case. Approximation techniques such as [[Markov chain Monte Carlo]] and loopy [[belief propagation]] are often more feasible in practice. Some particular subclasses of MRFs, such as trees (see [[Chow–Liu tree]]), have polynomial-time inference algorithms; discovering such subclasses is an active research topic. There are also subclasses of MRFs that permit efficient [[Maximum a posteriori|MAP]], or most likely assignment, inference; examples of these include associative networks.<ref>{{citation | last1 = Taskar | first1 = Benjamin | last2 = Chatalbashev | first2 = Vassil | last3 = Koller | first3 = Daphne | author3-link = Daphne Koller | editor-last = Brodley | editor-first = Carla E. | editor-link = Carla Brodley | contribution = Learning associative Markov networks | doi = 10.1145/1015330.1015444 | publisher = [[Association for Computing Machinery]] | series = ACM International Conference Proceeding Series | title = Proceedings of the Twenty-First International Conference on Machine Learning (ICML 2004), Banff, Alberta, Canada, July 4-8, 2004 | volume = 69 | pages = 102 | year = 2004| title-link = International Conference on Machine Learning | isbn = 978-1581138283 | citeseerx = 10.1.1.157.329 | s2cid = 11312524 }}.</ref><ref>{{citation | last1 = Duchi | first1 = John C. | last2 = Tarlow | first2 = Daniel | last3 = Elidan | first3 = Gal | last4 = Koller | first4 = Daphne | author4-link = Daphne Koller | editor1-last = Schölkopf | editor1-first = Bernhard | editor2-last = Platt | editor2-first = John C. | editor3-last = Hoffman | editor3-first = Thomas | contribution = Using Combinatorial Optimization within Max-Product Belief Propagation | contribution-url = http://papers.nips.cc/paper/3117-using-combinatorial-optimization-within-max-product-belief-propagation | pages = 369–376 | publisher = [[MIT Press]] | series = [[Conference on Neural Information Processing Systems|Advances in Neural Information Processing Systems]] | volume = 19 | title = Proceedings of the Twentieth Annual Conference on Neural Information Processing Systems, Vancouver, British Columbia, Canada, December 4-7, 2006 | year = 2006}}.</ref> Another interesting sub-class is the one of decomposable models (when the graph is [[Chordal graph|chordal]]): having a closed-form for the [[Maximum likelihood estimate|MLE]], it is possible to discover a consistent structure for hundreds of variables.<ref name="Petitjean">{{cite conference|last1=Petitjean|first1=F.|last2=Webb|first2=G.I.|last3=Nicholson|first3=A.E.|author-link3=Ann Nicholson|year=2013|title=Scaling log-linear analysis to high-dimensional data|url=http://www.tiny-clues.eu/Research/Petitjean2013-ICDM.pdf|conference=International Conference on Data Mining|location=Dallas, TX, USA|publisher=IEEE}}</ref> == Conditional random fields == {{Main|Conditional random field}} One notable variant of a Markov random field is a '''[[conditional random field]]''', in which each random variable may also be conditioned upon a set of global observations <math>o</math>. In this model, each function <math>\varphi_k</math> is a mapping from all assignments to both the [[Clique (graph theory)|clique]] ''k'' and the observations <math>o</math> to the nonnegative real numbers. This form of the Markov network may be more appropriate for producing [[discriminative model|discriminative classifiers]], which do not model the distribution over the observations. CRFs were proposed by [[John D. Lafferty]], [[Andrew McCallum]] and [[Fernando C.N. Pereira]] in 2001.<ref name=ICML03classic>{{cite web |url=http://icml.cc/2013/?page_id=21 |title=Two classic paper prizes for papers that appeared at ICML 2013 |date=2013 |website=ICML |access-date=15 December 2014}}</ref> == Varied applications == Markov random fields find application in a variety of fields, ranging from [[computer graphics (computer science)|computer graphics]] to computer vision,<ref>{{Cite book |last1=Banf |first1=Michael |last2=Blanz |first2=Volker |chapter=Man made structure detection and verification of object recognition in images for the visually impaired |date=2013-06-06 |title=Proceedings of the 6th International Conference on Computer Vision / Computer Graphics Collaboration Techniques and Applications |chapter-url=https://dl.acm.org/doi/10.1145/2466715.2466732 |series=MIRAGE '13 |location=New York, NY, USA |publisher=Association for Computing Machinery |pages=1–8 |doi=10.1145/2466715.2466732 |isbn=978-1-4503-2023-8}}</ref> [[machine learning]] or [[computational biology]],<ref name="Kindermann-Snell80"/><ref>{{Cite journal|last1=Banf|first1=Michael|last2=Rhee|first2=Seung Y.|date=2017-02-01|title=Enhancing gene regulatory network inference through data integration with markov random fields|journal=Scientific Reports|volume=7|issue=1|pages=41174|doi=10.1038/srep41174|pmid=28145456|pmc=5286517|issn=2045-2322|bibcode=2017NatSR...741174B}}</ref> and [[information retrieval]].<ref>{{Cite conference | first1= Donald | last1 = Metzler | first2 = W.Bruce |last2=Croft| title=A Markov random field model for term dependencies | year = 2005 | conference = Proceedings of the 28th ACM SIGIR Conference| pages = 472–479 | publisher=ACM | location= Salvador, Brazil | doi=10.1145/1076034.1076115}}</ref> MRFs are used in image processing to generate textures as they can be used to generate flexible and stochastic image models. In image modelling, the task is to find a suitable intensity distribution of a given image, where suitability depends on the kind of task and MRFs are flexible enough to be used for image and texture synthesis, [[image compression]] and restoration, [[image segmentation]], 3D image inference from 2D images, [[image registration]], [[texture synthesis]], [[super-resolution]], [[Computer stereo vision|stereo matching]] and [[information retrieval]]. They can be used to solve various computer vision problems which can be posed as energy minimization problems or problems where different regions have to be distinguished using a set of discriminating features, within a Markov random field framework, to predict the category of the region.<ref>{{Cite journal |title = Automatic Identification of Window Regions on Indoor Point Clouds Using LiDAR and Cameras|last = Zhang & Zakhor|first = Richard & Avideh|date = 2014|journal = VIP Lab Publications|citeseerx = 10.1.1.649.303}}</ref> Markov random fields were a generalization over the Ising model and have, since then, been used widely in combinatorial optimizations and networks. == See also == {{Div col|colwidth=20em}} * [[Constraint composite graph]] * [[Graphical model]] * [[Dependency network (graphical model)]] * [[Hammersley–Clifford theorem]] * [[Hopfield network]] * [[Interacting particle system]] * [[Ising model]] * [[Log-linear analysis]] * [[Markov chain]] * [[Markov logic network]] * [[Maximum entropy method]] * [[Stochastic cellular automaton]] {{Div col end}} ==References== {{reflist}} {{Stochastic processes}} [[Category:Graphical models]] [[Category:Markov networks| ]]
Edit summary
(Briefly describe your changes)
By publishing changes, you agree to the
Terms of Use
, and you irrevocably agree to release your contribution under the
CC BY-SA 4.0 License
and the
GFDL
. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel
Editing help
(opens in new window)
Pages transcluded onto the current version of this page
(
help
)
:
Template:Citation
(
edit
)
Template:Cite book
(
edit
)
Template:Cite conference
(
edit
)
Template:Cite journal
(
edit
)
Template:Cite web
(
edit
)
Template:Div col
(
edit
)
Template:Div col end
(
edit
)
Template:Explain
(
edit
)
Template:Main
(
edit
)
Template:Reflist
(
edit
)
Template:Short description
(
edit
)
Template:Stochastic processes
(
edit
)