
== As a maximization–maximization procedure ==
The EM algorithm can be viewed as two alternating maximization steps, that is, as an example of [[coordinate descent|coordinate ascent]].<ref name="neal1999">{{cite book|last1=Neal |first1=Radford |last2=Hinton |first2=Geoffrey |author-link2=Geoffrey Hinton |contribution=A view of the EM algorithm that justifies incremental, sparse, and other variants |title=Learning in Graphical Models |editor=Michael I. Jordan |editor-link=Michael I. Jordan |pages= 355–368 |publisher= MIT Press |location=Cambridge, MA |year=1999 |isbn=978-0-262-60032-3 |url=http://ftp.cs.toronto.edu/pub/radford/emk.pdf |access-date=2009-03-22}}</ref><ref name="hastie2001">{{cite book|last1=Hastie |first1= Trevor|author-link1=Trevor Hastie|last2=Tibshirani|first2=Robert|author-link2=Robert Tibshirani|last3=Friedman|first3=Jerome |year=2001 |title=The Elements of Statistical Learning |url=https://archive.org/details/elementsstatisti00thas_842 |url-access=limited |isbn=978-0-387-95284-0 |publisher=Springer |location=New York |chapter=8.5 The EM algorithm |pages=[https://archive.org/details/elementsstatisti00thas_842/page/n237 236]–243}}</ref> Consider the function:
:<math> F(q,\theta) := \operatorname{E}_q [ \log L (\theta ; x,Z) ] + H(q), </math> 
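where <math>q</math> is an arbitrary probability distribution over the unobserved data <math>Z</math>, the expectation is taken with <math>Z</math> distributed according to <math>q</math>, and <math>H(q)</math> is the [[Entropy (information theory)|entropy]] of <math>q</math>. This function can be written as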
:<math> F(q,\theta) = -D_{\mathrm{KL}}\big(q \parallel p_{Z\mid X}(\cdot\mid x;\theta ) \big) + \log L(\theta;x), </math>
where <math>p_{Z\mid X}(\cdot\mid x;\theta )</math> is the conditional distribution of the unobserved data given the observed data <math>x</math> and <math>D_{\mathrm{KL}}</math> is the [[Kullback–Leibler divergence]].
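The equivalence of the two expressions for <math>F</math> can be checked numerically. The following is a minimal sketch, assuming a toy model that is not part of the text above: a single observation <math>x</math> from a mixture of two unit-variance Gaussians, with the component label as the unobserved variable. The function names, the parameter values and the choice of <math>q</math> are illustrative only.

```python
import math

def normal_pdf(x, mean):
    """Density of a unit-variance Gaussian at x."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)

def joint(x, weights, means):
    """p(x, z; theta) for each value z of the latent component label."""
    return [w * normal_pdf(x, m) for w, m in zip(weights, means)]

def free_energy_definition(q, x, weights, means):
    """F(q, theta) = E_q[log L(theta; x, Z)] + H(q)."""
    p_xz = joint(x, weights, means)
    expected_log_joint = sum(qz * math.log(p) for qz, p in zip(q, p_xz))
    entropy = -sum(qz * math.log(qz) for qz in q if qz > 0)
    return expected_log_joint + entropy

def free_energy_identity(q, x, weights, means):
    """F(q, theta) = -KL(q || p(z | x; theta)) + log L(theta; x)."""
    p_xz = joint(x, weights, means)
    marginal = sum(p_xz)                      # L(theta; x)
    posterior = [p / marginal for p in p_xz]  # p(z | x; theta)
    kl = sum(qz * math.log(qz / pz)
             for qz, pz in zip(q, posterior) if qz > 0)
    return -kl + math.log(marginal)

# Illustrative values (assumptions, not taken from the article).
x, weights, means = 0.7, [0.3, 0.7], [-1.0, 2.0]
q = [0.6, 0.4]

f_def = free_energy_definition(q, x, weights, means)
f_id = free_energy_identity(q, x, weights, means)
print(abs(f_def - f_id) < 1e-12)
```

Because the Kullback–Leibler divergence is non-negative, the second form also shows that <math>F(q,\theta)</math> never exceeds <math>\log L(\theta;x)</math>, with equality when <math>q</math> is the conditional distribution.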

Then the steps in the EM algorithm may be viewed as:
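:''Expectation step'': Choose <math>q</math> to maximize <math>F</math>. Because the Kullback–Leibler divergence is non-negative and vanishes only when its two arguments coincide, the maximizer is the conditional distribution of the unobserved data under the current parameter estimate:
::<math> q^{(t)} = \operatorname{arg\,max}_q \ F(q,\theta^{(t)}) = p_{Z\mid X}(\cdot\mid x;\theta^{(t)}) </math>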
:''Maximization step'': Choose <math>\theta</math> to maximize <math>F</math>:
::<math> \theta^{(t+1)} = \operatorname{arg\,max}_\theta \ F(q^{(t)},\theta) </math>
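
The two steps can be sketched as an alternating maximization of <math>F</math>. The example below is an illustration under stated assumptions, not a reference implementation: it assumes a two-component Gaussian mixture with unit variances, unknown weights and means, and a small made-up data set. For independent observations <math>F</math> is a sum over data points, the expectation step sets each <math>q_i</math> to the conditional distribution of the component label, and the maximization step has a closed form. All function names and numerical values are hypothetical.

```python
import math

def log_joint(x, z, weights, means):
    """log p(x, z; theta) for a unit-variance Gaussian mixture."""
    return (math.log(weights[z])
            - 0.5 * (x - means[z]) ** 2
            - 0.5 * math.log(2.0 * math.pi))

def free_energy(q, data, weights, means):
    """F(q, theta) summed over independent observations."""
    total = 0.0
    for qi, x in zip(q, data):
        for z, qz in enumerate(qi):
            if qz > 0:
                # q(z) * log p(x, z) plus the entropy term -q(z) * log q(z)
                total += qz * (log_joint(x, z, weights, means) - math.log(qz))
    return total

def e_step(data, weights, means):
    """Maximize F over q: each q_i becomes p(z | x_i; theta)."""
    q = []
    for x in data:
        lj = [log_joint(x, z, weights, means) for z in range(len(weights))]
        top = max(lj)  # subtract the maximum for numerical stability
        unnorm = [math.exp(v - top) for v in lj]
        norm = sum(unnorm)
        q.append([u / norm for u in unnorm])
    return q

def m_step(q, data):
    """Maximize F over theta for fixed q (closed form for this model)."""
    k = len(q[0])
    counts = [sum(qi[z] for qi in q) for z in range(k)]
    weights = [c / len(data) for c in counts]
    means = [sum(qi[z] * x for qi, x in zip(q, data)) / counts[z]
             for z in range(k)]
    return weights, means

# Made-up data and starting point (assumptions for illustration).
data = [-2.1, -1.7, -2.4, 1.9, 2.3, 2.0, 1.6]
weights, means = [0.5, 0.5], [-0.5, 0.5]

trace = []  # value of F after every half-step
for _ in range(20):
    q = e_step(data, weights, means)
    trace.append(free_energy(q, data, weights, means))
    weights, means = m_step(q, data)
    trace.append(free_energy(q, data, weights, means))

# Neither half-step should decrease F.
print(all(b >= a - 1e-9 for a, b in zip(trace, trace[1:])))
```

Recording <math>F</math> after each half-step makes the coordinate-wise structure visible: the value after an expectation step equals the log-likelihood at the current parameters, and the maximization step can only raise it further.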