== Variants ==
A number of methods have been proposed to accelerate the sometimes slow convergence of the EM algorithm, such as those using [[conjugate gradient]] and modified [[Newton's method]]s (Newton–Raphson).<ref>{{cite journal |first1=Mortaza |last1=Jamshidian |first2=Robert I. |last2=Jennrich |title=Acceleration of the EM Algorithm by using Quasi-Newton Methods |year=1997 |journal=[[Journal of the Royal Statistical Society, Series B]] |volume=59 |issue=2 |pages=569–587 |doi=10.1111/1467-9868.00083 |mr=1452026 |s2cid=121966443 }}</ref> EM can also be used with constrained estimation methods.

The ''parameter-expanded expectation maximization (PX-EM)'' algorithm often provides a speedup by "us[ing] a 'covariance adjustment' to correct the analysis of the M step, capitalising on extra information captured in the imputed complete data".<ref>{{cite journal |doi=10.1093/biomet/85.4.755 |title=Parameter expansion to accelerate EM: The PX-EM algorithm |journal=Biometrika |volume=85 |issue=4 |pages=755–770 |year=1998 |last1=Liu |first1=C |citeseerx=10.1.1.134.9617 }}</ref>

''Expectation conditional maximization (ECM)'' replaces each M step with a sequence of conditional maximization (CM) steps in which each parameter ''θ''<sub>''i''</sub> is maximized individually, conditionally on the other parameters remaining fixed.<ref>{{cite journal |last1=Meng |first1=Xiao-Li |last2=Rubin |first2=Donald B. |s2cid=40571416 |author-link2=Donald Rubin |title=Maximum likelihood estimation via the ECM algorithm: A general framework |year=1993 |journal=[[Biometrika]] |volume=80 |issue=2 |pages=267–278 |doi=10.1093/biomet/80.2.267 |mr=1243503}}</ref> ECM can itself be extended into the ''expectation conditional maximization either (ECME)'' algorithm.<ref>{{cite journal |doi=10.1093/biomet/81.4.633 |jstor=2337067 |title=The ECME Algorithm: A Simple Extension of EM and ECM with Faster Monotone Convergence |journal=Biometrika |volume=81 |issue=4 |pages=633 |year=1994 |last1=Liu |first1=Chuanhai |last2=Rubin |first2=Donald B }}</ref> This idea is extended further in the ''generalized expectation maximization (GEM)'' algorithm, which seeks only an increase in the objective function ''F'' for both the E step and the M step, as described in the [[#As a maximization–maximization procedure|As a maximization–maximization procedure]] section.<ref name="neal1999"/> GEM has also been developed for distributed environments, with promising results.<ref>{{cite journal |author1=Jiangtao Yin |author2=Yanfeng Zhang |author3=Lixin Gao |title=Accelerating Expectation–Maximization Algorithms with Frequent Updates |journal=Proceedings of the IEEE International Conference on Cluster Computing |year=2012 |url=http://rio.ecs.umass.edu/mnilpub/papers/cluster2012-yin.pdf }}</ref>

It is also possible to consider the EM algorithm as a subclass of the '''[[MM algorithm|MM]]''' (Majorize/Minimize or Minorize/Maximize, depending on context) algorithm<ref>Hunter DR and Lange K (2004), [http://www.stat.psu.edu/~dhunter/papers/mmtutorial.pdf A Tutorial on MM Algorithms], The American Statistician, 58: 30–37</ref> and therefore to use any machinery developed in the more general case.
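To make the CM-step structure concrete, the following is a minimal sketch (not taken from the cited references) of an ECM-style iteration for a two-component univariate Gaussian mixture: the single M step is replaced by three conditional maximization steps, each updating one block of parameters (means, then variances, then mixing weights) while the others are held fixed. The toy data and all variable names are illustrative assumptions; for this particular model the CM updates coincide with the usual closed-form M step, so the sketch only illustrates the update structure.

<syntaxhighlight lang="python">
# Illustrative sketch of ECM-style conditional maximization for a
# two-component univariate Gaussian mixture (toy data, hypothetical setup).
import numpy as np

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 0.5, 200)])

# initial parameters: mixing weights, means, variances
w = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
var = np.array([1.0, 1.0])

for _ in range(50):
    # E step: responsibilities r[i, k] = P(component k | x_i) under current parameters
    dens = (w / np.sqrt(2.0 * np.pi * var)) * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var)
    r = dens / dens.sum(axis=1, keepdims=True)
    nk = r.sum(axis=0)

    # CM step 1: maximize Q over the means, holding variances and weights fixed
    mu = (r * x[:, None]).sum(axis=0) / nk
    # CM step 2: maximize Q over the variances, holding means and weights fixed
    var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    # CM step 3: maximize Q over the mixing weights, holding means and variances fixed
    w = nk / x.size

print("weights:", w, "means:", mu, "variances:", var)
</syntaxhighlight>

The practical appeal of ECM is that each CM step only needs to be tractable conditionally on the other parameter blocks, even when a joint M step has no closed form.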
=== α-EM algorithm ===
The Q-function used in the EM algorithm is based on the log likelihood; the algorithm is therefore regarded as the log-EM algorithm. The use of the log likelihood can be generalized to that of the α-log likelihood ratio. The α-log likelihood ratio of the observed data can then be expressed exactly as an equality by using the Q-function of the α-log likelihood ratio and the α-divergence. Obtaining this Q-function is a generalized E step, and its maximization is a generalized M step. This pair is called the α-EM algorithm,<ref>{{cite journal |last=Matsuyama |first=Yasuo |title=The α-EM algorithm: Surrogate likelihood maximization using α-logarithmic information measures |journal=IEEE Transactions on Information Theory |volume=49 |year=2003 |pages=692–706 |issue=3 |doi=10.1109/TIT.2002.808105 }}</ref> which contains the log-EM algorithm as a subclass. Thus, the α-EM algorithm by [[Yasuo Matsuyama]] is an exact generalization of the log-EM algorithm. No computation of the gradient or Hessian matrix is needed. By choosing an appropriate α, the α-EM algorithm can converge faster than the log-EM algorithm. The α-EM algorithm also leads to a faster version of the hidden Markov model estimation algorithm, α-HMM.<ref>{{cite journal |last=Matsuyama |first=Yasuo |title=Hidden Markov model estimation based on alpha-EM algorithm: Discrete and continuous alpha-HMMs |journal=International Joint Conference on Neural Networks |year=2011 |pages=808–816 }}</ref>
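As a rough illustration only (the precise α-logarithm and parameterization used in Matsuyama's papers may differ), the generalization rests on replacing the ordinary logarithm with a one-parameter deformation such as

:<math>\log^{(\alpha)}(r) = \frac{r^{\alpha} - 1}{\alpha},</math>

which recovers the ordinary logarithm, <math>\lim_{\alpha \to 0} \log^{(\alpha)}(r) = \log r</math>, in the limiting case; this is how such a family can contain the log-based algorithm as a subclass.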