Editing Expectation–maximization algorithm (section)

==== E step ====
Given our current estimate of the parameters ''θ''<sup>(''t'')</sup>, the conditional distribution of the ''Z''<sub>''i''</sub> is determined by [[Bayes' theorem]] to be the proportional height of the normal [[probability density function|density]] weighted by ''τ'':
: <math>T_{j,i}^{(t)} := \operatorname{P}(Z_i=j \mid X_i=\mathbf{x}_i ;\theta^{(t)}) = \frac{\tau_j^{(t)} \ f(\mathbf{x}_i;\boldsymbol{\mu}_j^{(t)},\Sigma_j^{(t)})}{\tau_1^{(t)} \ f(\mathbf{x}_i;\boldsymbol{\mu}_1^{(t)},\Sigma_1^{(t)}) + \tau_2^{(t)} \ f(\mathbf{x}_i;\boldsymbol{\mu}_2^{(t)},\Sigma_2^{(t)})}.</math>

These are called the "membership probabilities", which are normally considered the output of the E step (although this is not the Q function of below).

This E step corresponds with setting up this function for Q:
: <math>\begin{align}Q(\theta\mid\theta^{(t)})
&= \operatorname{E}_{\mathbf{Z}\mid\mathbf{X}=\mathbf{x};\mathbf{\theta}^{(t)}} [\log L(\theta;\mathbf{x},\mathbf{Z}) ] \\
&= \operatorname{E}_{\mathbf{Z}\mid\mathbf{X}=\mathbf{x};\mathbf{\theta}^{(t)}} [\log \prod_{i=1}^{n}L(\theta;\mathbf{x}_i,Z_i) ] \\
&= \operatorname{E}_{\mathbf{Z}\mid\mathbf{X}=\mathbf{x};\mathbf{\theta}^{(t)}} [\sum_{i=1}^n \log L(\theta;\mathbf{x}_i,Z_i) ] \\
&= \sum_{i=1}^n\operatorname{E}_{Z_i\mid X_i=x_i;\mathbf{\theta}^{(t)}} [\log L(\theta;\mathbf{x}_i,Z_i) ] \\
&= \sum_{i=1}^n \sum_{j=1}^2 P(Z_i =j \mid X_i = \mathbf{x}_i; \theta^{(t)}) \log L(\theta_j;\mathbf{x}_i,j) \\
&= \sum_{i=1}^n \sum_{j=1}^2 T_{j,i}^{(t)} \big[ \log \tau_j  -\tfrac{1}{2} \log |\Sigma_j| -\tfrac{1}{2}(\mathbf{x}_i-\boldsymbol{\mu}_j)^\top\Sigma_j^{-1} (\mathbf{x}_i-\boldsymbol{\mu}_j) -\tfrac{d}{2} \log(2\pi) \big].
\end{align}</math>
The expectation of <math>\log L(\theta;\mathbf{x}_i,Z_i)</math> inside the sum is taken with respect to the probability density function <math>P(Z_i \mid X_i = \mathbf{x}_i; \theta^{(t)})</math>, which might be different for each  <math>\mathbf{x}_i</math> of the training set. Everything in the E step is known before the step is taken except <math>T_{j,i}</math>, which is computed according to the equation at the beginning of the E step section.

This full conditional expectation does not need to be calculated in one step, because ''τ'' and '''μ'''/'''Σ''' appear in separate linear terms and can thus be maximized independently.