==Optimization formulation==
Let <math>X \,\!</math> be the original input and <math>Y\,\!</math> be the embedding. If <math>i,j\,\!</math> are two neighbors, then the local isometry constraint that needs to be satisfied is:<ref>{{Harvnb|Weinberger, Sha and Saul|2004a|loc=page 3, equation 8}}</ref><ref>{{Harvnb|Weinberger and Saul|2004b|loc=page 3, equation 2}}</ref><ref>{{Harvnb|Weinberger and Saul|2006|loc=page 4, equation 2}}</ref>

:<math>|X_{i}-X_{j}|^{2}=|Y_{i}-Y_{j}|^{2}\,\!</math>

Let <math>G, K\,\!</math> be the [[Gramian matrix|Gram matrices]] of <math> X \,\!</math> and <math> Y \,\!</math> (i.e.: <math>G_{ij}=X_i \cdot X_j,K_{ij}=Y_i \cdot Y_j \,\!</math>). We can express the above constraint for every pair of neighboring points <math>i,j\,\!</math> in terms of <math>G, K\,\!</math>:<ref>{{Harvnb|Weinberger, Sha and Saul|2004a|loc=page 3, equation 9}}</ref><ref>{{Harvnb|Weinberger and Saul|2004b|loc=page 3, equation 3}}</ref>

:<math>G_{ii}+G_{jj}-G_{ij}-G_{ji}=K_{ii}+K_{jj}-K_{ij}-K_{ji}\,\!</math>

In addition, we want to constrain the embedding <math> Y \,\!</math> to be centered at the origin:<ref>{{Harvnb|Weinberger, Sha and Saul|2004a|loc=page 3, equation 6}}</ref><ref>{{Harvnb|Weinberger and Saul|2004b|loc=page 3, equation 5}}</ref><ref>{{Harvnb|Weinberger and Saul|2006|loc=page 5, equation 8}}</ref>

:<math>0 = \Big|\sum_{i}Y_{i}\Big|^2 = \Big(\sum_{i}Y_{i}\Big) \cdot \Big(\sum_{i}Y_{i}\Big) = \sum_{i,j}Y_{i} \cdot Y_{j} = \sum_{i,j}K_{ij}</math>
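The Gram-matrix form of the constraint is simply the expansion <math>|Y_{i}-Y_{j}|^{2} = Y_{i} \cdot Y_{i} + Y_{j} \cdot Y_{j} - 2\,Y_{i} \cdot Y_{j}</math>, so squared distances can be read off the Gram matrix directly. The following is a minimal numerical sketch of that identity; the toy data and the use of NumPy are illustrative assumptions, not part of the cited formulation.

<syntaxhighlight lang="python">
import numpy as np

# Toy data: 6 points in 3 dimensions (values are illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))      # rows are the points X_i

G = X @ X.T                      # Gram matrix, G_ij = X_i . X_j

# For any pair (i, j) the squared distance equals
# G_ii + G_jj - G_ij - G_ji, which is the constraint used above.
i, j = 1, 4
lhs = np.sum((X[i] - X[j]) ** 2)
rhs = G[i, i] + G[j, j] - G[i, j] - G[j, i]
assert np.isclose(lhs, rhs)
</syntaxhighlight>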
As described above, while the distances between neighboring points are preserved, the algorithm aims to maximize the pairwise distances between all pairs of points. The objective function to be maximized is:<ref>{{Harvnb|Weinberger, Sha and Saul|2004a|loc=page 4, equation 10}}</ref><ref>{{Harvnb|Weinberger and Saul|2004b|loc=page 4, equation 6}}</ref><ref>{{Harvnb|Weinberger and Saul|2006|loc=page 5, equation 4}}</ref>

:<math>T(Y)=\dfrac{1}{2N}\sum_{i,j}|Y_{i}-Y_{j}|^{2}</math>

Intuitively, maximizing the function above is equivalent to pulling the points as far away from each other as possible, thereby "unfolding" the manifold. The local isometry constraint prevents the objective function from diverging (going to infinity).<ref>{{Harvnb|Weinberger and Saul|2004b|loc=page 4, equation 7}}</ref> Let <math>\tau = \max \{\eta_{ij}|Y_{i}-Y_{j}|\} \,\!</math>, where

:<math>\eta_{ij} := \begin{cases} 1 & \mbox{if}\ i \mbox{ is a neighbour of } j \\ 0 & \mbox{otherwise} . \end{cases}</math>

Since any two of the <math>N</math> points are connected by a path of at most <math>N</math> edges, each no longer than <math>\tau</math>, the distance between any two points satisfies <math>|Y_{i}-Y_{j}| \leq N \tau \,\!</math>.
We can then bound the objective function as follows:<ref>{{Harvnb|Weinberger and Saul|2004b|loc=page 4, equation 8}}</ref><ref>{{Harvnb|Weinberger and Saul|2006|loc=page 5, equation 6}}</ref>

:<math>T(Y)=\dfrac{1}{2N}\sum_{i,j}|Y_{i}-Y_{j}|^{2} \leq \dfrac{1}{2N}\sum_{i,j}(N\tau)^2 = \dfrac{N^3\tau^2}{2} \,\!</math>

The objective function can be rewritten purely in terms of the Gram matrix:<ref>{{Harvnb|Weinberger, Sha and Saul|2004a|loc=page 4, equation 11}}</ref><ref>{{Harvnb|Weinberger and Saul|2004b|loc=page 4, equation 9}}</ref><ref>{{Harvnb|Weinberger and Saul|2006|loc=page 6, equations 10 to 13}}</ref>

:<math> \begin{align} T(Y) &{}= \dfrac{1}{2N}\sum_{i,j}|Y_{i}-Y_{j}|^{2} \\ &{}= \dfrac{1}{2N}\sum_{i,j}(Y_{i}^2+Y_{j}^2-Y_{i} \cdot Y_{j} - Y_{j} \cdot Y_{i})\\ &{}= \dfrac{1}{2N}\Big(\sum_{i,j}Y_{i}^2+\sum_{i,j}Y_{j}^2-\sum_{i,j}Y_{i} \cdot Y_{j} -\sum_{i,j}Y_{j} \cdot Y_{i}\Big)\\ &{}= \dfrac{1}{2N}\Big(\sum_{i,j}Y_{i}^2+\sum_{i,j}Y_{j}^2-0 -0\Big)\\ &{}= \dfrac{1}{N}\sum_{i}Y_{i}^2=\dfrac{1}{N}\operatorname{Tr}(K)\\ \end{align} \,\!</math>

where the two cross terms vanish because of the centering constraint <math>\sum_{i,j}K_{ij}=0</math>.

Finally, the optimization can be formulated as:<ref>{{Harvnb|Weinberger, Sha and Saul|2004a|loc=page 4, section 3.3}}</ref><ref>{{Harvnb|Weinberger and Saul|2004b|loc=page 4, equation 9}}</ref><ref>{{Harvnb|Weinberger and Saul|2006|loc=page 6, equations 10 to 13}}</ref>

:<math> \begin{align} & \text{Maximize} && \operatorname{Tr}(\mathbf{K})\\ & \text{subject to} && \mathbf{K} \succeq 0, \sum_{ij}\mathbf{K}_{ij} = 0 \\ & \text{and} && G_{ii}+G_{jj}-G_{ij}-G_{ji}=K_{ii}+K_{jj}-K_{ij}-K_{ji}, \forall i, j \mbox{ where } \eta_{ij} = 1. \end{align} </math>

After the Gram matrix <math>K \,\!</math> is learned by semidefinite programming, the output <math>Y \,\!</math> can be obtained via an [[Eigendecomposition of a matrix|eigenvalue decomposition]] of <math>K \,\!</math>. In particular, the Gram matrix can be written as <math> K_{ij}=\sum_{\alpha = 1}^{N}(\lambda_{\alpha } V_{\alpha i} V_{\alpha j}) \,\!</math>, where <math> V_{\alpha i} \,\!</math> is the <math>i</math>-th element of the eigenvector <math> V_{\alpha} \,\!</math> with eigenvalue <math> \lambda_{\alpha } \,\!</math>.<ref>{{Harvnb|Weinberger and Saul|2004b|loc=page 4, equation 10}}</ref><ref>{{Harvnb|Weinberger and Saul|2006|loc=page 7, equation 14}}</ref> It follows that the <math> \alpha \,\!</math>-th component of the output <math> Y_i \,\!</math> is <math> \sqrt{\lambda_{\alpha }} V_{\alpha i} \,\!</math>.<ref>{{Harvnb|Weinberger and Saul|2004b|loc=page 4, equation 11}}</ref><ref>{{Harvnb|Weinberger and Saul|2006|loc=page 7, equation 15}}</ref>
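The formulation above maps directly onto a generic semidefinite-programming solver. The following sketch is not taken from the cited papers: the toy data, the choice of <code>k</code>, and the use of the CVXPY modelling library and NumPy's eigendecomposition are illustrative assumptions; practical implementations of the algorithm use more scalable solvers.

<syntaxhighlight lang="python">
import numpy as np
import cvxpy as cp  # generic SDP modelling tool; an assumption, not part of the papers

# Toy input: N points sampled along a curve in 3-D (illustrative data only).
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 3.0 * np.pi, 20))
X = np.column_stack([np.cos(t), np.sin(t), 0.1 * t])
N = X.shape[0]

# k-nearest-neighbour graph (eta_ij = 1 for neighbours), symmetrized.
k = 4
D2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # squared distances
knn = np.argsort(D2, axis=1)[:, 1:k + 1]                    # skip self (column 0)
eta = np.zeros((N, N), dtype=bool)
for i in range(N):
    eta[i, knn[i]] = True
eta |= eta.T

G = X @ X.T  # Gram matrix of the input

# SDP: maximize Tr(K) subject to K >= 0 (PSD), centering,
# and the local isometry constraints for neighbouring pairs.
K = cp.Variable((N, N), PSD=True)
constraints = [cp.sum(K) == 0]
for i in range(N):
    for j in range(N):
        if eta[i, j]:
            constraints.append(
                K[i, i] + K[j, j] - K[i, j] - K[j, i]
                == G[i, i] + G[j, j] - G[i, j] - G[j, i]
            )
prob = cp.Problem(cp.Maximize(cp.trace(K)), constraints)
prob.solve()

# Recover the embedding: the alpha-th coordinate of Y_i is sqrt(lambda_alpha) * V_{alpha i}.
lam, V = np.linalg.eigh(K.value)       # eigenvalues in ascending order
order = np.argsort(lam)[::-1]
lam, V = lam[order], V[:, order]
d = 2                                  # keep the top d components
Y = V[:, :d] * np.sqrt(np.clip(lam[:d], 0.0, None))
print(Y.shape)                         # (N, d) low-dimensional embedding
</syntaxhighlight>

In this sketch the full eigendecomposition is computed and only the <code>d</code> leading components are kept, since the remaining eigenvalues of the learned Gram matrix are typically near zero.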