===Geometric interpretation===
[[File:FactorPlot.svg|thumb|upright=1.5|Geometric interpretation of factor analysis parameters for 3 respondents to question "a". The "answer" is represented by the unit vector <math>\mathbf{z}_a</math>, which is projected onto a plane defined by two orthonormal vectors <math>\mathbf{F}_1</math> and <math>\mathbf{F}_2</math>. The projection vector is <math>\hat{\mathbf{z}}_a</math> and the error <math>\boldsymbol{\varepsilon}_a</math> is perpendicular to the plane, so that <math>\mathbf{z}_a=\hat{\mathbf{z}}_a+\boldsymbol{\varepsilon}_a</math>. The projection vector <math>\hat{\mathbf{z}}_a</math> may be represented in terms of the factor vectors as <math>\hat{\mathbf{z}}_a=\ell_{a1}\mathbf{F}_1+\ell_{a2}\mathbf{F}_2</math>. The square of the length of the projection vector is the communality: <math>||\hat{\mathbf{z}}_a||^2=h^2_a</math>. If another data vector <math>\mathbf{z}_b</math> were plotted, the cosine of the angle between <math>\mathbf{z}_a</math> and <math>\mathbf{z}_b</math> would be <math>r_{ab}</math>, the <math>(a,b)</math>-entry in the correlation matrix. (Adapted from Harman Fig. 4.3)<ref name="Harman"/>]]
The parameters and variables of factor analysis can be given a geometrical interpretation. The data (<math>z_{ai}</math>), the factors (<math>F_{pi}</math>) and the errors (<math>\varepsilon_{ai}</math>) can be viewed as vectors in an <math>N</math>-dimensional Euclidean space (sample space), represented as <math>\mathbf{z}_a</math>, <math>\mathbf{F}_p</math> and <math>\boldsymbol{\varepsilon}_a</math> respectively. Since the data are standardized, the data vectors are of unit length (<math>||\mathbf{z}_a||=1</math>). The factor vectors define a <math>k</math>-dimensional linear subspace (i.e. a hyperplane) in this space, upon which the data vectors are projected orthogonally. This follows from the model equation
:<math>\mathbf{z}_a=\sum_p \ell_{ap} \mathbf{F}_p+\boldsymbol{\varepsilon}_a</math>
and the independence of the factors and the errors: <math>\mathbf{F}_p\cdot\boldsymbol{\varepsilon}_a=0</math>. In the above example, the hyperplane is just a 2-dimensional plane defined by the two factor vectors. The projection of the data vectors onto the hyperplane is given by
:<math>\hat{\mathbf{z}}_a=\sum_p \ell_{ap}\mathbf{F}_p</math>
and the errors are vectors from that projected point to the data point, perpendicular to the hyperplane. The goal of factor analysis is to find a hyperplane which is a "best fit" to the data in some sense, so it does not matter how the factor vectors which define this hyperplane are chosen, as long as they are independent and lie in the hyperplane. We are free to specify them as both orthogonal and normal (<math>\mathbf{F}_p\cdot \mathbf{F}_q=\delta_{pq}</math>) with no loss of generality. After a suitable set of factors has been found, they may also be arbitrarily rotated within the hyperplane: any rotation of the factor vectors defines the same hyperplane and is also a solution. As a result, in the above example, in which the fitting hyperplane is two-dimensional, if we do not know beforehand that the two types of intelligence are uncorrelated, then we cannot interpret the two factors as the two different types of intelligence. Even if they are uncorrelated, we cannot tell which factor corresponds to verbal intelligence and which corresponds to mathematical intelligence, or whether the factors are linear combinations of both, without an outside argument.
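This projection geometry is straightforward to reproduce numerically. The following is a minimal NumPy sketch under made-up assumptions (two orthonormal factor vectors obtained by QR decomposition of random zero-mean columns, and a single random standardized data vector); it illustrates the geometry only and is not an implementation of any particular factor extraction method:
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
N = 200                      # sample size (dimension of the sample space)

# Two orthonormal, zero-mean factor vectors F_1, F_2 spanning the
# fitting hyperplane, obtained by orthonormalizing random vectors.
X = rng.standard_normal((N, 2))
X -= X.mean(axis=0)
F, _ = np.linalg.qr(X)       # columns are F_1, F_2

# A standardized data vector z_a: zero mean, unit length.
z = rng.standard_normal(N)
z -= z.mean()
z /= np.linalg.norm(z)

# Orthogonal projection onto the hyperplane: since the columns of F are
# orthonormal, the loadings are dot products, ell_ap = z_a . F_p.
ell = F.T @ z                # loadings (ell_a1, ell_a2)
z_hat = F @ ell              # projection  z^_a = ell_a1 F_1 + ell_a2 F_2
eps = z - z_hat              # error vector, perpendicular to the hyperplane

print(np.allclose(F.T @ eps, 0.0))           # True: F_p . eps_a = 0
print(np.linalg.norm(z_hat)**2, ell @ ell)   # communality h_a^2 = sum_p ell_ap^2
</syntaxhighlight>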
The data vectors <math>\mathbf{z}_a</math> have unit length. The entries of the correlation matrix for the data are given by <math>r_{ab}=\mathbf{z}_a\cdot\mathbf{z}_b</math>, so each entry can be geometrically interpreted as the cosine of the angle between the two data vectors <math>\mathbf{z}_a</math> and <math>\mathbf{z}_b</math>. The diagonal elements will clearly be 1s and the off-diagonal elements will have absolute values less than or equal to unity. The "reduced correlation matrix" is defined as
:<math>\hat{r}_{ab}=\hat{\mathbf{z}}_a\cdot\hat{\mathbf{z}}_b.</math>
The goal of factor analysis is to choose the fitting hyperplane such that the reduced correlation matrix reproduces the correlation matrix as nearly as possible, except for the diagonal elements of the correlation matrix, which are known to have unit value. In other words, the goal is to reproduce as accurately as possible the cross-correlations in the data. Specifically, for the fitting hyperplane, the mean square error in the off-diagonal components
:<math>\varepsilon^2=\sum_{a\ne b} \left(r_{ab}-\hat{r}_{ab}\right)^2</math>
is to be minimized, and this is accomplished by minimizing it with respect to a set of orthonormal factor vectors. It can be seen that
:<math>r_{ab}-\hat{r}_{ab}=\boldsymbol{\varepsilon}_a\cdot\boldsymbol{\varepsilon}_b.</math>
The term on the right is just the covariance of the errors. In the model, the error covariance is stated to be a diagonal matrix, so the above minimization problem will in fact yield a "best fit" to the model: it will yield a sample estimate of the error covariance whose off-diagonal components are minimized in the mean square sense. Since the <math>\hat{\mathbf{z}}_a</math> are orthogonal projections of the data vectors, their lengths will be less than or equal to the lengths of the data vectors themselves, which are unity. The squares of these lengths are just the diagonal elements of the reduced correlation matrix. These diagonal elements are known as "communalities":
:<math>{h_a}^2=||\hat{\mathbf{z}}_a||^2=\sum_p {\ell_{ap}}^2.</math>
Large values of the communalities indicate that the fitting hyperplane is rather accurately reproducing the correlation matrix. The mean values of the factors must also be constrained to be zero, from which it follows that the mean values of the errors will also be zero.
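As a rough numerical illustration of this fit, the sketch below estimates loadings from the top-<math>k</math> eigenpairs of the sample correlation matrix, a truncated eigendecomposition in the spirit of principal-axis factoring but not identical to minimizing only the off-diagonal error as described above, and then compares the reduced correlation matrix with the original. All dimensions and the synthetic two-factor data are made up for the example:
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
n, N, k = 6, 500, 2          # variables, sample size, number of factors

# Synthetic data with a 2-factor structure plus noise; rows are the
# data vectors z_a, demeaned and scaled to unit length (sample space view).
L_true = rng.uniform(-0.8, 0.8, size=(n, k))
Z = L_true @ rng.standard_normal((k, N)) + 0.5 * rng.standard_normal((n, N))
Z -= Z.mean(axis=1, keepdims=True)
Z /= np.linalg.norm(Z, axis=1, keepdims=True)

R = Z @ Z.T                  # correlation matrix, r_ab = z_a . z_b

# Loadings from the top-k eigenpairs of R (eigenvalues in ascending order).
w, V = np.linalg.eigh(R)
L = V[:, -k:] * np.sqrt(w[-k:])     # loadings ell_ap

R_hat = L @ L.T              # reduced correlation matrix  r^_ab
h2 = np.diag(R_hat)          # communalities h_a^2 = sum_p ell_ap^2

off = ~np.eye(n, dtype=bool) # mask selecting off-diagonal entries
print("off-diagonal MSE:", np.mean((R[off] - R_hat[off])**2))
print("communalities:", np.round(h2, 3))
</syntaxhighlight>
In this sketch, communalities close to 1 and a small off-diagonal mean square error together indicate that the fitted <math>k</math>-dimensional hyperplane reproduces the cross-correlations well.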