Editing Principal component analysis (section)

=== Factor analysis ===
[[File:PCA_versus_Factor_Analysis.jpg|thumb|The above picture is an example of the difference between PCA and Factor Analysis. In the top diagram the "factor" (e.g., career path) represents the three observed variables (e.g., doctor, lawyer, teacher) whereas in the bottom diagram the observed variables (e.g., pre-school teacher, middle school teacher, high school teacher) are reduced into the component of interest (e.g., teacher).]]
Principal component analysis creates variables that are linear combinations of the original variables. The new variables have the property that the variables are all orthogonal. The PCA transformation can be helpful as a pre-processing step before clustering. PCA is a variance-focused approach seeking to reproduce the total variable variance, in which components reflect both common and unique variance of the variable. PCA is generally preferred for purposes of data reduction (that is, translating variable space into optimal factor space) but not when the goal is to detect the latent construct or factors.

[[Factor analysis]] is similar to principal component analysis, in that factor analysis also involves linear combinations of variables. Different from PCA, factor analysis is a correlation-focused approach seeking to reproduce the inter-correlations among variables, in which the factors "represent the common variance of variables, excluding unique variance".<ref>Timothy A. Brown. [https://books.google.com/books?id=JDb3BQAAQBAJ Confirmatory Factor Analysis for Applied Research Methodology in the social sciences]. Guilford Press, 2006</ref> In terms of the correlation matrix, this corresponds with focusing on explaining the off-diagonal terms (that is, shared co-variance), while PCA focuses on explaining the terms that sit on the diagonal. However, as a side result, when trying to reproduce the on-diagonal terms, PCA also tends to fit relatively well the off-diagonal correlations.<ref name="Jolliffe2002" />{{rp|158}} Results given by PCA and factor analysis are very similar in most situations, but this is not always the case, and there are some problems where the results are significantly different. Factor analysis is generally used when the research purpose is detecting data structure (that is, latent constructs or factors) or [[causal modeling]]. If the factor model is incorrectly formulated or the assumptions are not met, then factor analysis will give erroneous results.<ref>{{cite journal |last1=Meglen|first1=R.R. |title=Examining Large Databases: A Chemometric Approach Using Principal Component Analysis|journal=Journal of Chemometrics |volume=5 |issue=3|pages=163–179 |date=1991 |doi=10.1002/cem.1180050305 |s2cid=120886184 }}</ref>