==Exploratory factor analysis (EFA) versus principal components analysis (PCA)==
{{see also|Principal component analysis|Exploratory factor analysis}}

Factor analysis is related to [[principal component analysis]] (PCA), but the two are not identical.<ref name="Bartholomew2008">{{cite book |last1=Bartholomew |first1=D.J. |last2=Steele |first2=F. |last3=Galbraith |first3=J. |last4=Moustaki |first4=I. |title=Analysis of Multivariate Social Science Data |publisher=Taylor & Francis |year=2008 |isbn=978-1584889601 |edition=2nd |series=Statistics in the Social and Behavioral Sciences Series}}</ref> There has been significant controversy in the field over differences between the two techniques. PCA can be considered a more basic version of [[exploratory factor analysis]] (EFA) that was developed in the early days prior to the advent of high-speed computers. Both PCA and factor analysis aim to reduce the dimensionality of a set of data, but they take different approaches to doing so. Factor analysis is explicitly designed to identify certain unobservable factors from the observed variables, whereas PCA does not directly address this objective; at best, PCA provides an approximation to the required factors.<ref name="Principal Component Analysis">Jolliffe I.T. ''Principal Component Analysis'', Series: Springer Series in Statistics, 2nd ed., Springer, NY, 2002, XXIX, 487 p. 28 illus. {{isbn|978-0-387-95442-4}}</ref> From the point of view of exploratory analysis, the [[eigenvalues]] of PCA yield inflated component loadings, i.e., loadings contaminated with error variance.<ref>Cattell, R. B. (1952). ''Factor analysis''. New York: Harper.</ref><ref>Fruchter, B. (1954). ''Introduction to Factor Analysis''. Van Nostrand.</ref><ref>Cattell, R. B. (1978). ''Use of Factor Analysis in Behavioral and Life Sciences''. New York: Plenum.</ref><ref>Child, D. (2006). ''The Essentials of Factor Analysis, 3rd edition''. Bloomsbury Academic Press.</ref><ref>Gorsuch, R. L. (1983). ''Factor Analysis, 2nd edition''. Hillsdale, NJ: Erlbaum.</ref><ref>McDonald, R. P. (1985). ''Factor Analysis and Related Methods''. Hillsdale, NJ: Erlbaum.</ref>

Whilst [[Exploratory factor analysis|EFA]] and [[Principal component analysis|PCA]] are treated as synonymous techniques in some fields of statistics, this has been criticised.<ref name=Fabrigar>{{cite web|last=Fabrigar|title=Evaluating the use of exploratory factor analysis in psychological research.|year=1999|url=http://www.statpower.net/Content/312/Handout/Fabrigar1999.pdf|publisher=Psychological Methods|display-authors=etal}}</ref><ref name=Suhr>{{cite web|last=Suhr|first=Diane|year=2009|title=Principal component analysis vs. exploratory factor analysis|url=http://www2.sas.com/proceedings/sugi30/203-30.pdf|publisher=SUGI 30 Proceedings|access-date=5 April 2012}}</ref> Factor analysis "deals with ''the assumption of an underlying causal structure'': [it] assumes that the covariation in the observed variables is due to the presence of one or more latent variables (factors) that exert causal influence on these observed variables".<ref name=Sas>{{cite web|title=Principal Components Analysis|url=http://support.sas.com/publishing/pubcat/chaps/55129.pdf|work=SAS Support Textbook|author=SAS Statistics}}</ref> In contrast, PCA neither assumes nor depends on such an underlying causal relationship. Researchers have argued that the distinctions between the two techniques may mean that there are objective reasons for preferring one over the other, depending on the analytic goal. If the factor model is incorrectly formulated or its assumptions are not met, then factor analysis will give erroneous results. Factor analysis has been used successfully where adequate understanding of the system permits good initial model formulations. PCA applies a mathematical transformation to the original data with no assumptions about the form of the covariance matrix. The objective of PCA is to determine linear combinations of the original variables and select a few that can be used to summarize the data set without losing much information.<ref>{{cite journal |last1=Meglen|first1=R.R. |title=Examining Large Databases: A Chemometric Approach Using Principal Component Analysis|journal=Journal of Chemometrics |volume=5 |issue=3|pages=163–179 |date=1991 |doi=10.1002/cem.1180050305 |s2cid=120886184 }}</ref>
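
The structural difference can be stated compactly. The following is a sketch in common textbook notation; the symbols are introduced here for illustration and are not taken from the sources cited above. Let <math>x</math> be the vector of observed variables with mean <math>\mu</math> and covariance matrix <math>\Sigma</math>. Assuming uncorrelated common factors <math>f</math> with unit variance, and unique factors <math>\varepsilon</math> that are uncorrelated with <math>f</math> and have diagonal covariance matrix <math>\Psi</math>, the common factor model with loading matrix <math>\Lambda</math> is

<math display="block">x = \mu + \Lambda f + \varepsilon, \qquad \Sigma = \Lambda \Lambda^\mathsf{T} + \Psi .</math>

PCA, by contrast, uses only the eigendecomposition of <math>\Sigma</math>, with eigenvectors in the columns of <math>V</math> and eigenvalues on the diagonal of <math>D</math>, and defines the components <math>y</math> as linear combinations of the observed variables:

<math display="block">\Sigma = V D V^\mathsf{T}, \qquad y = V^\mathsf{T} (x - \mu) .</math>

The first decomposition is a restriction that a given <math>\Sigma</math> may or may not satisfy, which is why a misspecified factor model can give erroneous results; the second exists for any covariance matrix.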

=== Arguments contrasting PCA and EFA ===
Fabrigar et al. (1999)<ref name=Fabrigar /> address a number of arguments used to suggest that PCA can be used in place of factor analysis, and respond to each:

# It is sometimes suggested that PCA is computationally quicker and requires fewer resources than factor analysis. Fabrigar et al. suggest that readily available computer resources have rendered this practical concern irrelevant.
# PCA and factor analysis can produce similar results. This point is also addressed by Fabrigar et al.; in certain cases, where the communalities are low (e.g. 0.4), the two techniques produce divergent results. In fact, Fabrigar et al. argue that in cases where the data correspond to the assumptions of the common factor model, the results of PCA are inaccurate.
# There are certain cases where factor analysis leads to 'Heywood cases'. These are situations in which 100% or more of the [[variance]] in a measured variable is estimated to be accounted for by the model. Fabrigar et al. suggest that these cases are actually informative to the researcher, indicating an incorrectly specified model or a violation of the common factor model. The lack of Heywood cases in the PCA approach may mean that such issues pass unnoticed.
# It is argued that researchers gain extra information from a PCA approach, such as an individual's score on a certain component, and that such information is not yielded by factor analysis. However, as Fabrigar et al. contend, the typical aim of factor analysis – i.e. to determine the factors accounting for the structure of the [[Correlation and dependence|correlations]] between measured variables – does not require knowledge of factor scores and thus this advantage is negated. Moreover, factor scores can also be computed from a factor analysis.
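
As a minimal sketch of how a Heywood case can arise, consider a one-factor model for three standardized variables, in which each correlation is the product of two loadings. The communalities then follow directly from the three correlations. The correlation values and the function name below are hypothetical, chosen only for illustration; they are not taken from Fabrigar et al.

```python
def one_factor_communalities(r12, r13, r23):
    """Communalities implied by a one-factor model for three variables.

    Under the model, r_ij = loading_i * loading_j, so that
    loading_1 ** 2 = r12 * r13 / r23, and similarly for the other two.
    """
    return [r12 * r13 / r23, r12 * r23 / r13, r13 * r23 / r12]


# Hypothetical correlations; they form a valid (positive definite)
# correlation matrix, yet no one-factor model fits them properly.
communalities = one_factor_communalities(r12=0.8, r13=0.8, r23=0.5)

# A communality of 1 or more means the model accounts for 100% or more
# of the variance of that variable, i.e. a Heywood case.
heywood_flags = [h >= 1.0 for h in communalities]

for index, (h, flag) in enumerate(zip(communalities, heywood_flags), start=1):
    print(f"variable {index}: communality = {h:.2f}, Heywood case = {flag}")
```

Here the first variable is flagged, signalling that the one-factor model is inappropriate for these correlations. A PCA of the same matrix would run without any such warning.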

=== Variance versus covariance ===
Factor analysis takes into account the [[random error]] that is inherent in measurement, whereas PCA does not. This point is exemplified by Brown (2009),<ref name=Brown>{{cite web|last=Brown|first=J. D.|title=Principal components analysis and exploratory factor analysis – Definitions, differences and choices.|date=January 2009|url=http://jalt.org/test/PDF/Brown29.pdf|publisher=Shiken: JALT Testing & Evaluation SIG Newsletter|access-date=16 April 2012}}</ref> who indicated that, with respect to the correlation matrices involved in the calculations:

{{Blockquote|"In PCA, 1.00s are put in the diagonal meaning that all of the variance in the matrix is to be accounted for (including variance unique to each variable, variance common among variables, and error variance). That would, therefore, by definition, include all of the variance in the variables. In contrast, in EFA, the communalities are put in the diagonal meaning that only the variance shared with other variables is to be accounted for (excluding variance unique to each variable and error variance). That would, therefore, by definition, include only variance that is common among the variables."|Brown (2009)|Principal components analysis and exploratory factor analysis – Definitions, differences and choices}}

For this reason, Brown (2009) recommends using factor analysis when theoretical ideas about relationships between variables exist, whereas PCA should be used if the goal of the researcher is to explore patterns in their data.
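
The effect of the diagonal entries described by Brown can be illustrated with a small sketch. It assumes a population correlation matrix generated by a single factor with known loadings (hypothetical values), and it takes the communalities as known, whereas in practice they must be estimated. The helper functions are illustrative and use simple power iteration rather than a library routine.

```python
import math


def top_eigenpair(matrix, iterations=200):
    """Largest eigenvalue and unit eigenvector of a symmetric matrix (power iteration)."""
    n = len(matrix)
    vector = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in product))
        vector = [v / norm for v in product]
    product = [sum(matrix[i][j] * vector[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(p * v for p, v in zip(product, vector))  # Rayleigh quotient
    return eigenvalue, vector


def first_loadings(matrix):
    """Loadings on the first dimension: eigenvector scaled by sqrt(eigenvalue)."""
    eigenvalue, vector = top_eigenpair(matrix)
    return [math.sqrt(eigenvalue) * v for v in vector]


# Hypothetical one-factor loadings; off-diagonal correlations are their products.
true_loadings = [0.8, 0.7, 0.6, 0.5]
p = len(true_loadings)

# PCA: 1.00s on the diagonal (all variance is to be accounted for).
correlation = [[1.0 if i == j else true_loadings[i] * true_loadings[j]
                for j in range(p)] for i in range(p)]

# EFA: communalities on the diagonal (only shared variance is analysed).
reduced = [row[:] for row in correlation]
for i in range(p):
    reduced[i][i] = true_loadings[i] ** 2

pca_loadings = first_loadings(correlation)
efa_loadings = first_loadings(reduced)

for true, pca, efa in zip(true_loadings, pca_loadings, efa_loadings):
    print(f"true = {true:.3f}, PCA = {pca:.3f}, EFA = {efa:.3f}")
```

Under these assumptions the loadings from the reduced matrix reproduce the generating loadings, while every PCA loading is larger, because the unit diagonal brings unique and error variance into the analysis.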

===Differences in procedure and results===
The differences between PCA and factor analysis (FA) are further illustrated by Suhr (2009):<ref name=Suhr />
* PCA results in principal components that account for a maximal amount of variance for observed variables; FA accounts for ''common'' variance in the data.
* PCA inserts ones on the diagonals of the correlation matrix; FA adjusts the diagonals of the correlation matrix with the unique factors.
* PCA minimizes the sum of squared perpendicular distances to the component axis; FA estimates factors that influence responses on observed variables.
* The component scores in PCA represent a linear combination of the observed variables weighted by [[Eigenvalues and eigenvectors|eigenvectors]]; the observed variables in FA are linear combinations of the underlying and unique factors.
* In PCA, the components yielded are uninterpretable, i.e. they do not represent underlying ‘constructs’; in FA, the underlying constructs can be labelled and readily interpreted, given an accurate model specification.
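
Two of the PCA properties listed above (scores as eigenvector-weighted combinations of the observed variables, and the minimisation of squared perpendicular distances) can be checked in a two-variable sketch. The data values are made up for illustration, and the closed-form angle of the first principal axis applies only to the two-dimensional case.

```python
import math

# Made-up observations on two variables.
points = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2),
          (3.1, 3.0), (2.3, 2.7), (2.0, 1.6), (1.0, 1.1)]

n = len(points)
mean_x = sum(x for x, _ in points) / n
mean_y = sum(y for _, y in points) / n
centered = [(x - mean_x, y - mean_y) for x, y in points]

# Sample variances and covariance.
s_xx = sum(x * x for x, _ in centered) / (n - 1)
s_yy = sum(y * y for _, y in centered) / (n - 1)
s_xy = sum(x * y for x, y in centered) / (n - 1)

# Direction of the first principal axis of a 2x2 covariance matrix.
theta = 0.5 * math.atan2(2.0 * s_xy, s_xx - s_yy)
weights = (math.cos(theta), math.sin(theta))  # first eigenvector

# Component scores: observed (centred) variables weighted by the eigenvector.
scores = [weights[0] * x + weights[1] * y for x, y in centered]
score_variance = sum(s * s for s in scores) / (n - 1)

# Largest eigenvalue of the covariance matrix, in closed form.
largest_eigenvalue = (s_xx + s_yy) / 2 + math.sqrt(((s_xx - s_yy) / 2) ** 2 + s_xy ** 2)


def perpendicular_sse(angle):
    """Sum of squared perpendicular distances to the axis at the given angle."""
    ux, uy = math.cos(angle), math.sin(angle)
    return sum((x * uy - y * ux) ** 2 for x, y in centered)


best = perpendicular_sse(theta)
others = [perpendicular_sse(math.radians(d)) for d in range(180)]
print(f"variance of scores = {score_variance:.4f}, largest eigenvalue = {largest_eigenvalue:.4f}")
print(f"perpendicular SSE at principal axis = {best:.4f}, smallest on 1-degree grid = {min(others):.4f}")
```

The variance of the scores equals the largest eigenvalue, and no other axis on the grid gives a smaller sum of squared perpendicular distances. Nothing in this computation refers to latent factors or unique variances, which is the contrast with FA drawn in the list above.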