==Practical implementation==
{{More citations needed section|date=April 2012}}

===Types of factor analysis===

====Exploratory factor analysis====
{{broader|Exploratory factor analysis}}
Exploratory factor analysis (EFA) is used to identify complex interrelationships among items and to group items that are part of unified concepts.<ref name=Polit>{{cite book |author=Polit DF, Beck CT |title=Nursing Research: Generating and Assessing Evidence for Nursing Practice, 9th ed. |year=2012 |publisher=Wolters Kluwer Health, Lippincott Williams & Wilkins |location=Philadelphia, USA}}</ref> The researcher makes no ''a priori'' assumptions about relationships among factors.<ref name=Polit/>

====Confirmatory factor analysis====
{{broader|Confirmatory factor analysis}}
Confirmatory factor analysis (CFA) is a more complex approach that tests the hypothesis that the items are associated with specific factors.<ref name=Polit/> CFA uses [[structural equation modeling]] to test a measurement model in which loadings on the factors allow evaluation of the relationships between observed variables and unobserved variables.<ref name=Polit/> Structural equation modeling approaches can accommodate measurement error and are less restrictive than [[least-squares estimation]].<ref name=Polit/> Hypothesized models are tested against actual data, and the analysis demonstrates the loadings of observed variables on the latent variables (factors), as well as the correlations between the latent variables.<ref name=Polit/>

===Types of factor extraction===
[[Principal component analysis]] (PCA) is a widely used method for factor extraction, which is the first phase of EFA.<ref name=Polit/> Factor weights are computed to extract the maximum possible variance, with successive factoring continuing until there is no further meaningful variance left.<ref name=Polit/> The factor model must then be rotated for analysis.<ref name=Polit/>

Canonical factor analysis, also called Rao's canonical factoring, is a different method of computing the same model as PCA, which uses the principal axis method. Canonical factor analysis seeks factors that have the highest canonical correlation with the observed variables. Canonical factor analysis is unaffected by arbitrary rescaling of the data.

Common factor analysis, also called [[principal factor analysis]] (PFA) or principal axis factoring (PAF), seeks the fewest factors which can account for the common variance (correlation) of a set of variables.

Image factoring is based on the [[correlation matrix]] of predicted variables rather than actual variables, where each variable is predicted from the others using [[multiple regression]].

Alpha factoring is based on maximizing the reliability of factors, assuming variables are randomly sampled from a universe of variables. All other methods assume cases to be sampled and variables fixed.

The factor regression model is a combination of the factor model and the regression model; alternatively, it can be viewed as a hybrid factor model,<ref name="meng2011">{{cite journal |last=Meng |first=J. |title=Uncover cooperative gene regulations by microRNAs and transcription factors in glioblastoma using a nonnegative hybrid factor model |journal=International Conference on Acoustics, Speech and Signal Processing |year=2011 |url=http://www.cmsworldwide.com/ICASSP2011/Papers/ViewPapers.asp?PaperNum=4439 |url-status=dead |archive-url=https://web.archive.org/web/20111123144133/http://www.cmsworldwide.com/ICASSP2011/Papers/ViewPapers.asp?PaperNum=4439 |archive-date=2011-11-23}}</ref> whose factors are partially known.

===Terminology===
{{glossary}}
{{term|Factor loadings}}
{{defn|1=Communality is the square of the standardized outer loading of an item. Analogous to [[Pearson product-moment correlation coefficient|Pearson's r]]-squared, the squared factor loading is the percent of variance in that indicator variable explained by the factor. To get the percent of variance in all the variables accounted for by each factor, add the sum of the squared factor loadings for that factor (column) and divide by the number of variables. (The number of variables equals the sum of their variances, as the variance of a standardized variable is 1.) This is the same as dividing the factor's [[eigenvalue]] by the number of variables. {{pb}}When interpreting, by one rule of thumb in confirmatory factor analysis, factor loadings should be .7 or higher to confirm that independent variables identified a priori are represented by a particular factor, on the rationale that the .7 level corresponds to about half of the variance in the indicator being explained by the factor. However, the .7 standard is a high one and real-life data may well not meet this criterion, which is why some researchers, particularly for exploratory purposes, will use a lower level such as .4 for the central factor and .25 for other factors. In any event, factor loadings must be interpreted in the light of theory, not by arbitrary cutoff levels. {{pb}}In [[Angle#Types of angles|oblique]] rotation, one may examine both a pattern matrix and a structure matrix. The structure matrix is simply the factor loading matrix as in orthogonal rotation, representing the variance in a measured variable explained by a factor on both a unique and common contributions basis. The pattern matrix, in contrast, contains [[coefficient]]s which represent only unique contributions. The more factors, the lower the pattern coefficients as a rule, since there will be more common contributions to variance explained. For oblique rotation, the researcher looks at both the structure and pattern coefficients when attributing a label to a factor. Principles of oblique rotation can be derived from both cross entropy and its dual entropy.<ref>{{cite journal |last1=Liou |first1=C.-Y. |last2=Musicus |first2=B.R. |title=Cross Entropy Approximation of Structured Gaussian Covariance Matrices |journal=IEEE Transactions on Signal Processing |volume=56 |issue=7 |pages=3362–3367 |year=2008 |doi=10.1109/TSP.2008.917878 |bibcode=2008ITSP...56.3362L |s2cid=15255630 |url=http://ntur.lib.ntu.edu.tw/bitstream/246246/155199/1/23.pdf}}</ref>}}
{{term|Communality}}
{{defn|The sum of the squared factor loadings for all factors for a given variable (row) is the variance in that variable accounted for by all the factors.
The communality measures the percent of variance in a given variable explained by all the factors jointly and may be interpreted as the reliability of the indicator in the context of the factors being posited.}}
{{term|Spurious solutions}}
{{defn|If the communality exceeds 1.0, there is a spurious solution, which may reflect too small a sample or the choice to extract too many or too few factors.}}
{{term|Uniqueness of a variable}}
{{defn|The variability of a variable minus its communality.}}
{{term|Eigenvalues/characteristic roots}}
{{defn|Eigenvalues measure the amount of variation in the total sample accounted for by each factor. The ratio of eigenvalues is the ratio of explanatory importance of the factors with respect to the variables. If a factor has a low eigenvalue, then it is contributing little to the explanation of variances in the variables and may be ignored as less important than the factors with higher eigenvalues.}}
{{term|Extraction sums of squared loadings}}
{{defn|Initial eigenvalues and eigenvalues after extraction (listed by SPSS as "Extraction Sums of Squared Loadings") are the same for PCA extraction, but for other extraction methods, eigenvalues after extraction will be lower than their initial counterparts. SPSS also prints "Rotation Sums of Squared Loadings", and even for PCA these eigenvalues will differ from the initial and extraction eigenvalues, though their total will be the same.}}
{{term|Factor scores}}
{{term|Component scores (in PCA)|multi=yes}}
{{defn|1={{ghat|Explained from the PCA perspective, not from the factor analysis perspective.}} The scores of each case (row) on each factor (column). To compute the factor score for a given case for a given factor, one takes the case's standardized score on each variable, multiplies by the corresponding loadings of the variable for the given factor, and sums these products. Computing factor scores allows one to look for factor outliers. Also, factor scores may be used as variables in subsequent modeling.}}
{{glossary end}}

===Criteria for determining the number of factors===
Researchers wish to avoid such subjective or arbitrary criteria for factor retention as "it made sense to me". A number of objective methods have been developed to solve this problem, allowing users to determine an appropriate range of solutions to investigate.<ref name="Zwick1986">{{cite journal |last1=Zwick |first1=William R. |last2=Velicer |first2=Wayne F. |title=Comparison of five rules for determining the number of components to retain. |journal=Psychological Bulletin |date=1986 |volume=99 |issue=3 |pages=432–442 |doi=10.1037/0033-2909.99.3.432}}</ref> However, these different methods often disagree with one another as to the number of factors that ought to be retained. For instance, [[parallel analysis]] may suggest 5 factors while Velicer's MAP suggests 6, so the researcher may request both 5- and 6-factor solutions and discuss each in terms of their relation to external data and theory.

====Modern criteria====
[[Horn's parallel analysis]] (PA):<ref name="Horn1965">{{cite journal |last1=Horn |first1=John L. |title=A rationale and test for the number of factors in factor analysis |journal=Psychometrika |date=June 1965 |volume=30 |issue=2 |pages=179–185 |doi=10.1007/BF02289447 |pmid=14306381 |s2cid=19663974}}</ref> A Monte-Carlo based simulation method that compares the observed eigenvalues with those obtained from uncorrelated normal variables.
A factor or component is retained if the associated eigenvalue is bigger than the 95th percentile of the distribution of eigenvalues derived from the random data. PA is among the more commonly recommended rules for determining the number of components to retain,<ref name="Zwick1986" /><ref>{{Cite arXiv |last=Dobriban |first=Edgar |date=2017-10-02 |title=Permutation methods for factor analysis and PCA |class=math.ST |language=en |eprint=1710.00479v2}}</ref> but many programs fail to include this option (a notable exception being [[R (programming language)|R]]).<ref>{{cite journal |last1=Ledesma |first1=R.D. |last2=Valero-Mora |first2=P. |year=2007 |title=Determining the Number of Factors to Retain in EFA: An easy-to-use computer program for carrying out Parallel Analysis |url=http://pareonline.net/getvn.asp?v=12&n=2 |journal=Practical Assessment Research & Evaluation |volume=12 |issue=2 |pages=1–11}}</ref> However, [[Anton Formann|Formann]] provided both theoretical and empirical evidence that its application might not be appropriate in many cases, since its performance is considerably influenced by [[sample size]], [[Item response theory#The item response function|item discrimination]], and the type of [[correlation coefficient]].<ref>Tran, U. S., & Formann, A. K. (2009). Performance of parallel analysis in retrieving unidimensionality in the presence of binary data. ''Educational and Psychological Measurement, 69,'' 50–61.</ref>

Velicer's (1976) MAP test<ref name=Velicer>{{cite journal |last=Velicer |first=W.F. |title=Determining the number of components from the matrix of partial correlations |journal=Psychometrika |year=1976 |volume=41 |issue=3 |pages=321–327 |doi=10.1007/bf02293557 |s2cid=122907389}}</ref> as described by Courtney (2013)<ref name="pareonline.net">Courtney, M. G. R. (2013). Determining the number of factors to retain in EFA: Using the SPSS R-Menu v2.0 to make more judicious estimations. Practical Assessment, Research and Evaluation, 18(8). Available online: http://pareonline.net/getvn.asp?v=18&n=8 {{Webarchive|url=https://web.archive.org/web/20150317145450/http://pareonline.net/getvn.asp?v=18&n=8 |date=2015-03-17 }}</ref> "involves a complete principal components analysis followed by the examination of a series of matrices of partial correlations" (p. 397; note that this quote does not occur in Velicer (1976), and the cited page number is outside the page range of that article). The squared correlation for Step "0" (see Figure 4) is the average squared off-diagonal correlation of the unpartialed correlation matrix. On Step 1, the first principal component and its associated items are partialed out, and the average squared off-diagonal correlation of the resulting correlation matrix is computed. On Step 2, the first two principal components are partialed out and the resulting average squared off-diagonal correlation is again computed. The computations are carried out for ''k'' − 1 steps (''k'' representing the total number of variables in the matrix). Finally, the average squared correlations for all steps are compared, and the step number that yielded the lowest average squared partial correlation determines the number of components or factors to retain.<ref name=Velicer/> By this method, components are maintained as long as the variance in the correlation matrix represents systematic variance, as opposed to residual or error variance.
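The two criteria just described can be outlined in a few lines of code. The following NumPy sketch is illustrative only: the function names, the number of random datasets, and the 95th-percentile setting are choices made here rather than part of any cited implementation, and it omits numerical safeguards; validated routines are available in the ''psych'' package for R and the SPSS interface mentioned below.

<syntaxhighlight lang="python">
import numpy as np

def parallel_analysis(data, n_sim=100, percentile=95, seed=0):
    """Horn's parallel analysis: keep components whose observed eigenvalues
    exceed the chosen percentile of eigenvalues from random, uncorrelated data."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    obs = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]   # descending
    rand = np.empty((n_sim, p))
    for i in range(n_sim):
        x = rng.standard_normal((n, p))
        rand[i] = np.linalg.eigvalsh(np.corrcoef(x, rowvar=False))[::-1]
    threshold = np.percentile(rand, percentile, axis=0)
    n_keep = 0
    for observed, simulated in zip(obs, threshold):
        if observed > simulated:
            n_keep += 1
        else:
            break
    return n_keep

def velicer_map(data):
    """Velicer's MAP test: partial successive principal components out of the
    correlation matrix and keep the number of components that minimises the
    average squared off-diagonal partial correlation."""
    R = np.corrcoef(data, rowvar=False)
    p = R.shape[0]
    eigval, eigvec = np.linalg.eigh(R)
    order = np.argsort(eigval)[::-1]
    loadings = eigvec[:, order] * np.sqrt(np.clip(eigval[order], 0, None))  # PCA loadings
    off = ~np.eye(p, dtype=bool)
    avg_sq = [np.mean(R[off] ** 2)]            # Step "0": unpartialed matrix
    for m in range(1, p):                      # Steps 1 .. k-1
        A = loadings[:, :m]
        C = R - A @ A.T                        # partial covariances
        d = np.sqrt(np.diag(C))
        partial = C / np.outer(d, d)           # partial correlations
        avg_sq.append(np.mean(partial[off] ** 2))
    return int(np.argmin(avg_sq))              # step with the smallest average
</syntaxhighlight>

In this sketch, a return value of 0 from the MAP function corresponds to the minimum occurring at Step "0", i.e. no components being retained.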
Although methodologically akin to principal components analysis, the MAP technique has been shown to perform quite well in determining the number of factors to retain in multiple simulation studies.<ref name="Zwick1986" /><ref name="Warne, R. T. 2014"/><ref name=Ruscio>{{cite journal |last=Ruscio |first=John |author2=Roche, B. |title=Determining the number of factors to retain in an exploratory factor analysis using comparison data of known factorial structure |journal=Psychological Assessment |year=2012 |volume=24 |issue=2 |pages=282–292 |doi=10.1037/a0025697 |pmid=21966933}}</ref><ref name=Garrido>Garrido, L. E., Abad, F. J., & Ponsoda, V. (2012). A new look at Horn's parallel analysis with ordinal variables. Psychological Methods. Advance online publication. {{doi|10.1037/a0030005}}</ref> This procedure is made available through SPSS's user interface,<ref name="pareonline.net"/> as well as the ''psych'' package for the [[R (programming language)|R programming language]].<ref>{{cite journal |last1=Revelle |first1=William |title=Determining the number of factors: the example of the NEO-PI-R |date=2007 |url=http://www.personality-project.org/r/book/numberoffactors.pdf}}</ref><ref>{{cite web |last1=Revelle |first1=William |title=psych: Procedures for Psychological, Psychometric, and Personality Research |url=https://cran.r-project.org/web/packages/psych/ |date=8 January 2020}}</ref>

==== Older methods ====
Kaiser criterion: The Kaiser rule is to drop all components with eigenvalues under 1.0 – this being the eigenvalue equal to the information accounted for by an average single item.<ref name="Kaiser1960">{{cite journal |last1=Kaiser |first1=Henry F. |title=The Application of Electronic Computers to Factor Analysis |journal=Educational and Psychological Measurement |date=April 1960 |volume=20 |issue=1 |pages=141–151 |doi=10.1177/001316446002000116 |s2cid=146138712}}</ref> The Kaiser criterion is the default in [[SPSS]] and most [[statistical software]], but it is not recommended as the sole cut-off criterion for estimating the number of factors, as it tends to over-extract factors.<ref>{{cite book |first1=D.L. |last1=Bandalos |first2=M.R. |last2=Boehm-Kaufman |chapter=Four common misconceptions in exploratory factor analysis |editor1-first=Charles E. |editor1-last=Lance |editor2-first=Robert J. |editor2-last=Vandenberg |title=Statistical and Methodological Myths and Urban Legends: Doctrine, Verity and Fable in the Organizational and Social Sciences |chapter-url=https://books.google.com/books?id=KFAnkvqD8CgC&pg=PA61 |year=2008 |publisher=Taylor & Francis |isbn=978-0-8058-6237-9 |pages=61–87}}</ref> A variation of this method has been created in which the researcher calculates [[confidence interval]]s for each eigenvalue and retains only factors whose entire confidence interval is greater than 1.0.<ref name="Warne, R. T. 2014">{{cite journal |last1=Warne |first1=R. T. |last2=Larsen |first2=R. |year=2014 |title=Evaluating a proposed modification of the Guttman rule for determining the number of factors in an exploratory factor analysis |journal=Psychological Test and Assessment Modeling |volume=56 |pages=104–123}}</ref><ref>{{cite journal |last1=Larsen |first1=R. |last2=Warne |first2=R. T. |year=2010 |title=Estimating confidence intervals for eigenvalues in exploratory factor analysis |journal=Behavior Research Methods |volume=42 |issue=3 |pages=871–876 |doi=10.3758/BRM.42.3.871 |pmid=20805609 |doi-access=free}}</ref>

[[Scree plot]]:<ref>{{cite journal |first1=Raymond |last1=Cattell |journal=Multivariate Behavioral Research |volume=1 |number=2 |pages=245–76 |year=1966 |title=The scree test for the number of factors |doi=10.1207/s15327906mbr0102_10 |pmid=26828106}}</ref> The Cattell scree test plots the components on the X-axis and the corresponding [[eigenvalue]]s on the [[Y-axis]]. As one moves to the right, toward later components, the eigenvalues drop. When the drop ceases and the curve makes an elbow toward a less steep decline, Cattell's scree test says to drop all further components after the one starting at the elbow. This rule is sometimes criticised for being amenable to researcher-controlled "[[Wiktionary:fudge factor|fudging]]": because picking the "elbow" can be subjective (the curve may have multiple elbows or be a smooth curve), the researcher may be tempted to set the cut-off at the number of factors desired by their research agenda.{{Citation needed|date=March 2016}}

Variance explained criteria: Some researchers simply use the rule of keeping enough factors to account for 90% (sometimes 80%) of the variation. Where the researcher's goal emphasizes [[Occam's razor|parsimony]] (explaining variance with as few factors as possible), the criterion could be as low as 50%.

==== Bayesian methods ====
By placing a [[Prior probability|prior distribution]] over the number of latent factors and then applying Bayes' theorem, Bayesian models can return a [[probability distribution]] over the number of latent factors. This has been modeled using the [[Indian buffet process]],<ref>{{cite book |author=Alpaydin |year=2020 |title=Introduction to Machine Learning |edition=5th |pages=528–9}}</ref> but can be modeled more simply by placing any discrete prior (e.g. a [[negative binomial distribution]]) on the number of components.

===Rotation methods===
The output of PCA maximizes the variance accounted for by the first factor first, then the second factor, and so on. A disadvantage of this procedure is that most items load on the early factors, while very few items load on the later factors. This makes interpreting the factors by reading through a list of questions and loadings difficult, as every question is strongly correlated with the first few components, while very few questions are strongly correlated with the last few components.

Rotation serves to make the output easier to interpret. By [[Change of basis|choosing a different basis]] for the same principal components{{snd}}that is, choosing different factors to express the same correlation structure{{snd}}it is possible to create variables that are more easily interpretable. Rotations can be orthogonal or oblique; oblique rotations allow the factors to correlate.<ref name="StackExchangeRotation">{{cite web |title=Factor rotation methods |url=https://stats.stackexchange.com/q/185216 |website=Stack Exchange |access-date=7 November 2022}}</ref> This increased flexibility means that more rotations are possible, some of which may be better at achieving a specified goal. However, this can also make the factors more difficult to interpret, as some information is "double-counted" and included multiple times in different components; some factors may even appear to be near-duplicates of each other.
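The claim that rotation merely re-expresses the same solution in a different basis can be checked directly: post-multiplying a loading matrix by any orthogonal matrix leaves the reproduced correlation structure and the communalities unchanged, while the individual loadings, and hence the interpretation, change. The following minimal NumPy sketch uses an invented loading matrix (the numbers are illustrative only, not from any real dataset):

<syntaxhighlight lang="python">
import numpy as np

# Invented loadings: 6 variables on 2 factors (illustration only).
L = np.array([[0.8, 0.1],
              [0.7, 0.2],
              [0.6, 0.3],
              [0.2, 0.7],
              [0.1, 0.8],
              [0.3, 0.6]])

theta = np.deg2rad(30)                      # any angle gives a valid orthogonal rotation
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

L_rot = L @ T                               # rotated loadings

# The reproduced correlations and the communalities are identical before
# and after rotation; only the pattern of loadings differs.
assert np.allclose(L @ L.T, L_rot @ L_rot.T)
assert np.allclose((L ** 2).sum(axis=1), (L_rot ** 2).sum(axis=1))
</syntaxhighlight>

The rotation criteria described below differ only in how they choose such a rotation so that the resulting loadings are easier to interpret.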
==== Orthogonal methods ====
Two broad classes of orthogonal rotations exist: those that look for sparse rows (where each row is a case, i.e. subject), and those that look for sparse columns (where each column is a variable).

* Simple factors: these rotations try to explain all factors by using only a few important variables. This effect can be achieved by using ''Varimax'' (the most common rotation).
* Simple variables: these rotations try to explain all variables using only a few important factors. This effect can be achieved using either ''Quartimax'' or the unrotated components of PCA.
* Both: these rotations try to compromise between both of the above goals, but in the process, may achieve a fit that is poor at both tasks; as such, they are unpopular compared to the above methods. ''Equamax'' is one such rotation.

====Problems with factor rotation====
It can be difficult to interpret a factor structure when each variable is loading on multiple factors. Small changes in the data can sometimes tip a balance in the factor rotation criterion so that a completely different factor rotation is produced. This can make it difficult to compare the results of different experiments. This problem is illustrated by a comparison of different studies of world-wide cultural differences. Each study has used different measures of cultural variables and produced a differently rotated factor analysis result. The authors of each study believed that they had discovered something new, and invented new names for the factors they found. A later comparison of the studies found that the results were rather similar when the unrotated results were compared. The common practice of factor rotation has obscured the similarity between the results of the different studies.<ref name="Fog2022">{{cite journal |last1=Fog |first1=A |title=Two-Dimensional Models of Cultural Differences: Statistical and Theoretical Analysis |journal=Cross-Cultural Research |date=2022 |volume=57 |issue=2–3 |pages=115–165 |doi=10.1177/10693971221135703 |s2cid=253153619 |url=https://backend.orbit.dtu.dk/ws/files/292673942/Two_dimensional_models_of_culture.pdf}}</ref>

===Higher order factor analysis===
{{Confusing|date=March 2010}}
'''Higher-order factor analysis''' is a statistical method consisting of repeating steps factor analysis – [[oblique rotation]] – factor analysis of rotated factors. Its merit is to enable the researcher to see the hierarchical structure of studied phenomena. To interpret the results, one proceeds either by [[matrix multiplication|post-multiplying]] the primary [[factor pattern matrix]] by the higher-order factor pattern matrices (Gorsuch, 1983) and perhaps applying a [[Varimax rotation]] to the result (Thompson, 1990) or by using a Schmid-Leiman solution (SLS, Schmid & Leiman, 1957, also known as Schmid-Leiman transformation) which attributes the [[Statistical dispersion|variation]] from the primary factors to the second-order factors.
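As a purely illustrative sketch of the post-multiplication step, the loadings of the observed variables on a higher-order factor are obtained by multiplying the primary pattern matrix by the higher-order pattern matrix. The matrices below are invented for illustration; a Schmid-Leiman transformation would go further and reattribute the shared variation to the higher-order factor.

<syntaxhighlight lang="python">
import numpy as np

# Invented first-order pattern matrix: 6 observed variables on 3 primary factors.
primary = np.array([[0.7, 0.1, 0.0],
                    [0.8, 0.0, 0.1],
                    [0.1, 0.7, 0.1],
                    [0.0, 0.8, 0.0],
                    [0.1, 0.0, 0.7],
                    [0.0, 0.1, 0.8]])

# Invented second-order pattern matrix: 3 primary factors on 1 higher-order factor.
higher_order = np.array([[0.6],
                         [0.7],
                         [0.5]])

# Post-multiplying expresses the observed variables directly in terms of the
# higher-order factor (the Gorsuch-style interpretation step described above).
loadings_on_higher_order = primary @ higher_order
print(loadings_on_higher_order.round(2))
</syntaxhighlight>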