== Relation with other methods ==

=== Correspondence analysis ===
[[Correspondence analysis]] (CA) was developed by [[Jean-Paul Benzécri]]<ref>{{Cite book | author = Benzécri, J.-P. | publisher=Dunod |location= Paris, France | year = 1973 | title = L'Analyse des Données. Volume II. L'Analyse des Correspondances }}</ref> and is conceptually similar to PCA, but scales the data (which should be non-negative) so that rows and columns are treated equivalently. It is traditionally applied to [[contingency tables]]. CA decomposes the [[chi-squared statistic]] associated with this table into orthogonal factors.<ref>{{Cite book | author = Greenacre, Michael | publisher=Academic Press |location= London | year = 1983 | title = Theory and Applications of Correspondence Analysis | isbn = 978-0-12-299050-2 }}</ref> Because CA is a descriptive technique, it can be applied to tables whether or not the chi-squared statistic is appropriate. Several variants of CA are available, including [[detrended correspondence analysis]] and [[canonical correspondence analysis]]. One special extension is [[multiple correspondence analysis]], which may be seen as the counterpart of principal component analysis for categorical data.<ref>{{Cite book |author1=Le Roux |author2=Brigitte and Henry Rouanet | publisher=Kluwer|location= Dordrecht | year = 2004 | title = Geometric Data Analysis, From Correspondence Analysis to Structured Data Analysis | isbn =9781402022357 | url=https://books.google.com/books?id=a6bDBUF58XwC }}</ref>

=== Factor analysis ===
[[File:PCA_versus_Factor_Analysis.jpg|thumb|An example of the difference between PCA and factor analysis. In the top diagram the "factor" (e.g., career path) represents the three observed variables (e.g., doctor, lawyer, teacher), whereas in the bottom diagram the observed variables (e.g., pre-school teacher, middle school teacher, high school teacher) are reduced into the component of interest (e.g., teacher).]]
Principal component analysis creates variables that are linear combinations of the original variables. The new variables are all mutually orthogonal. The PCA transformation can be helpful as a pre-processing step before clustering. PCA is a variance-focused approach seeking to reproduce the total variable variance, in which components reflect both common and unique variance of the variable. PCA is generally preferred for purposes of data reduction (that is, translating variable space into optimal factor space) but not when the goal is to detect the latent construct or factors.

[[Factor analysis]] is similar to principal component analysis, in that factor analysis also involves linear combinations of variables. Different from PCA, factor analysis is a correlation-focused approach seeking to reproduce the inter-correlations among variables, in which the factors "represent the common variance of variables, excluding unique variance".<ref>Timothy A. Brown. [https://books.google.com/books?id=JDb3BQAAQBAJ Confirmatory Factor Analysis for Applied Research Methodology in the social sciences]. Guilford Press, 2006</ref> In terms of the correlation matrix, this corresponds with focusing on explaining the off-diagonal terms (that is, shared co-variance), while PCA focuses on explaining the terms that sit on the diagonal. However, as a side result, when trying to reproduce the on-diagonal terms, PCA also tends to fit the off-diagonal correlations relatively well.<ref name="Jolliffe2002" />{{rp|158}} Results given by PCA and factor analysis are very similar in most situations, but this is not always the case, and there are some problems where the results are significantly different. Factor analysis is generally used when the research purpose is detecting data structure (that is, latent constructs or factors) or [[causal modeling]]. If the factor model is incorrectly formulated or the assumptions are not met, then factor analysis will give erroneous results.<ref>{{cite journal |last1=Meglen |first1=R.R. |title=Examining Large Databases: A Chemometric Approach Using Principal Component Analysis |journal=Journal of Chemometrics |volume=5 |issue=3 |pages=163–179 |date=1991 |doi=10.1002/cem.1180050305 |s2cid=120886184 }}</ref>
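The diagonal-versus-off-diagonal contrast can be illustrated numerically. The following sketch is an illustration only, not a canonical implementation of either method; it uses NumPy and scikit-learn's <code>PCA</code> and <code>FactorAnalysis</code> on synthetic, standardized data and compares how well each model reproduces the off-diagonal entries of the sample correlation matrix with the same number of components:

<syntaxhighlight lang="python">
# Illustrative sketch: compare how PCA and factor analysis approximate a
# correlation matrix (synthetic data; not from the article).
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis

rng = np.random.default_rng(0)
# Two latent factors generate six observed variables plus unique noise.
latent = rng.normal(size=(1000, 2))
loadings = rng.normal(size=(2, 6))
X = latent @ loadings + 0.5 * rng.normal(size=(1000, 6))
X = (X - X.mean(axis=0)) / X.std(axis=0)      # standardize: work with correlations
R = np.corrcoef(X, rowvar=False)

k = 2
pca = PCA(n_components=k).fit(X)
fa = FactorAnalysis(n_components=k).fit(X)

# Model-implied approximations of the correlation matrix R.
W_pca = pca.components_.T * np.sqrt(pca.explained_variance_)   # PCA loadings
R_pca = W_pca @ W_pca.T
R_fa = fa.components_.T @ fa.components_ + np.diag(fa.noise_variance_)

off = ~np.eye(6, dtype=bool)
print("off-diagonal error, PCA:", np.abs(R - R_pca)[off].mean())
print("off-diagonal error, FA: ", np.abs(R - R_fa)[off].mean())
print("diagonal error, PCA:    ", np.abs(np.diag(R - R_pca)).mean())
</syntaxhighlight>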
=== {{math|<var>k</var>}}-means clustering ===
It has been asserted that the relaxed solution of [[k-means clustering|{{math|<var>k</var>}}-means clustering]], specified by the cluster indicators, is given by the principal components, and that the PCA subspace spanned by the principal directions is identical to the cluster centroid subspace.<ref>{{cite journal|author=H. Zha |author2=C. Ding |author3=M. Gu |author4=X. He |author5=H.D. Simon|title=Spectral Relaxation for K-means Clustering|journal=Neural Information Processing Systems Vol.14 (NIPS 2001)|pages=1057–1064|date=Dec 2001|url=http://ranger.uta.edu/~chqding/papers/Zha-Kmeans.pdf}}</ref><ref>{{cite journal|author=Chris Ding |author2=Xiaofeng He|title=K-means Clustering via Principal Component Analysis|journal=Proc. Of Int'l Conf. Machine Learning (ICML 2004)|pages=225–232|date=July 2004|url=http://ranger.uta.edu/~chqding/papers/KmeansPCA1.pdf}}</ref> However, that PCA is a useful relaxation of {{math|<var>k</var>}}-means clustering was not a new result,<ref>{{cite journal | title = Clustering large graphs via the singular value decomposition | journal = Machine Learning | year = 2004 | first = P. | last = Drineas |author2=A. Frieze |author3=R. Kannan |author4=S. Vempala |author5=V. Vinay | volume = 56 | issue = 1–3 | pages = 9–33| url = http://www.cc.gatech.edu/~vempala/papers/dfkvv.pdf | access-date = 2012-08-02 | doi=10.1023/b:mach.0000033113.59016.96| s2cid = 5892850 | doi-access = free }}</ref> and it is straightforward to find counterexamples to the statement that the cluster centroid subspace is spanned by the principal directions.<ref>{{cite book | title = Dimensionality reduction for k-means clustering and low rank approximation (Appendix B) | year = 2014 | first = M. | last = Cohen |author2=S. Elder |author3=C. Musco |author4=C. Musco |author5=M. Persu | arxiv = 1410.6801|bibcode=2014arXiv1410.6801C}}</ref>
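The relationship can be probed empirically. The sketch below is an illustration only; it uses scikit-learn and SciPy on synthetic clustered data and measures the principal angles between the subspace spanned by the {{math|<var>k</var>}}-means centroids and the subspace spanned by the leading {{math|<var>k</var> − 1}} principal directions. Small angles indicate close agreement, while data sets that yield large angles act as counterexamples to the identity asserted above:

<syntaxhighlight lang="python">
# Illustrative sketch (not a proof): compare the k-means centroid subspace
# with the leading PCA subspace via principal angles.
import numpy as np
from scipy.linalg import orth, subspace_angles
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
k, d = 3, 5
# Synthetic data: three Gaussian clusters in five dimensions.
centers = rng.normal(scale=4.0, size=(k, d))
X = np.vstack([c + rng.normal(size=(200, d)) for c in centers])
Xc = X - X.mean(axis=0)                      # center the data

centroids = KMeans(n_clusters=k, n_init=10, random_state=0).fit(Xc).cluster_centers_
# For centered data the centroid subspace has dimension at most k - 1.
C = orth(centroids.T)                                # basis of centroid subspace
V = PCA(n_components=k - 1).fit(Xc).components_.T    # leading principal directions

print("principal angles (radians):", subspace_angles(C, V))
</syntaxhighlight>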
=== Non-negative matrix factorization ===
[[File:Fractional Residual Variances comparison, PCA and NMF.pdf|thumb|500px|Fractional residual variance (FRV) plots for PCA and NMF;<ref name="ren18"/> for PCA, the theoretical values are the contribution from the residual eigenvalues. In comparison, the FRV curves for PCA reach a flat plateau where no signal is captured effectively, while the NMF FRV curves decline continuously, indicating a better ability to capture signal. The FRV curves for NMF also converge to higher levels than those of PCA, indicating the less-overfitting property of NMF.]]
[[Non-negative matrix factorization]] (NMF) is a dimension reduction method in which only non-negative elements in the matrices are used, which makes it a promising method in astronomy,<ref name="blantonRoweis07">{{Cite journal|arxiv=astro-ph/0606170|last1= Blanton|first1= Michael R.|title= K-corrections and filter transformations in the ultraviolet, optical, and near infrared |journal= The Astronomical Journal|volume= 133|issue= 2|pages= 734–754|last2= Roweis|first2= Sam |year= 2007|doi= 10.1086/510127|bibcode = 2007AJ....133..734B|s2cid= 18561804}}</ref><ref name="zhu16"/><ref name="ren18"/> in the sense that astrophysical signals are non-negative. The PCA components are orthogonal to each other, while the NMF components are all non-negative and therefore construct a non-orthogonal basis.

In PCA, the contribution of each component is ranked based on the magnitude of its corresponding eigenvalue, which is equivalent to the fractional residual variance (FRV) in analyzing empirical data.<ref name = "soummer12">{{Cite journal|arxiv=1207.4197|last1= Soummer|first1= Rémi |title= Detection and Characterization of Exoplanets and Disks Using Projections on Karhunen-Loève Eigenimages|journal= The Astrophysical Journal Letters |volume= 755|issue= 2|pages= L28|last2= Pueyo|first2= Laurent|last3= Larkin | first3 = James|year= 2012|doi= 10.1088/2041-8205/755/2/L28|bibcode = 2012ApJ...755L..28S |s2cid= 51088743}}</ref> For NMF, its components are ranked based only on the empirical FRV curves.<ref name = "ren18">{{Cite journal|arxiv=1712.10317|last1= Ren|first1= Bin |title= Non-negative Matrix Factorization: Robust Extraction of Extended Structures|journal= The Astrophysical Journal|volume= 852|issue= 2|pages= 104|last2= Pueyo|first2= Laurent|last3= Zhu | first3 = Guangtun B.|last4= Duchêne|first4= Gaspard |year= 2018|doi= 10.3847/1538-4357/aaa1f2|bibcode = 2018ApJ...852..104R |s2cid= 3966513|doi-access= free}}</ref> The residual fractional eigenvalue plots, that is, <math> 1-\sum_{i=1}^k \lambda_i\Big/\sum_{j=1}^n \lambda_j</math> as a function of component number <math>k</math> given a total of <math>n</math> components, have a flat plateau for PCA, where no data is captured to remove the quasi-static noise, and then the curves drop quickly as an indication of over-fitting (random noise).<ref name="soummer12"/> The FRV curves for NMF decrease continuously<ref name="ren18"/> when the NMF components are constructed [[Non-negative matrix factorization#Sequential NMF|sequentially]],<ref name="zhu16">{{Cite arXiv|last=Zhu|first=Guangtun B.|date=2016-12-19|title=Nonnegative Matrix Factorization (NMF) with Heteroscedastic Uncertainties and Missing data |eprint=1612.06037|class=astro-ph.IM}}</ref> indicating the continuous capturing of quasi-static noise; they then converge to higher levels than PCA,<ref name="ren18"/> indicating the less over-fitting property of NMF.
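The FRV defined above is straightforward to compute from the PCA eigenvalue spectrum. The following sketch is an illustration only, using NumPy and scikit-learn on synthetic data; it evaluates <math>1-\sum_{i=1}^k \lambda_i\Big/\sum_{j=1}^n \lambda_j</math> for every component count <math>k</math>:

<syntaxhighlight lang="python">
# Illustrative sketch: fractional residual variance (FRV) from the PCA
# eigenvalues, i.e. 1 - sum_{i<=k} lambda_i / sum_{j<=n} lambda_j for k = 1..n.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 10)) @ rng.normal(size=(10, 10))  # synthetic data

eigvals = PCA().fit(X).explained_variance_     # lambda_1 >= ... >= lambda_n
frv = 1.0 - np.cumsum(eigvals) / eigvals.sum()
for k, v in enumerate(frv, start=1):
    print(f"k = {k:2d}  FRV = {v:.4f}")
</syntaxhighlight>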
=== Iconography of correlations ===
It is often difficult to interpret the principal components when the data include many variables of various origins, or when some variables are qualitative. This forces the PCA user into a delicate elimination of several variables. If observations or variables have an excessive impact on the direction of the axes, they should be removed and then projected as supplementary elements. In addition, it is necessary to avoid interpreting the proximities between points that lie close to the center of the factorial plane.

[[File:AirMerIconographyCorrelation.jpg|thumb|Iconography of correlations – geochemistry of marine aerosols]]
The [[iconography of correlations]], by contrast, is not a projection onto a system of axes and therefore does not have these drawbacks: all the variables can be kept. The principle of the diagram is to underline the "remarkable" correlations of the correlation matrix with a solid line (positive correlation) or a dotted line (negative correlation). A strong correlation is not "remarkable" if it is not direct but is instead caused by the effect of a third variable. Conversely, weak correlations can be "remarkable". For example, if a variable Y depends on several independent variables, the correlations of Y with each of them are weak and yet "remarkable".
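The distinction between a direct and an induced correlation can be illustrated with a partial-correlation computation. In the sketch below (an illustration only, with synthetic variables), two variables are strongly correlated solely because they share a common driver; once that third variable is partialled out, the correlation nearly vanishes:

<syntaxhighlight lang="python">
# Illustrative sketch: a strong correlation between a and b that is not
# "direct" but induced by a third variable z; the partial correlation of
# a and b given z is close to zero.
import numpy as np

rng = np.random.default_rng(3)
z = rng.normal(size=5000)
a = z + 0.3 * rng.normal(size=5000)
b = z + 0.3 * rng.normal(size=5000)

r_ab = np.corrcoef(a, b)[0, 1]
r_az = np.corrcoef(a, z)[0, 1]
r_bz = np.corrcoef(b, z)[0, 1]
partial = (r_ab - r_az * r_bz) / np.sqrt((1 - r_az**2) * (1 - r_bz**2))
print(f"corr(a, b) = {r_ab:.2f}, partial corr given z = {partial:.2f}")
</syntaxhighlight>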