{{Short description|Method of data analysis}} [[File:GaussianScatterPCA.svg|thumb|upright=1.3|PCA of a [[multivariate Gaussian distribution]] centered at (1,3) with a standard deviation of 3 in roughly the (0.866, 0.5) direction and of 1 in the orthogonal direction. The vectors shown are the [[Eigenvalues and eigenvectors|eigenvectors]] of the [[covariance matrix]] scaled by the square root of the corresponding eigenvalue, and shifted so their tails are at the mean.]] {{Machine learning bar}} '''Principal component analysis''' ('''PCA''') is a [[Linear map|linear]] [[dimensionality reduction]] technique with applications in [[exploratory data analysis]], visualization and [[Data Preprocessing|data preprocessing]]. The data is [[linear map|linearly transformed]] onto a new [[coordinate system]] such that the directions (principal components) capturing the largest variation in the data can be easily identified. The '''principal components''' of a collection of points in a [[real coordinate space]] are a sequence of <math>p</math> [[unit vector]]s, where the <math>i</math>-th vector is the direction of a line that best fits the data while being [[orthogonal]] to the first <math>i-1</math> vectors. Here, a best-fitting line is defined as one that minimizes the average squared [[perpendicular distance|perpendicular]] [[Distance from a point to a line|distance from the points to the line]]. These directions (i.e., principal components) constitute an [[orthonormal basis]] in which different individual dimensions of the data are [[Linear correlation|linearly uncorrelated]]. Many studies use the first two principal components in order to plot the data in two dimensions and to visually identify clusters of closely related data points.<ref>{{cite journal |last1=Gewers |first1=Felipe L. |last2=Ferreira |first2=Gustavo R. |last3=Arruda |first3=Henrique F. De |last4=Silva |first4=Filipi N. |last5=Comin |first5=Cesar H. |last6=Amancio |first6=Diego R. 
|last7=Costa |first7=Luciano Da F. |title=Principal Component Analysis: A Natural Approach to Data Exploration |journal=ACM Comput. Surv. |date=24 May 2021 |volume=54 |issue=4 |pages=70:1–70:34 |doi=10.1145/3447755 |url=https://dl.acm.org/doi/abs/10.1145/3447755|arxiv=1804.02502 }}</ref> Principal component analysis has applications in many fields such as [[population genetics]], [[microbiome]] studies, and [[atmospheric science]].<ref>{{Cite journal |last1=Jolliffe |first1=Ian T. |last2=Cadima |first2=Jorge |date=2016-04-13 |title=Principal component analysis: a review and recent developments |journal=Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences |volume=374 |issue=2065 |pages=20150202 |bibcode=2016RSPTA.37450202J |doi=10.1098/rsta.2015.0202 |pmc=4792409 |pmid=26953178}}</ref>
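The procedure described above — centering the data, computing the covariance matrix, and taking its eigenvectors as the principal components — can be sketched in a few lines of NumPy. This is an illustrative example, not part of the article; the sample data is a synthetic 2-D Gaussian loosely modeled on the figure, and all variable names are for illustration only.

```python
import numpy as np

# Illustrative data: 500 points from a 2-D Gaussian (covariance chosen arbitrarily).
rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[1, 3], cov=[[7, 3], [3, 3]], size=500)

# Step 1: center the data at the mean.
Xc = X - X.mean(axis=0)

# Step 2: compute the sample covariance matrix (variables in columns).
C = np.cov(Xc, rowvar=False)

# Step 3: the principal components are the eigenvectors of C,
# ordered by decreasing eigenvalue (variance explained).
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Step 4: project the centered data onto the new orthonormal basis.
# The resulting coordinates are linearly uncorrelated.
scores = Xc @ eigvecs
```

Because the eigenvectors form an orthonormal basis that diagonalizes the covariance matrix, the projected coordinates in `scores` have zero linear correlation, matching the property stated in the lead.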