{{Short description|Method of data analysis}} [[File:GaussianScatterPCA.svg|thumb|upright=1.3|PCA of a [[multivariate Gaussian distribution]] centered at (1,3) with a standard deviation of 3 in roughly the (0.866, 0.5) direction and of 1 in the orthogonal direction. The vectors shown are the [[Eigenvalues and eigenvectors|eigenvectors]] of the [[covariance matrix]] scaled by the square root of the corresponding eigenvalue, and shifted so their tails are at the mean.]] {{Machine learning bar}} '''Principal component analysis''' ('''PCA''') is a [[Linear map|linear]] [[dimensionality reduction]] technique with applications in [[exploratory data analysis]], visualization and [[Data Preprocessing|data preprocessing]]. The data is [[linear map|linearly transformed]] onto a new [[coordinate system]] such that the directions (principal components) capturing the largest variation in the data can be easily identified. The '''principal components''' of a collection of points in a [[real coordinate space]] are a sequence of <math>p</math> [[unit vector]]s, where the <math>i</math>-th vector is the direction of a line that best fits the data while being [[orthogonal]] to the first <math>i-1</math> vectors. Here, a best-fitting line is defined as one that minimizes the average squared [[perpendicular distance|perpendicular]] [[Distance from a point to a line|distance from the points to the line]]. These directions (i.e., principal components) constitute an [[orthonormal basis]] in which different individual dimensions of the data are [[Linear correlation|linearly uncorrelated]]. Many studies use the first two principal components in order to plot the data in two dimensions and to visually identify clusters of closely related data points.<ref>{{cite journal |last1=Gewers |first1=Felipe L. |last2=Ferreira |first2=Gustavo R. |last3=Arruda |first3=Henrique F. De |last4=Silva |first4=Filipi N. |last5=Comin |first5=Cesar H. |last6=Amancio |first6=Diego R. 
|last7=Costa |first7=Luciano Da F. |title=Principal Component Analysis: A Natural Approach to Data Exploration |journal=ACM Comput. Surv. |date=24 May 2021 |volume=54 |issue=4 |pages=70:1–70:34 |doi=10.1145/3447755 |url=https://dl.acm.org/doi/abs/10.1145/3447755|arxiv=1804.02502 }}</ref> Principal component analysis has applications in many fields such as [[population genetics]], [[microbiome]] studies, and [[atmospheric science]].<ref>{{Cite journal |last1=Jolliffe |first1=Ian T. |last2=Cadima |first2=Jorge |date=2016-04-13 |title=Principal component analysis: a review and recent developments |journal=Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences |volume=374 |issue=2065 |pages=20150202 |bibcode=2016RSPTA.37450202J |doi=10.1098/rsta.2015.0202 |pmc=4792409 |pmid=26953178}}</ref>
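The procedure described above — centering the data, computing the covariance matrix, and taking its eigenvectors as the principal components — can be sketched in a few lines of NumPy. This is an illustrative example, not part of the article; the sample data is a synthetic 2-D Gaussian loosely modeled on the figure, and all variable names are for illustration only.

```python
import numpy as np

# Illustrative data: 500 points from a 2-D Gaussian (covariance chosen arbitrarily).
rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[1, 3], cov=[[7, 3], [3, 3]], size=500)

# Step 1: center the data at the mean.
Xc = X - X.mean(axis=0)

# Step 2: compute the sample covariance matrix (variables in columns).
C = np.cov(Xc, rowvar=False)

# Step 3: the principal components are the eigenvectors of C,
# ordered by decreasing eigenvalue (variance explained).
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Step 4: project the centered data onto the new orthonormal basis.
# The resulting coordinates are linearly uncorrelated.
scores = Xc @ eigvecs
```

Because the eigenvectors form an orthonormal basis that diagonalizes the covariance matrix, the projected coordinates in `scores` have zero linear correlation, matching the property stated in the lead.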