=== PCA and information theory ===
Dimensionality reduction results in a loss of information, in general. PCA-based dimensionality reduction tends to minimize that information loss, under certain signal and noise models.

Under the assumption that
:<math>\mathbf{x}=\mathbf{s}+\mathbf{n},</math>
that is, that the data vector <math>\mathbf{x}</math> is the sum of the desired information-bearing signal <math>\mathbf{s}</math> and a noise signal <math>\mathbf{n}</math>, one can show that PCA can be optimal for dimensionality reduction, from an information-theoretic point of view.

In particular, Linsker showed that if <math>\mathbf{s}</math> is Gaussian and <math>\mathbf{n}</math> is Gaussian noise with a covariance matrix proportional to the identity matrix, then PCA maximizes the [[mutual information]] <math>I(\mathbf{y};\mathbf{s})</math> between the desired information <math>\mathbf{s}</math> and the dimensionality-reduced output <math>\mathbf{y}=\mathbf{W}_L^T\mathbf{x}</math>.<ref>{{cite journal|last=Linsker|first=Ralph|title=Self-organization in a perceptual network|journal=IEEE Computer|date=March 1988|volume=21|issue=3|pages=105–117|doi=10.1109/2.36|s2cid=1527671}}</ref>

If the noise is still Gaussian and has a covariance matrix proportional to the identity matrix (that is, the components of the vector <math>\mathbf{n}</math> are [[iid]]), but the information-bearing signal <math>\mathbf{s}</math> is non-Gaussian (which is a common scenario), PCA at least minimizes an upper bound on the ''information loss'', which is defined as<ref>{{cite book|last=Deco & Obradovic|title=An Information-Theoretic Approach to Neural Computing|year=1996|publisher=Springer|location=New York, NY|url=https://books.google.com/books?id=z4XTBwAAQBAJ|isbn=9781461240167}}</ref><ref>{{cite book|last=Plumbley|first=Mark|title=Information theory and unsupervised neural networks|year=1991}}Tech Note</ref>
:<math>I(\mathbf{x};\mathbf{s}) - I(\mathbf{y};\mathbf{s}).</math>

The optimality of PCA is also preserved if the noise <math>\mathbf{n}</math> is iid and at least more Gaussian (in terms of the [[Kullback–Leibler divergence]]) than the information-bearing signal <math>\mathbf{s}</math>.<ref>{{cite journal|last=Geiger|first=Bernhard|author2=Kubin, Gernot|title=Signal Enhancement as Minimization of Relevant Information Loss|journal=Proc. ITG Conf. On Systems, Communication and Coding|date=January 2013|arxiv=1205.6935|bibcode=2012arXiv1205.6935G}}</ref> In general, even if the above signal model holds, PCA loses its information-theoretic optimality as soon as the noise <math>\mathbf{n}</math> becomes dependent.
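The Gaussian case can be illustrated numerically. The following is a minimal sketch (not part of the original article) that assumes a hypothetical anisotropic Gaussian signal, isotropic Gaussian noise <math>\mathbf{n}\sim\mathcal{N}(0,\sigma^2 I)</math>, and orthonormal projections; the dimensions, noise variance, and function names are illustrative choices. It uses the closed-form Gaussian mutual information <math>I(\mathbf{y};\mathbf{s}) = \tfrac{1}{2}\left[\log\det\operatorname{Cov}(\mathbf{y}) - \log\det\operatorname{Cov}(\mathbf{y}\mid\mathbf{s})\right]</math> to compare the PCA projection with a random rank-<math>L</math> projection; the PCA projection should yield the larger value.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: x = s + n with anisotropic Gaussian signal s
# and isotropic Gaussian noise n (covariance proportional to the identity).
d, L, sigma2 = 10, 3, 0.5                      # ambient dim, reduced dim, noise variance
A = rng.standard_normal((d, d))
Sigma_s = A @ A.T / d                          # signal covariance (positive definite)
Sigma_x = Sigma_s + sigma2 * np.eye(d)         # covariance of x = s + n

def mutual_information(W):
    """I(y; s) in nats for y = W^T x with jointly Gaussian (s, n).
    Uses I(y;s) = 0.5 * [log det Cov(y) - log det Cov(y | s)],
    where Cov(y | s) = sigma2 * W^T W."""
    Sigma_y = W.T @ Sigma_x @ W
    Sigma_y_given_s = sigma2 * (W.T @ W)
    _, logdet_y = np.linalg.slogdet(Sigma_y)
    _, logdet_c = np.linalg.slogdet(Sigma_y_given_s)
    return 0.5 * (logdet_y - logdet_c)

# PCA projection W_L: top-L eigenvectors of the covariance of x
eigvals, eigvecs = np.linalg.eigh(Sigma_x)     # ascending eigenvalues
W_pca = eigvecs[:, ::-1][:, :L]                # columns = leading principal directions

# A random orthonormal rank-L projection for comparison
Q, _ = np.linalg.qr(rng.standard_normal((d, L)))

print("I(y;s), PCA projection:   ", mutual_information(W_pca))
print("I(y;s), random projection:", mutual_information(Q))
</syntaxhighlight>

Because <math>\operatorname{Cov}(\mathbf{y}\mid\mathbf{s}) = \sigma^2 I_L</math> for any orthonormal <math>\mathbf{W}</math> in this model, maximizing <math>I(\mathbf{y};\mathbf{s})</math> reduces to maximizing <math>\det(\mathbf{W}^T\Sigma_x\mathbf{W})</math>, which the leading eigenvectors achieve.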