Principal component analysis
== Applications ==
=== Intelligence ===
The earliest application of factor analysis was in locating and measuring components of human intelligence. It was believed that intelligence had various uncorrelated components, such as spatial intelligence, verbal intelligence, induction, and deduction, and that scores on these could be adduced by factor analysis from results on various tests, to give a single index known as the [[Intelligence quotient|Intelligence Quotient]] (IQ). The pioneering statistical psychologist [[Charles Spearman|Spearman]] developed factor analysis in 1904 for his [[Two-factor theory of intelligence|two-factor theory]] of intelligence, adding a formal technique to the science of [[psychometrics]]. In 1924 [[Louis Leon Thurstone|Thurstone]] looked for 56 factors of intelligence, developing the notion of Mental Age. Standard IQ tests today are based on this early work.<ref name="Kaplan, R.M. 2010">Kaplan, R.M., & Saccuzzo, D.P. (2010). ''Psychological Testing: Principles, Applications, and Issues.'' (8th ed.). Belmont, CA: Wadsworth, Cengage Learning.</ref>

=== Residential differentiation ===
In 1949, Shevky and Williams introduced the theory of '''factorial ecology''', which dominated studies of residential differentiation from the 1950s to the 1970s.<ref>{{Cite book |last1=Shevky |first1=Eshref |title=The Social Areas of Los Angeles: Analysis and Typology |last2=Williams |first2=Marilyn |publisher=University of California Press |year=1949}}</ref> Neighbourhoods in a city could be distinguished from one another by various characteristics, which factor analysis could reduce to three. These were known as 'social rank' (an index of occupational status), 'familism' or family size, and 'ethnicity'. Cluster analysis could then be applied to divide the city into clusters or precincts according to values of the three key factor variables.
An extensive literature developed around factorial ecology in urban geography, but the approach went out of fashion after 1980 as being methodologically primitive and having little place in postmodern geographical paradigms. One of the problems with factor analysis has always been finding convincing names for the various artificial factors.

In 2000, Flood revived the factorial ecology approach to show that principal components analysis gave meaningful answers directly, without resorting to factor rotation. The principal components were dual variables or shadow prices of 'forces' pushing people together or apart in cities. The first component was 'accessibility', the classic trade-off between demand for travel and demand for space, around which classical urban economics is based. The next two components were 'disadvantage', which keeps people of similar status in separate neighbourhoods (mediated by planning), and 'ethnicity', where people of similar ethnic backgrounds try to co-locate.<ref>Flood, J (2000). Sydney divided: factorial ecology revisited. Paper to the APA Conference 2000, Melbourne, November and to the 24th ANZRSAI Conference, Hobart, December 2000.[https://www.academia.edu/5135339/Sydney_Divided_Factorial_Ecology_Revisited]</ref>

About the same time, the Australian Bureau of Statistics defined distinct indexes of advantage and disadvantage, taking the first principal component of sets of key variables that were thought to be important. These SEIFA indexes are regularly published for various jurisdictions, and are used frequently in spatial analysis.<ref>{{Cite web |last= |first= |date=2011 |title=Socio-Economic Indexes for Areas |url=https://www.abs.gov.au/websitedbs/censushome.nsf/home/seifa |access-date=2022-05-05 |website=Australian Bureau of Statistics |language=en}}</ref>

=== Development indexes ===
PCA can be used as a formal method for the development of indexes.
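A minimal sketch of how an index can be formed from the first principal component of a set of indicators. The data are synthetic, and the function name <code>first_component_index</code>, the standardization step, and the sign convention are assumptions of this sketch, not the procedure used for any index cited in this article.

```python
import numpy as np

def first_component_index(indicators):
    """Score each row on the first principal component of standardized indicators.

    indicators: array of shape (n_units, n_indicators).
    Returns (index, weights, share of total variance explained).
    """
    # Standardize so that indicators on different scales contribute comparably
    z = (indicators - indicators.mean(axis=0)) / indicators.std(axis=0, ddof=1)
    corr = np.cov(z, rowvar=False)            # equals the correlation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)   # eigenvalues in ascending order
    weights = eigvecs[:, -1]                  # eigenvector of the largest eigenvalue
    if weights.sum() < 0:                     # the sign of an eigenvector is arbitrary
        weights = -weights
    return z @ weights, weights, eigvals[-1] / eigvals.sum()

# Synthetic data (assumption): four indicators driven by one latent quantity plus noise
rng = np.random.default_rng(2)
latent = rng.normal(size=500)
loadings = np.array([0.9, 0.8, 0.7, 0.6])
indicators = latent[:, None] * loadings + rng.normal(scale=0.5, size=(500, 4))

index, weights, share = first_component_index(indicators)
```

Here <code>weights</code> plays the role of the index coefficients and <code>share</code> is the fraction of total variance the index captures. Whether to standardize the indicators first is a design choice that depends on the application.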
As an alternative, [[confirmatory composite analysis]] has been proposed to develop and assess indexes.<ref>{{cite journal |last1=Schamberger |first1=Tamara |last2=Schuberth |first2=Florian |last3=Henseler |first3=Jörg |title=Confirmatory composite analysis in human development research |journal=International Journal of Behavioral Development |date=2023 |volume=47 |issue=1 |pages=88–100 |doi=10.1177/01650254221117506|hdl=10362/143639 |hdl-access=free }}</ref>

The City Development Index was developed by PCA from about 200 indicators of city outcomes in a 1996 survey of 254 global cities. The first principal component was subject to iterative regression, adding the original variables singly until about 90% of its variation was accounted for. The index ultimately used about 15 indicators but was a good predictor of many more variables. Its comparative value agreed very well with a subjective assessment of the condition of each city. The coefficients on items of infrastructure were roughly proportional to the average costs of providing the underlying services, suggesting the Index was actually a measure of effective physical and social investment in the city. The country-level [[Human Development Index]] (HDI) from [[United Nations Development Programme|UNDP]], which has been published since 1990 and is very extensively used in development studies,<ref>{{Cite web |last=Human Development Reports |title=Human Development Index |url=https://hdr.undp.org/en/content/human-development-index-hdi |access-date=2022-05-06 |website=United Nations Development Programme}}</ref> has very similar coefficients on similar indicators, strongly suggesting it was originally constructed using PCA.

=== Population genetics ===
In 1978 [[Luigi Luca Cavalli-Sforza|Cavalli-Sforza]] and others pioneered the use of principal components analysis (PCA) to summarise data on variation in human gene frequencies across regions.
The components showed distinctive patterns, including gradients and sinusoidal waves. They interpreted these patterns as resulting from specific ancient migration events.

Since then, PCA has been ubiquitous in population genetics, with thousands of papers using PCA as a display mechanism. Genetic variation largely follows geographic proximity, so the first two principal components show spatial distribution and may be used to map the relative geographical location of different population groups, thereby showing individuals who have wandered from their original locations.<ref>{{Cite journal |last1=Novembre |first1=John |last2=Stephens |first2=Matthew |date=2008 |title=Interpreting principal component analyses of spatial population genetic variation |journal=Nat Genet |volume=40 |issue=5 |pages=646–49 |doi=10.1038/ng.139 |pmid=18425127 |pmc=3989108 }}</ref>

PCA in genetics has been technically controversial, in that the technique has been performed on discrete non-normal variables and often on binary allele markers. The lack of any measures of standard error in PCA is also an impediment to more consistent usage. In August 2022, the molecular biologist [[Eran Elhaik]] published a theoretical paper in [[Scientific Reports]] analyzing 12 PCA applications. He concluded that it was easy to manipulate the method, which, in his view, generated results that were 'erroneous, contradictory, and absurd.'
Specifically, he argued, the results achieved in population genetics were characterized by cherry-picking and [[circular reasoning]].<ref>{{cite journal | first = Eran | last = Elhaik | author-link = Eran Elhaik | doi = 10.1038/s41598-022-14395-4 | title = Principal Component Analyses (PCA)‑based findings in population genetic studies are highly biased and must be reevaluated | journal = [[Scientific Reports]] | volume = 12 | at = 14683 | year = 2022| issue = 1 | pmid = 36038559 | pmc = 9424212 | bibcode = 2022NatSR..1214683E | s2cid = 251932226 | doi-access = free }}</ref>

=== Market research and indexes of attitude ===
Market research has been an extensive user of PCA. It is used to develop customer satisfaction or customer loyalty scores for products, and with clustering, to develop market segments that may be targeted with advertising campaigns, in much the same way as factorial ecology will locate geographical areas with similar characteristics.<ref>{{Cite journal |last1=DeSarbo |first1=Wayne |last2=Hausmann |first2=Robert |last3=Kukitz |first3=Jeffrey |date=2007 |title=Restricted principal components analysis for marketing research |url=https://www.researchgate.net/publication/247623679 |journal=Journal of Marketing in Management |volume=2 |pages=305–328 |via=ResearchGate}}</ref>

PCA rapidly transforms large amounts of data into a smaller number of easier-to-digest variables that can be more rapidly and readily analyzed. Any consumer questionnaire contains a series of questions designed to elicit consumer attitudes, and principal components seek out latent variables underlying these attitudes.
For example, the Oxford Internet Survey in 2013 asked 2000 people about their attitudes and beliefs, and from these analysts extracted four principal component dimensions, which they identified as 'escape', 'social networking', 'efficiency', and 'problem creating'.<ref>{{Cite book |last1=Dutton |first1=William H |url=http://oxis.oii.ox.ac.uk/wp-content/uploads/2014/11/OxIS-2013.pdf |title=Cultures of the Internet: The Internet in Britain |last2=Blank |first2=Grant |publisher=Oxford Internet Institute |year=2013 |pages=6}}</ref> In another example, Joe Flood in 2008 extracted an attitudinal index toward housing from 28 attitude questions in a national survey of 2697 households in Australia. The first principal component represented a general attitude toward property and home ownership. The index, or the attitude questions it embodied, could be fed into a General Linear Model of tenure choice. The strongest determinant of private renting by far was the attitude index, rather than income, marital status or household type.<ref>{{Cite journal |last=Flood |first=Joe |date=2008 |title=Multinomial Analysis for Housing Careers Survey |url=https://www.academia.edu/33218811 |access-date=6 May 2022 |website=Paper to the European Network for Housing Research Conference, Dublin}}</ref>

=== Quantitative finance ===
In [[quantitative finance]], PCA is used<ref name="Miller">See Ch. 9 in Michael B. Miller (2013). ''Mathematics and Statistics for Financial Risk Management'', 2nd Edition. Wiley {{ISBN|978-1-118-75029-2}}</ref> in [[financial risk management]], and has been applied to [[Financial modeling#Quantitative finance|other problems]] such as [[portfolio optimization]].

PCA is commonly used in problems involving [[fixed income]] securities and [[Bond fund|portfolios]], and [[interest rate derivative]]s.
Valuations here depend on the entire [[yield curve]], comprising numerous highly correlated instruments, and PCA is used to define a set of components or factors that explain rate movements,<ref name="Hull"/> thereby facilitating the modelling. One common risk management application is [[Value at risk#Computation methods|calculating value at risk]] (VaR), applying PCA to the [[Monte Carlo methods in finance|Monte Carlo simulation]].<ref>§III.A.3.7.2 in Carol Alexander and Elizabeth Sheedy, eds. (2004). ''The Professional Risk Managers’ Handbook''. [[PRMIA]]. {{isbn|978-0976609704}}</ref> Here, for each simulation sample, the components are stressed, and rates, and [[Monte Carlo methods for option pricing#Methodology|in turn option values]], are then reconstructed, with VaR finally calculated over the entire run. PCA is also used in [[hedge (finance)|hedging]] exposure to [[interest rate risk]], given [[Key rate duration|partial duration]]s and other sensitivities.<ref name="Hull">§9.7 in [[John C. Hull (economist)|John Hull]] (2018). ''Risk Management and Financial Institutions,'' 5th Edition. Wiley. {{isbn|1119448115}}</ref> Under both, typically the first three principal components of the system are of interest ([[Fixed-income attribution#Modeling the yield curve|representing]] "shift", "twist", and "curvature"). These principal components are derived from an eigen-decomposition of the [[covariance matrix]] of [[yield curve|yield]] at predefined maturities,<ref>[https://www-2.rotman.utoronto.ca/~hull/RMFI/PCA_6thEdition_Example.xls example decomposition], [[John C. Hull (economist)|John Hull]]</ref> where the [[variance]] of each component is its [[eigenvalue]] (and, as the components are [[orthogonal]], no correlation need be incorporated in subsequent modelling).
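The eigen-decomposition described above can be sketched as follows. This is only an illustration under stated assumptions: the data are synthetic, the maturities and factor shapes are invented for the example, the decomposition is applied to daily yield changes rather than yield levels, and the function name <code>yield_curve_pca</code> is hypothetical.

```python
import numpy as np

def yield_curve_pca(observations):
    """Eigen-decompose the covariance matrix of yield observations.

    observations: array of shape (n_days, n_maturities).
    Returns (eigenvalues, eigenvectors) sorted by decreasing eigenvalue;
    column k of the eigenvector matrix holds the loadings of component k.
    """
    cov = np.cov(observations, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # ascending order for symmetric input
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

# Synthetic daily yield changes in basis points (assumption): three smooth
# shapes across maturities (level, slope, curvature) plus small noise
rng = np.random.default_rng(0)
maturities = np.array([1, 2, 3, 5, 7, 10, 20, 30], dtype=float)
t = (maturities - maturities.mean()) / maturities.std()
shapes = np.vstack([np.ones_like(t), t, t**2 - 1.0])
factors = rng.normal(size=(2000, 3)) * np.array([10.0, 4.0, 1.5])
changes = factors @ shapes + rng.normal(scale=0.2, size=(2000, len(maturities)))

eigvals, eigvecs = yield_curve_pca(changes)
explained = eigvals / eigvals.sum()                   # variance share per component
scores = (changes - changes.mean(axis=0)) @ eigvecs   # component time series
```

In this sketch the sample covariance matrix of <code>scores</code> is diagonal with the eigenvalues on the diagonal, which illustrates why no correlation between components needs to be modelled.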
For [[equity (finance)|equity]], an optimal portfolio is one where the [[expected return]] is maximized for a given level of risk, or alternatively, where risk is minimized for a given return; see [[Markowitz model]] for discussion. Thus, one approach is to reduce portfolio risk, where [[asset allocation|allocation strategies]] are applied to the "principal portfolios" instead of the underlying [[Capital stock|stock]]s. A second approach is to enhance portfolio return, using the principal components to select companies' stocks with upside potential.<ref>Libin Yang. [https://ir.canterbury.ac.nz/bitstream/handle/10092/10293/thesis.pdf?sequence=1 ''An Application of Principal Component Analysis to Stock Portfolio Management'']. Department of Economics and Finance, [[University of Canterbury]], January 2015.</ref><ref>Giorgia Pasini (2017); [https://ijpam.eu/contents/2017-115-1/12/12.pdf Principal Component Analysis for Stock Portfolio Management]. ''International Journal of Pure and Applied Mathematics''. Volume 115 No. 1 2017, 153–167</ref> PCA has also been used to understand relationships<ref name="Miller"/> between international [[equity market]]s, and within markets between groups of companies in industries or [[Stock market index#Types of indices by coverage|sectors]].

PCA may also be applied to [[Stress test (financial)|stress testing]],<ref name="IMF">See Ch. 25 § "Scenario testing using principal component analysis" in Li Ong (2014). [https://www.elibrary.imf.org/display/book/9781484368589/9781484368589.xml "A Guide to IMF Stress Testing Methods and Models"], [[International Monetary Fund]]</ref> essentially an analysis of a bank's ability to endure [[List of bank stress tests|a hypothetical adverse economic scenario]].
Its utility is in "distilling the information contained in [several] [[Macroeconomic model|macroeconomic variables]] into a more manageable data set, which can then [be used] for analysis."<ref name="IMF"/> Here, the resulting factors are linked to e.g. interest rates, based on the largest elements of the factor's [[eigenvector]], and it is then observed how a "shock" to each of the factors affects the implied assets of each of the banks.

=== Neuroscience ===
A variant of principal components analysis is used in [[neuroscience]] to identify the specific properties of a stimulus that increase a [[neuron]]'s probability of generating an [[action potential]].<ref>{{cite journal|last1=Chapin|first1=John|last2=Nicolelis |first2=Miguel|title=Principal component analysis of neuronal ensemble activity reveals multidimensional somatosensory representations|journal=Journal of Neuroscience Methods|date=1999|volume=94|issue=1|pages=121–140|doi=10.1016/S0165-0270(99)00130-2|pmid=10638820|s2cid=17786731 }}</ref><ref name="brenner00">Brenner, N., Bialek, W., & de Ruyter van Steveninck, R.R. (2000).</ref> This technique is known as [[Spike-triggered covariance|spike-triggered covariance analysis]]. In a typical application, an experimenter presents a [[white noise]] process as a stimulus (usually either as a sensory input to a test subject, or as a [[Electric current|current]] injected directly into the neuron) and records a train of action potentials, or spikes, produced by the neuron as a result. Presumably, certain features of the stimulus make the neuron more likely to spike. To extract these features, the experimenter calculates the [[covariance matrix]] of the ''spike-triggered ensemble'', the set of all stimuli (defined and discretized over a finite time window, typically on the order of 100 ms) that immediately preceded a spike.
The [[Eigenvectors and eigenvalues|eigenvectors]] of the difference between the spike-triggered covariance matrix and the covariance matrix of the ''prior stimulus ensemble'' (the set of all stimuli, defined over a time window of the same length) then indicate the directions in the [[Vector space|space]] of stimuli along which the variance of the spike-triggered ensemble differed the most from that of the prior stimulus ensemble. Specifically, the eigenvectors with the largest positive eigenvalues correspond to the directions along which the variance of the spike-triggered ensemble showed the largest positive change compared to the variance of the prior. Since these were the directions in which varying the stimulus led to a spike, they are often good approximations of the sought-after relevant stimulus features.

In neuroscience, PCA is also used to discern the identity of a neuron from the shape of its action potential. [[Spike sorting]] is an important procedure because [[Electrophysiology#Extracellular recording|extracellular]] recording techniques often pick up signals from more than one neuron. In spike sorting, one first uses PCA to reduce the dimensionality of the space of action potential waveforms, and then performs [[Cluster analysis|clustering analysis]] to associate specific action potentials with individual neurons.

PCA as a dimension reduction technique is particularly suited to detecting coordinated activities of large neuronal ensembles. It has been used in determining collective variables, that is, [[order parameters]], during [[phase transitions]] in the brain.<ref>{{cite journal|last1=Jirsa|first1=Victor|last2=Friedrich|first2=R|last3=Haken|first3=Herman|last4=Kelso|first4=Scott|title=A theoretical model of phase transitions in the human brain|journal=Biological Cybernetics|date=1994|volume=71|issue=1|pages=27–35|doi=10.1007/bf00198909|pmid=8054384|s2cid=5155075}}</ref>
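The spike-triggered covariance procedure described in this section can be sketched as follows. This is a minimal illustration, not a reference implementation: the simulated neuron (whose spike probability grows with the squared projection of the recent stimulus onto a hidden filter), the 20-bin window, and the names <code>spike_triggered_covariance</code> and <code>hidden_filter</code> are all assumptions made for the example.

```python
import numpy as np

def spike_triggered_covariance(stimulus, spikes, window):
    """Eigen-decompose (spike-triggered covariance - prior covariance).

    stimulus: 1-D array, one stimulus value per time bin.
    spikes:   1-D boolean array, True in bins where a spike occurred.
    window:   number of bins in each stimulus segment.
    Returns (eigenvalues, eigenvectors) sorted by decreasing eigenvalue.
    """
    n = len(stimulus)
    # Row i is the segment stimulus[i : i + window], ending at bin i + window - 1
    idx = np.arange(n - window + 1)[:, None] + np.arange(window)[None, :]
    segments = stimulus[idx]               # prior stimulus ensemble
    fired = spikes[window - 1:]            # spike indicator at each segment's last bin
    triggered = segments[fired]            # spike-triggered ensemble
    diff = np.cov(triggered, rowvar=False) - np.cov(segments, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(diff)
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

# Simulated experiment (assumption): white-noise stimulus and a model neuron
rng = np.random.default_rng(1)
window = 20
stimulus = rng.normal(size=50_000)
hidden_filter = np.sin(np.linspace(0.0, np.pi, window))
hidden_filter /= np.linalg.norm(hidden_filter)

# Projection of each length-`window` segment onto the hidden filter
drive = np.correlate(stimulus, hidden_filter, mode="valid")
spikes = np.zeros(stimulus.size, dtype=bool)
spikes[window - 1:] = rng.random(drive.size) < np.clip(0.2 * drive**2, 0.0, 1.0)

eigvals, eigvecs = spike_triggered_covariance(stimulus, spikes, window)
recovered = eigvecs[:, 0]   # direction with the largest increase in variance
```

Because the simulated neuron responds to the squared projection, the mean of the spike-triggered ensemble carries little information, while the leading eigenvector <code>recovered</code> should align with <code>hidden_filter</code> up to sign.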