==Dimension reduction==
For high-dimensional datasets, dimension reduction is usually performed prior to applying a [[k-nearest neighbors algorithm|''k''-nearest neighbors]] (''k''-NN) algorithm in order to mitigate the [[curse of dimensionality]].<ref>Kevin Beyer, Jonathan Goldstein, Raghu Ramakrishnan, Uri Shaft (1999). [http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.31.1422 "When is "nearest neighbor" meaningful?"]. ''Database Theory—ICDT '99'', pp. 217–235.</ref> [[Feature extraction]] and dimension reduction can be combined in one step, using [[principal component analysis]] (PCA), [[linear discriminant analysis]] (LDA), [[canonical correlation analysis]] (CCA), or [[non-negative matrix factorization]] (NMF) techniques to pre-process the data, followed by clustering via ''k''-NN on [[Feature (machine learning)|feature vectors]] in a reduced-dimension space. In [[machine learning]], this process is also called low-dimensional [[embedding]].<ref>{{cite book |last1=Shaw |first1=B. |last2=Jebara |first2=T. |doi=10.1145/1553374.1553494 |chapter=Structure preserving embedding |title=Proceedings of the 26th Annual International Conference on Machine Learning – ICML '09 |pages=1 |year=2009 |isbn=9781605585161 |chapter-url=http://www.cs.columbia.edu/~jebara/papers/spe-icml09.pdf |citeseerx=10.1.1.161.451 |s2cid=8522279}}</ref> For high-dimensional datasets (e.g., when performing similarity search on live video streams, DNA data, or high-dimensional [[time series]]), running a fast '''approximate''' ''k''-NN search using [[locality-sensitive hashing]], [[random projection]],<ref>{{cite book |last1=Bingham |first1=E. |last2=Mannila |first2=H. |doi=10.1145/502512.502546 |chapter=Random projection in dimensionality reduction |title=Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining – KDD '01 |pages=245 |year=2001 |isbn=978-1581133912 |s2cid=1854295}}</ref> "sketches",<ref>Shasha, D. (2004). ''High Performance Discovery in Time Series''. Berlin: Springer. {{ISBN|0-387-00857-8}}</ref> or other high-dimensional similarity-search techniques from the [[International Conference on Very Large Data Bases|VLDB conference]] toolbox may be the only feasible option.
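The pipeline above can be illustrated with a minimal sketch, assuming Python with NumPy and scikit-learn (neither is prescribed by this section, and the data, component count, and neighbor count are illustrative only): PCA pre-processing followed by an exact ''k''-NN search in the reduced-dimension space, with a random projection shown as the cheaper alternative suited to approximate search on very high-dimensional data.

<syntaxhighlight lang="python">
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors
from sklearn.random_projection import GaussianRandomProjection

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 512))  # toy data: 1,000 points in 512 dimensions

# Combine feature extraction and dimension reduction in one step:
# project the data onto the top 20 principal components.
X_pca = PCA(n_components=20).fit_transform(X)

# Exact k-NN search in the reduced-dimension space. n_neighbors=6 because,
# when querying the training set, each point is its own nearest neighbor
# (distance 0) and is dropped below.
nn = NearestNeighbors(n_neighbors=6).fit(X_pca)
distances, indices = nn.kneighbors(X_pca)
neighbors = indices[:, 1:]  # the 5 nearest distinct points for each query

# Cheaper alternative for very high-dimensional data: a Gaussian random
# projection approximately preserves pairwise distances
# (Johnson-Lindenstrauss) without computing principal components.
X_rp = GaussianRandomProjection(n_components=20, random_state=0).fit_transform(X)
</syntaxhighlight>

Note that scikit-learn's <code>NearestNeighbors</code> performs an exact search; for the approximate methods named above (locality-sensitive hashing, sketches), a dedicated similarity-search library would be used instead.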