Nonlinear dimensionality reduction
== Other algorithms ==

=== Relational perspective map ===
Relational perspective map is a [[multidimensional scaling]] algorithm. The algorithm finds a configuration of data points on a manifold by simulating a multi-particle dynamic system on a closed manifold, where data points are mapped to particles and distances (or dissimilarities) between data points represent a repulsive force. As the manifold gradually grows in size, the multi-particle system cools down and converges to a configuration that reflects the distance information of the data points.

Relational perspective map was inspired by a physical model in which positively charged particles move freely on the surface of a ball. Guided by the [[Charles-Augustin de Coulomb|Coulomb]] [[Coulomb's law|force]] between particles, the minimal-energy configuration of the particles reflects the strength of the repulsive forces between them. Relational perspective map was introduced in 2004.<ref>{{cite journal |first=James X. |last=Li |url=http://www.palgrave-journals.com/ivs/journal/v3/n1/pdf/9500051a.pdf |title=Visualizing high-dimensional data with relational perspective map |journal=Information Visualization |date=2004 |volume=3 |pages=49–59 |doi=10.1057/palgrave.ivs.9500051 |s2cid=7566939 }}</ref> The algorithm first used the flat [[torus]] as the image manifold; it has since been extended (in the software [http://www.VisuMap.com VisuMap]) to use other types of closed manifolds, such as the [[sphere]], [[projective space]], and [[Klein bottle]], as image manifolds.

=== Contagion maps ===
Contagion maps use multiple contagions on a network to map the nodes as a point cloud.<ref>{{cite journal |last1=Taylor |first1=D. |last2=Klimm |first2=F. |last3=Harrington |first3=H. A. |last4=Kramár |first4=M. |last5=Mischaikow |first5=K. |last6=Porter |first6=M. A. |last7=Mucha |first7=P. J. |year=2015 |title=Topological data analysis of contagion maps for examining spreading processes on networks |journal=Nature Communications |volume=6 |pages=7723 |doi=10.1038/ncomms8723 |pmid=26194875 |pmc=4566922 |arxiv=1408.1168 |bibcode=2015NatCo...6.7723T }}</ref> In the case of the [[Global cascades model]], the speed of the spread can be adjusted with the threshold parameter <math> t \in [0,1] </math>. For <math> t=0 </math> the contagion map is equivalent to the [[Isomap]] algorithm.

=== Curvilinear component analysis ===
[[Curvilinear component analysis]] (CCA) looks for the configuration of points in the output space that preserves original distances as much as possible while focusing on small distances in the output space (conversely to [[Sammon's mapping]], which focuses on small distances in the original space).<ref name="Demart">{{cite journal |first1=P. |last1=Demartines |first2=J. |last2=Hérault |url=http://chronos.isir.upmc.fr/~gas/pam/img_auth.php/2/23/Demartines1997.pdf |title=Curvilinear Component Analysis: A Self-Organizing Neural Network for Nonlinear Mapping of Data Sets |journal=IEEE Transactions on Neural Networks |volume=8 |issue=1 |year=1997 |pages=148–154 |doi=10.1109/72.554199 |pmid=18255618 }}</ref> Note that CCA, as an iterative learning algorithm, actually starts by focusing on large distances (like the Sammon algorithm) and then gradually shifts its focus to small distances. If compromises between the two must be made, the small-distance information overwrites the large-distance information.
The stress function of CCA is related to a sum of right Bregman divergences.<ref name="Jigang">{{cite book |first1=Jigang |last1=Sun |first2=Malcolm |last2=Crowe |first3=Colin |last3=Fyfe |chapter-url=http://www.dice.ucl.ac.be/Proceedings/esann/esannpdf/es2010-107.pdf |chapter=Curvilinear component analysis and Bregman divergences |title=European Symposium on Artificial Neural Networks (Esann) |pages=81–86 |publisher=d-side publications |year=2010 }}</ref>

=== Curvilinear distance analysis ===
Curvilinear distance analysis (CDA)<ref name="Demart" /> trains a self-organizing neural network to fit the manifold and seeks to preserve [[geodesic distance]]s in its embedding. It is based on curvilinear component analysis (which extended Sammon's mapping), but uses geodesic distances instead.

=== Diffeomorphic dimensionality reduction ===
[[Diffeomorphic]] Dimensionality Reduction or ''Diffeomap''<ref>{{cite book |first1=Christian |last1=Walder |first2=Bernhard |last2=Schölkopf |chapter-url=https://papers.nips.cc/paper/3542-diffeomorphic-dimensionality-reduction.pdf |chapter=Diffeomorphic Dimensionality Reduction |title=Advances in Neural Information Processing Systems |volume=22 |date=2009 |pages=1713–20 |publisher=MIT Press}}</ref> learns a smooth diffeomorphic mapping which transports the data onto a lower-dimensional linear subspace. The method solves for a smooth, time-indexed vector field such that flows along the field which start at the data points end at a lower-dimensional linear subspace, thereby attempting to preserve pairwise differences under both the forward and inverse mappings.

=== Manifold alignment ===
[[Manifold alignment]] takes advantage of the assumption that disparate data sets produced by similar generating processes will share a similar underlying manifold representation. By learning projections from each original space to the shared manifold, correspondences are recovered and knowledge from one domain can be transferred to another.
Most manifold alignment techniques consider only two data sets, but the concept extends to arbitrarily many initial data sets.<ref>{{cite conference |last=Wang |first=Chang |author2=Mahadevan, Sridhar |title=Manifold Alignment using Procrustes Analysis |conference=The 25th International Conference on Machine Learning |date=July 2008 |pages=1120–7 |url=http://people.cs.umass.edu/~chwang/papers/ICML-2008.pdf}}</ref>

=== Diffusion maps ===
[[Diffusion map]]s leverage the relationship between heat [[diffusion]] and a [[random walk]] ([[Markov Chain]]); an analogy is drawn between the diffusion operator on a manifold and a Markov transition matrix operating on functions defined on the graph whose nodes were sampled from the manifold.<ref>{{cite thesis |title=Diffusion Maps and Geometric Harmonics |first=Stephane |last=Lafon |type=PhD |publisher=[[Yale University]] |date=May 2004 |url=http://search.library.yale.edu/catalog/6714668 }}</ref> In particular, let a data set be represented by <math> \mathbf{X} = [x_1,x_2,\ldots,x_n] \in \Omega \subset \mathbf {R^D}</math>. The underlying assumption of the diffusion map is that the high-dimensional data lie on a low-dimensional manifold of dimension <math> \mathbf{d} </math>. Let '''X''' represent the data set and <math> \mu </math> represent the distribution of the data points on '''X'''. Further, define a '''kernel''' which represents some notion of affinity of the points in '''X'''. The kernel <math> \mathit{k} </math> has the following properties:<ref name="ReferenceA">{{cite journal |title=Diffusion Maps |first1=Ronald R. |last1=Coifman |first2=Stephane |last2=Lafon |journal=Applied and Computational Harmonic Analysis |volume=21 |issue=1 |date=July 2006 |pages=5–30 |doi=10.1016/j.acha.2006.04.006 |s2cid=17160669 |url=https://www.math.ucdavis.edu/~strohmer/courses/270/diffusion_maps.pdf }}</ref>

: <math>k(x,y) = k(y,x)</math> (''k'' is symmetric)
: <math>k(x,y) \geq 0 \qquad \forall x,y</math> (''k'' is positivity preserving)

Thus one can think of the individual data points as the nodes of a graph and the kernel ''k'' as defining some sort of affinity on that graph. The graph is symmetric by construction since the kernel is symmetric. It is easy to see here that from the tuple ('''X''', ''k'') one can construct a reversible [[Markov Chain]]. This technique is common to a variety of fields and is known as the graph Laplacian.

For example, the graph '''K''' = (''X'', ''E'') can be constructed using a Gaussian kernel:

: <math> K_{ij} = \begin{cases} e^{-\|x_i -x_j\|^2_2/\sigma ^2} & \text{if } x_i \sim x_j \\ 0 & \text{otherwise} \end{cases} </math>

In the above equation, <math> x_i \sim x_j </math> denotes that <math> x_i </math> is a nearest neighbor of <math> x_j </math>. Properly, [[geodesic]] distance should be used to measure distances on the [[manifold]]; since the exact structure of the manifold is not available, the geodesic distance between nearest neighbors is approximated by the Euclidean distance. The choice of <math> \sigma </math> modulates our notion of proximity in the sense that if <math> \|x_i - x_j\|_2 \gg \sigma </math> then <math> K_{ij} \approx 0 </math>, and if <math> \|x_i - x_j\|_2 \ll \sigma </math> then <math> K_{ij} \approx 1 </math>. The former means that very little diffusion has taken place, while the latter implies that the diffusion process is nearly complete. Different strategies for choosing <math> \sigma </math> can be found in the literature.<ref>{{cite thesis |first=B. |last=Bah |date=2008 |title=Diffusion Maps: Applications and Analysis |type=Masters |publisher=University of Oxford |url=http://solo.bodleian.ox.ac.uk/permalink/f/89vilt/oxfaleph017015682 }}</ref>

In order to faithfully represent a Markov matrix, <math> K </math> must be normalized by the corresponding [[degree matrix]] <math> D </math>:

: <math> P = D^{-1}K. </math>

<math> P </math> now represents a Markov chain. <math> P(x_i,x_j) </math> is the probability of transitioning from <math> x_i </math> to <math> x_j </math> in one time step. Similarly, the probability of transitioning from <math> x_i </math> to <math> x_j </math> in ''t'' time steps is given by <math> P^t (x_i,x_j) </math>. Here <math> P^t </math> is the matrix <math> P </math> multiplied by itself ''t'' times.

The Markov matrix <math> P </math> constitutes some notion of the local geometry of the data set '''X'''. The major difference between diffusion maps and [[principal component analysis]] is that only local features of the data are considered in diffusion maps, as opposed to taking correlations of the entire data set. <math> K </math> defines a random walk on the data set, which means that the kernel captures some local geometry of the data set. The Markov chain defines fast and slow directions of propagation through the kernel values. As the walk propagates forward in time, the local geometry information aggregates in the same way as local transitions (defined by differential equations) of the dynamical system.<ref name="ReferenceA"/> The metaphor of diffusion arises from the definition of a family of diffusion distances <math>\{ D_t \}_{ t \in \mathbb{N}} </math>:

: <math> D_t^2(x,y) = \|p_t(x,\cdot) - p_t(y,\cdot)\|^2 </math>

For fixed ''t'', <math> D_t </math> defines a distance between any two points of the data set based on path connectivity: the value of <math> D_t(x,y) </math> will be smaller the more paths connect ''x'' to ''y'', and vice versa.
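The construction described above (a Gaussian kernel restricted to nearest neighbors, normalization by the degree matrix to obtain <math>P</math>, and powers <math>P^t</math>) can be sketched in Python with NumPy. This is an illustrative sketch, not the reference implementation from the cited papers; the function name and parameters are chosen for exposition:

```python
import numpy as np

def diffusion_map(X, sigma=1.0, n_neighbors=10, t=1, d=2):
    """Illustrative diffusion-map embedding: Gaussian kernel on
    nearest neighbors, row normalization P = D^{-1} K, and an
    embedding from the leading non-trivial eigenvectors of P."""
    n = X.shape[0]
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # squared Euclidean distances
    K = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(sq[i])[1:n_neighbors + 1]      # skip the point itself
        K[i, nbrs] = np.exp(-sq[i, nbrs] / sigma**2)
    K = np.maximum(K, K.T)                               # keep the kernel symmetric
    P = K / K.sum(axis=1, keepdims=True)                 # Markov matrix: rows sum to 1
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)                       # eigenvalue 1 comes first
    vals, vecs = vals.real[order], vecs.real[:, order]
    # diffusion-map coordinates: lambda_k^t * psi_k for k = 1..d;
    # the constant eigenvector for eigenvalue 1 is dropped
    return vecs[:, 1:d + 1] * vals[1:d + 1] ** t
```

Raising the eigenvalues to the power ''t'' mirrors running the Markov chain for ''t'' steps: directions with small eigenvalues (fast-mixing directions) are suppressed, so the embedding approximates the diffusion distance at time ''t''.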
Because the quantity <math> D_t(x,y) </math> involves a sum over all paths of length ''t'', <math> D_t </math> is much more robust to noise in the data than the geodesic distance. <math> D_t </math> takes into account all the relations between the points ''x'' and ''y'' while calculating the distance, and serves as a better notion of proximity than the [[Euclidean distance]] or even the geodesic distance.

=== Local multidimensional scaling ===
Local multidimensional scaling performs [[multidimensional scaling]] in local regions, and then uses convex optimization to fit all the pieces together.<ref>{{cite journal |first1=J. |last1=Venna |first2=S. |last2=Kaski |title=Local multidimensional scaling |journal=Neural Networks |volume=19 |issue=6–7 |year=2006 |pages=889–899 |doi=10.1016/j.neunet.2006.05.014 |pmid=16787737 }}</ref>

=== Nonlinear PCA ===
Nonlinear PCA (NLPCA) uses [[backpropagation]] to train a multi-layer perceptron (MLP) to fit a manifold.<ref>{{cite journal |last1=Scholz |first1=M. |last2=Kaplan |first2=F. |last3=Guy |first3=C. L. |last4=Kopka |first4=J. |last5=Selbig |first5=J. |title=Non-linear PCA: a missing data approach |journal=Bioinformatics |volume=21 |issue=20 |pages=3887–95 |publisher=Oxford University Press |year=2005 |doi=10.1093/bioinformatics/bti634 |pmid=16109748 |doi-access=free |hdl=11858/00-001M-0000-0014-2B1F-2 |hdl-access=free }}</ref> Unlike typical MLP training, which only updates the weights, NLPCA updates both the weights and the inputs. That is, both the weights and the inputs are treated as latent values. After training, the latent inputs are a low-dimensional representation of the observed vectors, and the MLP maps from that low-dimensional representation to the high-dimensional observation space.

=== Data-driven high-dimensional scaling ===
Data-driven high-dimensional scaling (DD-HDS)<ref>{{cite journal |first1=S. |last1=Lespinats |first2=M. |last2=Verleysen |first3=A. |last3=Giron |first4=B. |last4=Fertil |title=DD-HDS: a tool for visualization and exploration of high-dimensional data |journal=IEEE Transactions on Neural Networks |volume=18 |issue=5 |year=2007 |pages=1265–1279}}</ref> is closely related to [[Sammon's mapping]] and curvilinear component analysis, except that (1) it simultaneously penalizes false neighborhoods and tears by focusing on small distances in both the original and the output space, and (2) it accounts for the [[concentration of measure]] phenomenon by adapting the weighting function to the distance distribution.

=== Manifold sculpting ===
Manifold sculpting<ref>{{cite conference |first1=M. |last1=Gashler |first2=D. |last2=Ventura |first3=T. |last3=Martinez |title=Iterative Non-linear Dimensionality Reduction with Manifold Sculpting |url=http://axon.cs.byu.edu/papers/gashler2007nips.pdf |book-title=Advances in Neural Information Processing Systems 20 |pages=513–520 |publisher=MIT Press |location=Cambridge, MA |year=2008}}</ref> uses [[graduated optimization]] to find an embedding. Like other algorithms, it computes the ''k''-nearest neighbors and tries to find an embedding that preserves relationships in local neighborhoods. It slowly scales variance out of the higher dimensions while simultaneously adjusting points in the lower dimensions to preserve those relationships. If the rate of scaling is small, it can find very precise embeddings. It boasts higher empirical accuracy than other algorithms on several problems. It can also be used to refine the results of other manifold learning algorithms. It struggles to unfold some manifolds, however, unless a very slow scaling rate is used. It has no model.

=== RankVisu ===
RankVisu<ref>{{cite journal |first1=S. |last1=Lespinats |first2=B. |last2=Fertil |first3=P. |last3=Villemain |first4=J. |last4=Herault |title=RankVisu: Mapping from the neighbourhood network |url=https://www.sciencedirect.com/science/article/pii/S0925231209001544 |journal=Neurocomputing |volume=72 |issue=13–15 |pages=2964–2978 |year=2009}}</ref> is designed to preserve the rank of neighborhoods rather than distances.
RankVisu is especially useful on difficult tasks (when the preservation of distances cannot be achieved satisfactorily). Indeed, the rank of neighborhoods is less informative than distances (ranks can be deduced from distances, but distances cannot be deduced from ranks), and its preservation is thus easier.

=== Topologically constrained isometric embedding ===
[[Topologically constrained isometric embedding]] (TCIE)<ref>{{cite journal |last1=Rosman |first1=G. |last2=Bronstein |first2=M.M. |last3=Bronstein |first3=A.M. |last4=Kimmel |first4=R. |url=https://www.cs.technion.ac.il/users/wwwb/cgi-bin/tr-get.cgi/2009/CIS/CIS-2009-05.pdf |title=Nonlinear Dimensionality Reduction by Topologically Constrained Isometric Embedding |journal=International Journal of Computer Vision |volume=89 |issue=1 |pages=56–68 |date=2010 |doi=10.1007/s11263-010-0322-1 |s2cid=1365750 }}</ref> is an algorithm based on approximating geodesic distances after filtering geodesics inconsistent with the Euclidean metric. Aimed at correcting the distortions caused when Isomap is used to map intrinsically non-convex data, TCIE uses weighted least-squares MDS in order to obtain a more accurate mapping. The TCIE algorithm first detects possible boundary points in the data, and during computation of the geodesic lengths marks inconsistent geodesics, to be given a small weight in the weighted [[stress majorization]] that follows.
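The role of the weights in weighted least-squares MDS can be illustrated with a short sketch. TCIE itself uses stress majorization; the sketch below instead minimizes the same weighted stress by plain gradient descent, and all names and parameters are illustrative, not from the cited paper:

```python
import numpy as np

def weighted_stress(Y, delta, W):
    """Weighted stress: sum_ij W_ij * (||y_i - y_j|| - delta_ij)^2."""
    D = np.sqrt(((Y[:, None] - Y[None, :]) ** 2).sum(-1) + 1e-12)
    return (W * (D - delta) ** 2).sum()

def weighted_mds(delta, W, d=2, iters=500, lr=0.002, seed=0):
    """Gradient-descent sketch of weighted least-squares MDS.
    delta: target (e.g. geodesic) distances; W: symmetric weights,
    zero on the diagonal -- a small W_ij downplays an unreliable
    distance, as TCIE does for inconsistent geodesics."""
    n = delta.shape[0]
    Y = np.random.default_rng(seed).normal(size=(n, d))
    for _ in range(iters):
        diff = Y[:, None] - Y[None, :]                  # (n, n, d)
        D = np.sqrt((diff ** 2).sum(-1) + 1e-12)
        # gradient of the stress w.r.t. each y_i (W symmetric,
        # so every pair contributes twice to the double sum)
        grad = 4 * ((W * (D - delta) / D)[:, :, None] * diff).sum(axis=1)
        Y -= lr * grad
    return Y
```

Setting some weights close to zero lets the embedding ignore the corresponding distance estimates entirely, which is the mechanism by which inconsistent geodesics are prevented from distorting the map.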
=== Uniform manifold approximation and projection ===
Uniform manifold approximation and projection (UMAP) is a nonlinear dimensionality reduction technique.<ref>{{cite arXiv |last1=McInnes |first1=Leland |last2=Healy |first2=John |last3=Melville |first3=James |title=Uniform manifold approximation and projection for dimension reduction |date=2018-12-07 |eprint=1802.03426 }}</ref> It is similar to [[#t-distributed stochastic neighbor embedding|t-SNE]].<ref>{{Cite web|url=https://umap-learn.readthedocs.io/en/latest/|title=UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction — umap 0.3 documentation|website=umap-learn.readthedocs.io|access-date=2019-05-04}}</ref>