{{Short description|Machine learning technique useful for dimensionality reduction}}
{{Machine learning|Artificial neural network}}
A '''self-organizing map''' ('''SOM''') or '''self-organizing feature map''' ('''SOFM''') is an [[unsupervised learning|unsupervised]] [[machine learning]] technique used to produce a [[dimensionality reduction|low-dimensional]] (typically two-dimensional) representation of a higher-dimensional data set while preserving the [[topology|topological structure]] of the data. For example, a data set with <math>p</math> variables measured in <math>n</math> observations could be represented as clusters of observations with similar values for the variables. These clusters then could be visualized as a two-dimensional "map" such that observations in proximal clusters have more similar values than observations in distal clusters. This can make high-dimensional data easier to visualize and analyze.

An SOM is a type of [[artificial neural network]] but is trained using [[competitive learning]] rather than the error-correction learning (e.g., [[backpropagation]] with [[gradient descent]]) used by other artificial neural networks. The SOM was introduced by the [[Finland|Finnish]] professor [[Teuvo Kohonen]] in the 1980s and therefore is sometimes called a '''Kohonen map''' or '''Kohonen network'''.<ref name="KohonenMap">{{cite journal |title= Kohonen Network |last1= Kohonen |first1= Teuvo |last2= Honkela |first2= Timo |year= 2007 |journal= Scholarpedia |volume= 2 |issue= 1 |pages= 1568 |doi= 10.4249/scholarpedia.1568 |bibcode= 2007SchpJ...2.1568K |doi-access= free }}</ref><ref>{{cite journal |last= Kohonen |first= Teuvo |year= 1982 |title= Self-Organized Formation of Topologically Correct Feature Maps |journal= Biological Cybernetics |volume= 43 |number= 1 |pages= 59–69 |doi= 10.1007/bf00337288|s2cid= 206775459 }}</ref> The Kohonen map or network is a computationally convenient abstraction building on biological models of neural systems from the 1970s<ref>{{cite journal | last1 = Von der Malsburg | first1 = C | year = 1973 | title = Self-organization of orientation sensitive cells in the striate cortex | journal = Kybernetik | volume = 14 | issue = 2| pages = 85–100 | doi=10.1007/bf00288907| pmid = 4786750 | s2cid = 3351573 }}</ref> and [[morphogenesis]] models dating back to [[Alan Turing]] in the 1950s.<ref>{{cite journal | last1 = Turing | first1 = Alan | year = 1952 | title = The chemical basis of morphogenesis | journal = Phil. Trans. R. Soc. | volume = 237 | issue = 641 | pages = 37–72 | doi=10.1098/rstb.1952.0012 | bibcode = 1952RSPTB.237...37T | doi-access =  }}</ref>
SOMs create internal representations reminiscent of the [[cortical homunculus]]{{Citation needed|date=September 2024}}, a distorted representation of the [[human body]], based on a neurological "map" of the areas and proportions of the [[human brain]] dedicated to processing [[Sensory processing|sensory function]]s, for different parts of the body.

[[File:Synapse Self-Organizing Map.png|thumb|right|300px|A self-organizing map showing [[United States Congress|U.S. Congress]] voting patterns. The input data was a table with a row for each member of Congress, and columns for certain votes containing each member's yes/no/abstain vote. The SOM algorithm arranged these members in a two-dimensional grid placing similar members closer together. '''The first plot''' shows the grouping when the data are split into two clusters. '''The second plot''' shows average distance to neighbours: larger distances are darker. '''The third plot''' predicts [[Republican Party (United States)|Republican]] (red) or [[Democratic Party (United States)|Democratic]] (blue) party membership. '''The other plots''' each overlay the resulting map with predicted values on an input dimension: red means a predicted 'yes' vote on that bill, blue means a 'no' vote. The plot was created in [[Peltarion Synapse|Synapse]].]]<!--
-->

== Overview ==
Self-organizing maps, like most artificial neural networks, operate in two modes: training and mapping. First, training uses an input data set (the "input space") to generate a lower-dimensional representation of the input data (the "map space"). Second, mapping classifies additional input data using the generated map.

In most cases, the goal of training is to represent an input space with ''p'' dimensions as a map space with two dimensions. Specifically, an input space with ''p'' variables is said to have ''p'' dimensions. A map space consists of components called "nodes" or "neurons", which are arranged as a [[hexagonal]] or [[rectangular]] grid with two dimensions.<ref>{{cite web |url=http://users.ics.aalto.fi/jhollmen/dippa/node9.html |author=Jaakko Hollmen |date=9 March 1996 |title=Self-Organizing Map (SOM) |website=[[Aalto University]]}}</ref> The number of nodes and their arrangement are specified beforehand based on the larger goals of the analysis and [[exploratory data analysis|exploration of the data]].

Each node in the map space is associated with a "weight" vector, which is the position of the node in the input space. While nodes in the map space stay fixed, training consists in moving weight vectors toward the input data (reducing a distance metric such as [[Euclidean distance]]) without spoiling the topology induced from the map space. After training, the map can be used to classify additional observations for the input space by finding the node with the closest weight vector (smallest distance metric) to the input space vector.

== Learning algorithm ==
The goal of learning in the self-organizing map is to cause different parts of the network to respond similarly to certain input patterns. This is partly motivated by how visual, auditory or other [[sense|sensory]] information is handled in separate parts of the [[cerebral cortex]] in the [[human brain]].<ref name="Haykin">{{cite book |first=Simon |last=Haykin |title=Neural networks - A comprehensive foundation |chapter=9. Self-organizing maps |edition=2nd |publisher=Prentice-Hall |year=1999 |isbn=978-0-13-908385-3 }}</ref>

[[Image:Somtraining.svg|thumb|500px|An illustration of the training of a self-organizing map. The blue blob is the distribution of the training data, and the small white disc is the current training datum drawn from that distribution. At first (left) the SOM nodes are arbitrarily positioned in the data space. The node (highlighted in yellow) which is nearest to the training datum is selected. It is moved towards the training datum, as (to a lesser extent) are its neighbors on the grid. After many iterations the grid tends to approximate the data distribution (right).]]<!--
-->
The weights of the neurons are initialized either to small random values or sampled evenly from the subspace spanned by the two largest [[principal component]] [[eigenvectors]]. With the latter alternative, learning is much faster because the initial weights already give a good approximation of SOM weights.<ref name="SOMIntro">{{cite web |title=Intro to SOM |first=Teuvo |last=Kohonen |work=SOM Toolbox |url=http://www.cis.hut.fi/projects/somtoolbox/theory/somalgorithm.shtml |year=2005<!-- last updated 18 March 2005 --> |access-date=2006-06-18 }}</ref>

The network must be fed a large number of example vectors that represent, as closely as possible, the kinds of vectors expected during mapping. The examples are usually presented several times over successive iterations.

The training utilizes [[competitive learning]]. When a training example is fed to the network, its [[Euclidean distance]] to all weight vectors is computed. The neuron whose weight vector is most similar to the input is called the '''best matching unit''' (BMU). The weights of the BMU and neurons close to it in the SOM grid are adjusted towards the input vector. The magnitude of the change decreases with time and with the grid-distance from the BMU. The update formula for a neuron ''v'' with weight vector '''W<sub>v</sub>'''(''s'') is
:<math>W_{v}(s + 1) = W_{v}(s) + \theta(u, v, s) \cdot \alpha(s) \cdot (D(t) - W_{v}(s))</math>,
where ''s'' is the step index, ''t'' is an index into the training sample, ''u'' is the index of the BMU for the input vector '''D'''(''t''), ''α''(''s'') is a [[monotonically decreasing]] learning coefficient, and ''θ''(''u'', ''v'', ''s'') is the [[Neighbourhood (mathematics)|neighborhood]] function, which weights the update according to the grid distance between neuron ''u'' and neuron ''v'' at step ''s''.<ref name="Scholarpedia">{{cite journal|title=Kohonen network|last1=Kohonen|first1=Teuvo|last2=Honkela|first2=Timo|year=2011<!-- last approved revision 2011-11-15 -->|journal=Scholarpedia|volume=2|issue=1|pages=1568|doi=10.4249/scholarpedia.1568|doi-access=free|bibcode=2007SchpJ...2.1568K}}<!--
Begin Quote
Consider first data items that are n-dimensional Euclidean vectors x(t)=[ξ1(t),ξ2(t),…,ξn(t)]. Here t is the index of the data item in a given sequence. Let the ith model be mi(t)=[μi1(t),μi2(t),…,μin(t)], where now t denotes the index in the sequence in which the models are generated.
End Quote
The equation mi(t+1)=mi(t)+α(t)hci(t)[x(t)−mi(t)] thus uses the symbol t to mean *two different things*: the t of x(t) is not the t of m, α and h. This is why we use s and t here.

Ultsch & Siemon 1990 also use three nested loops when describing Kohonen's algorithm: the outer one is over the training steps (and controls the decay of θ and α (called n and η, respectively, in their paper)), the middle one is over the data items, and the inner is over the neurons.
--></ref> Depending on the implementation, ''t'' can scan the training data set systematically (''t'' is 0, 1, 2, ..., ''T''-1, then repeats, ''T'' being the training sample's size), be drawn randomly from the data set ([[bootstrap sampling]]), or follow some other sampling method (such as [[Jackknife_resampling|jackknifing]]).
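As a rough illustration of the update formula above, the following Python sketch applies a single step to one weight vector. The function name <code>update_weight</code> and the numerical values are illustrative assumptions and are not taken from any particular SOM library.

```python
def update_weight(w, d, theta, alpha):
    """Return W_v(s+1) = W_v(s) + theta * alpha * (D(t) - W_v(s))."""
    return [wi + theta * alpha * (di - wi) for wi, di in zip(w, d)]


# Hypothetical values: neighborhood value 0.5, learning coefficient 0.2.
w_new = update_weight([0.0, 0.0], [1.0, 2.0], theta=0.5, alpha=0.2)
print(w_new)  # [0.1, 0.2]
```

The weight vector moves one tenth of the way towards the input, since the product of the neighborhood value and the learning coefficient is 0.1.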

The neighborhood function ''θ''(''u'', ''v'', ''s'') (also called ''function of lateral interaction'') depends on the grid-distance between the BMU (neuron ''u'') and neuron ''v''. In the simplest form, it is 1 for all neurons close enough to the BMU and 0 for the others, but the [[Gaussian function|Gaussian]] and [[mexican hat wavelet|Mexican-hat]]<ref name="Vrieze">{{cite book |last1=Vrieze |first1=O.J. |title=Artificial Neural Networks |chapter=Kohonen Network |chapter-url=https://link.springer.com/content/pdf/10.1007%2FBFb0027024.pdf |website=Springer |series=Lecture Notes in Computer Science |year=1995 |volume=931 |pages=83–100 |publisher=University of Limburg, Maastricht |doi=10.1007/BFb0027024 |isbn=978-3-540-59488-8 |access-date=1 July 2020 |ref=Vrieze}}</ref> functions are common choices, too. Regardless of the functional form, the neighborhood function shrinks with time.<ref name="Haykin" /> At the beginning, when the neighborhood is broad, the self-organizing takes place on the global scale. When the neighborhood has shrunk to just a couple of neurons, the weights converge to local estimates. In some implementations, the learning coefficient ''α'' and the neighborhood function ''θ'' decrease steadily with increasing ''s''; in others (in particular those where ''t'' scans the training data set) they decrease in step-wise fashion, once every ''T'' steps.
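The simplest ("bubble") form and the Gaussian form of the neighborhood function can be sketched in Python as follows. The function names, the use of Euclidean distance on grid coordinates, and the exponential shrinkage of the width <code>sigma</code> are assumptions chosen for illustration; implementations differ in these details, and the Mexican-hat form is not shown.

```python
import math


def grid_distance(u, v):
    """Euclidean distance between two nodes given as (row, col) grid coordinates."""
    return math.hypot(u[0] - v[0], u[1] - v[1])


def bubble(u, v, radius):
    """Simplest form: 1 inside the radius around the BMU u, 0 outside."""
    return 1.0 if grid_distance(u, v) <= radius else 0.0


def gaussian(u, v, sigma):
    """Gaussian form: decays smoothly with the grid distance from the BMU u."""
    d = grid_distance(u, v)
    return math.exp(-(d * d) / (2.0 * sigma * sigma))


def sigma_schedule(s, n_steps, sigma_start=3.0, sigma_end=0.5):
    """Assumed schedule: the width shrinks exponentially from sigma_start to sigma_end."""
    return sigma_start * (sigma_end / sigma_start) ** (s / n_steps)
```

With a broad width early in training, a node two grid steps from the BMU still receives a substantial update; once the width has shrunk, the same node is barely moved.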

[[File:TrainSOM.gif|thumb|Training process of SOM on a two-dimensional data set]]

This process is repeated for each input vector for a (usually large) number of cycles '''λ'''. The network winds up associating output nodes with groups or patterns in the input data set. If these patterns can be named, the names can be attached to the associated nodes in the trained net.

During mapping, there will be one single ''winning'' neuron: the neuron whose weight vector lies closest to the input vector. This can be simply determined by calculating the Euclidean distance between input vector and weight vector.
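A minimal Python sketch of this mapping step is shown below. It assumes the trained weight vectors are stored as a flat list; the name <code>best_matching_unit</code> and the example values are hypothetical.

```python
import math


def best_matching_unit(weights, x):
    """Return the index of the weight vector closest to x (Euclidean distance)."""
    return min(range(len(weights)), key=lambda v: math.dist(weights[v], x))


# Four hypothetical nodes with two-dimensional weight vectors.
weights = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(best_matching_unit(weights, [0.9, 0.2]))  # 1
```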

While representing input data as vectors has been emphasized in this article, any kind of object which can be represented digitally, which has an appropriate distance measure associated with it, and in which the necessary operations for training are possible can be used to construct a self-organizing map. This includes matrices, continuous functions or even other self-organizing maps.

=== Algorithm ===
# Randomize the node weight vectors in a map
# For <math>s = 0, 1, 2, ..., \lambda</math>
## Randomly pick an input vector <math>{D}(t)</math>
## Find the node in the map closest to the input vector. This node is the '''best matching unit''' (BMU). Denote it by <math>u</math>
## For each node <math>v</math>, update its vector by pulling it closer to the input vector: <math display="block">W_{v}(s + 1) = W_{v}(s) + \theta(u, v, s) \cdot \alpha(s) \cdot (D(t) - W_{v}(s))  </math>

The variable names mean the following, with vectors in bold:
* <math>s</math> is the current iteration
* <math>\lambda</math> is the iteration limit
* <math>t</math> is the index of the target input data vector in the input data set <math>\mathbf{D}</math>
* <math>\mathbf{D}(t)</math> is a target input data vector
* <math>v</math> is the index of the node in the map
* <math>\mathbf{W}_v</math> is the current weight vector of node <math>v</math>
* <math>u</math> is the index of the best matching unit (BMU) in the map
* <math>\theta (u, v, s)</math> is the neighborhood function
* <math>\alpha (s)</math> is the learning rate schedule
The key design choices are the shape of the SOM, the neighborhood function, and the learning rate schedule. The neighborhood function is chosen so that the BMU is updated the most, its immediate neighbors a little less, and so on. The learning rate schedule is chosen so that updates are large at the start and shrink gradually until the map stops changing.

For example, to learn a SOM on a square grid, we can index the nodes by <math>(i, j)</math>, where <math>i, j \in 1:N</math>. The neighborhood function can be chosen so that the BMU receives the full update, its nearest neighbors receive half of it, their neighbors half of that, and so on:<math display="block">\theta((i, j), (i', j'), s) = \frac{1}{2^{|i-i'| + |j-j'|}} = \begin{cases}
1 & \text{if }i=i', j = j' \\
1/2 & \text{if }|i-i'| + |j-j'| = 1 \\
1/4 & \text{if }|i-i'| + |j-j'| = 2 \\
\cdots & \cdots
\end{cases} 
</math>A simple linear learning rate schedule is <math>\alpha(s) = 1-s/\lambda</math>.
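The algorithm, combined with the example neighborhood function and linear schedule above, can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than a reference implementation: the name <code>train_som</code>, the <code>seed</code> parameter, the dictionary keyed by grid coordinates, and the use of 0-based indices are all choices made for illustration.

```python
import math
import random


def theta(u, v):
    """Example neighborhood: halves with each unit of Manhattan grid distance."""
    return 0.5 ** (abs(u[0] - v[0]) + abs(u[1] - v[1]))


def alpha(s, n_steps):
    """Example linear learning rate schedule, 1 - s / lambda."""
    return 1.0 - s / n_steps


def train_som(data, n, n_steps, seed=0):
    """Train an n-by-n SOM; returns a dict mapping (i, j) to a weight vector."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Step 1: random initial weight vector for every grid node (i, j).
    weights = {(i, j): [rng.random() for _ in range(dim)]
               for i in range(n) for j in range(n)}
    # The step s = lambda is skipped because alpha would be 0 there.
    for s in range(n_steps):
        x = rng.choice(data)                                      # pick D(t)
        u = min(weights, key=lambda v: math.dist(weights[v], x))  # find the BMU
        rate_s = alpha(s, n_steps)
        for v, w in weights.items():
            rate = theta(u, v) * rate_s
            for k in range(dim):
                w[k] += rate * (x[k] - w[k])                      # pull towards D(t)
    return weights
```

Calling, for instance, <code>train_som(data, n=3, n_steps=50)</code> on a list of equal-length numeric vectors returns nine weight vectors, one per grid node.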

Note in particular that the update rate does ''not'' depend on where a node's weight vector lies in the input space, only on where the node lies in the SOM grid. For example, the nodes <math>(1,1), (1,2) </math> are close on the grid, so they will always be updated in similar ways, even when their weight vectors are far apart in the input space. In contrast, even if the weight vectors of the nodes <math>(1,1), (1, 100)</math> end up overlapping each other (such as if the SOM looks like a folded towel), they are still not updated in similar ways.
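A short numerical check of this point, using the example neighborhood function and assuming the BMU is node (1, 1): the function takes only grid indices as arguments, so the positions of the weight vectors in the input space cannot influence the result.

```python
def theta(u, v):
    """Example neighborhood: depends only on the grid indices of u and v."""
    return 0.5 ** (abs(u[0] - v[0]) + abs(u[1] - v[1]))


bmu = (1, 1)
print(theta(bmu, (1, 2)))    # 0.5: a grid neighbor is always pulled along
print(theta(bmu, (1, 100)))  # about 1.6e-30: a distant grid node is effectively untouched
```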

=== Alternative algorithm ===
# Randomize the map's nodes' weight vectors
# Traverse each input vector in the input data set
## Traverse each node in the map
### Use the [[Euclidean distance]] formula to find the similarity between the input vector and the map's node's weight vector
### Track the node that produces the smallest distance (this node is the best matching unit, BMU)
## Update the nodes in the neighborhood of the BMU (including the BMU itself) by pulling them closer to the input vector
### <math>W_{v}(s + 1) = W_{v}(s) + \theta(u, v, s) \cdot \alpha(s) \cdot (D(t) - W_{v}(s))</math>
# Increase <math>s</math> and repeat from step 2 while <math>s < \lambda</math>
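The nested-loop structure above can be sketched in Python as follows. For brevity the sketch assumes a one-dimensional chain of nodes, a neighborhood that is 1 within a shrinking radius and 0 outside, and a learning rate that decays linearly from 0.5; these choices, and the name <code>train_epochs</code>, are illustrative assumptions. Here ''s'' is increased once per pass over the data, so the schedules decrease step-wise.

```python
import math
import random


def train_epochs(data, n_nodes, n_epochs, seed=0):
    """Epoch-based SOM training on a 1-D chain of nodes (illustrative sketch)."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Step 1: randomize the weight vectors.
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
    for s in range(n_epochs):                    # step 3: repeat while s < lambda
        alpha = 0.5 * (1.0 - s / n_epochs)       # assumed learning rate schedule
        radius = max(1, round((n_nodes / 2) * (1.0 - s / n_epochs)))
        for x in data:                           # step 2: traverse each input vector
            # Traverse each node and track the smallest distance (the BMU).
            bmu, best = 0, float("inf")
            for v, w in enumerate(weights):
                d = math.dist(w, x)
                if d < best:
                    bmu, best = v, d
            # Update the BMU and its neighbors within the current radius.
            for v in range(max(0, bmu - radius), min(n_nodes, bmu + radius + 1)):
                w = weights[v]
                for k in range(dim):
                    w[k] += alpha * (x[k] - w[k])
    return weights
```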

=== Initialization options ===
Selection of initial weights as good approximations of the final weights is a well-known problem for all iterative methods of artificial neural networks, including self-organizing maps. Kohonen originally proposed random initialization of weights.<ref>{{cite book |first=T. |last=Kohonen |title=Self-Organization and Associative Memory |publisher=Springer |orig-year=1988 |edition=2nd |isbn= 978-3-662-00784-6 |year=2012}}</ref> (This approach is reflected by the algorithms described above.) More recently, principal component initialization, in which initial map weights are chosen from the space of the first principal components, has become popular because it makes the results exactly reproducible.<ref>{{cite conference |first1=A. |last1=Ciampi |first2=Y. |last2=Lechevallier |title=Clustering large, multi-level data sets: An approach based on Kohonen self organizing maps |editor-first=D.A. |editor-last=Zighed |editor2-first=J. |editor2-last=Komorowski |editor3-first=J. |editor3-last=Zytkow  |date=2000 |publisher=Springer |volume=1910 |pages=353–358 |doi=10.1007/3-540-45372-5_36 |book-title=Principles of Data Mining and Knowledge Discovery: 4th European Conference, PKDD 2000 Lyon, France, September 13–16, 2000 Proceedings |series=Lecture notes in computer science |isbn=3-540-45372-5|doi-access=free }}</ref>

[[File:Self oraganizing map cartography.jpg|thumb|Cartographical representation of a self-organizing map ([[U-Matrix]]) based on Wikipedia featured article data (word frequency). Distance is inversely proportional to similarity. The "mountains" are edges between clusters. The red lines are links between articles.]]

A careful comparison of random initialization to principal component initialization for a one-dimensional map, however, found that the advantages of principal component initialization are not universal. The best initialization method depends on the geometry of the specific dataset. Principal component initialization was preferable (for a one-dimensional map) when the principal curve approximating the dataset could be univalently and linearly projected on the first principal component (quasilinear sets). For nonlinear datasets, however, random initialization performed better.<ref>{{cite journal | last1 = Akinduko | first1 = A.A. | last2 = Mirkes | first2 = E.M. | last3 = Gorban | first3 = A.N. | year = 2016 | title = SOM: Stochastic initialization versus principal components | url = https://www.researchgate.net/publication/283768202 | journal = Information Sciences | volume =  364–365| pages =  213–221| doi = 10.1016/j.ins.2015.10.013 }}</ref>
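One possible way to carry out principal component initialization for a two-dimensional map is sketched below in plain Python. The sketch estimates the two leading eigenvectors of the covariance matrix by power iteration with deflation and spreads the nodes evenly over one standard deviation along each axis. The function names, the start vector, the iteration count and the one-standard-deviation scaling are all assumptions made for illustration, and the power iteration presumes that the start vector is not orthogonal to the leading eigenvector.

```python
import math


def principal_axes(data, n_axes=2, n_iter=200):
    """Return the mean and the leading (eigenvalue, eigenvector) pairs of the covariance."""
    n, dim = len(data), len(data[0])
    mean = [sum(row[k] for row in data) / n for k in range(dim)]
    centered = [[row[k] - mean[k] for k in range(dim)] for row in data]
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(dim)]
           for a in range(dim)]
    axes = []
    for axis_index in range(n_axes):
        vec = [1.0 / (k + 1 + axis_index) for k in range(dim)]  # arbitrary start vector
        for _ in range(n_iter):
            new = [sum(cov[a][b] * vec[b] for b in range(dim)) for a in range(dim)]
            norm = math.sqrt(sum(c * c for c in new))
            if norm == 0.0:          # no variance left along any direction
                break
            vec = [c / norm for c in new]
        eigval = sum(vec[a] * sum(cov[a][b] * vec[b] for b in range(dim))
                     for a in range(dim))
        axes.append((eigval, vec))
        # Deflation: remove the component just found from the covariance matrix.
        cov = [[cov[a][b] - eigval * vec[a] * vec[b] for b in range(dim)]
               for a in range(dim)]
    return mean, axes


def pca_init(data, rows, cols):
    """Place a rows-by-cols grid of weights in the plane of the two leading axes."""
    mean, ((l1, v1), (l2, v2)) = principal_axes(data)
    s1, s2 = math.sqrt(max(l1, 0.0)), math.sqrt(max(l2, 0.0))

    def coord(i, n):
        """Evenly spaced coordinate in [-1, 1] for index i out of n."""
        return 0.0 if n == 1 else -1.0 + 2.0 * i / (n - 1)

    return {(i, j): [m + coord(i, rows) * s1 * a + coord(j, cols) * s2 * b
                     for m, a, b in zip(mean, v1, v2)]
            for i in range(rows) for j in range(cols)}
```

Because no random numbers are involved, repeated calls with the same data give identical initial weights.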

== Interpretation ==
[[File:SOMsPCA.PNG|thumb|One-dimensional SOM versus principal component analysis (PCA) for data approximation. SOM is a red [[broken line]] with squares, 20 nodes. The first principal component is presented by a blue line. Data points are the small grey circles. For PCA, the [[fraction of variance unexplained]] in this example is 23.23%, for SOM it is 6.86%.<ref>Illustration is prepared using free software: Mirkes, Evgeny M.; [http://www.math.le.ac.uk/people/ag153/homepage/PCA_SOM/PCA_SOM.html ''Principal Component Analysis and Self-Organizing Maps: applet''], University of Leicester, 2011</ref>]]

There are two ways to interpret a SOM. Because in the training phase weights of the whole neighborhood are moved in the same direction, similar items tend to excite adjacent neurons. Therefore, SOM forms a semantic map where similar samples are mapped close together and dissimilar ones apart. This may be visualized by a [[U-Matrix]] (Euclidean distance between weight vectors of neighboring cells) of the SOM.<ref name="UltschSiemon1990">{{cite book |first1= Alfred |last1= Ultsch |first2= H. Peter |last2= Siemon |chapter= Kohonen's Self Organizing Feature Maps for Exploratory Data Analysis |title= Proceedings of the International Neural Network Conference (INNC-90), Paris, France, July 9–13, 1990 |pages= [https://archive.org/details/innc90parisinter0001inte/page/305 305–308] |editor1-first= Bernard |editor1-last= Widrow |editor2-first= Bernard |editor2-last= Angeniol |publisher= Kluwer |location= Dordrecht, Netherlands |year= 1990 |volume= 1 |isbn= 978-0-7923-0831-7 |chapter-url= http://www.uni-marburg.de/fb12/datenbionik/pdf/pubs/1990/UltschSiemon90 |url= https://archive.org/details/innc90parisinter0001inte/page/305 }}</ref><ref name="Ultsch2003">{{cite tech report |last=Ultsch |first=Alfred |year=2003 |title=U*-Matrix: A tool to visualize clusters in high dimensional data |publisher=Department of Computer Science, University of Marburg |url=http://www.uni-marburg.de/fb12/datenbionik/pdf/pubs/2003/ultsch03ustar |id=36 |pages=1-12}}</ref><ref>{{cite conference |last1=Saadatdoost |first1=Robab |first2=Alex Tze Hiang |last2=Sim |last3=Jafarkarimi |first3=Hosein |title=Application of self organizing map for knowledge discovery based in higher education data |book-title=Research and Innovation in Information Systems (ICRIIS), 2011 International Conference on |publisher=IEEE |date=2011 |doi=10.1109/ICRIIS.2011.6125693 |isbn=978-1-61284-294-3}}</ref>
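A simple variant of the U-Matrix computation can be sketched in Python as follows. The sketch assumes a rectangular grid stored as a nested list <code>weights[i][j]</code> and averages the distances to the four directly adjacent cells; the name <code>u_matrix</code> and this particular averaging are assumptions, and published U-Matrix definitions differ in detail.

```python
import math


def u_matrix(weights, rows, cols):
    """Mean Euclidean distance from each node's weights to its 4-connected neighbors."""
    u = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            dists = [math.dist(weights[i][j], weights[a][b])
                     for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                     if 0 <= a < rows and 0 <= b < cols]
            u[i][j] = sum(dists) / len(dists) if dists else 0.0
    return u


# A 1-by-3 map with one-dimensional weights: the last node is far from the others.
print(u_matrix([[[0.0], [1.0], [5.0]]], 1, 3))  # [[1.0, 2.5, 4.0]]
```

Large values mark borders between clusters, and small values mark regions where neighboring nodes have similar weights.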

The other way is to think of neuronal weights as pointers to the input space. They form a discrete approximation of the distribution of training samples. More neurons point to regions with high training sample concentration and fewer where the samples are scarce.
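One way to inspect this approximation is to count how many samples each node wins. The following Python sketch assumes the weights are stored as a flat list; the name <code>hit_counts</code> and the example values are hypothetical.

```python
import math
from collections import Counter


def hit_counts(weights, data):
    """Count, for every node index, the samples for which it is the closest node."""
    hits = Counter()
    for x in data:
        bmu = min(range(len(weights)), key=lambda v: math.dist(weights[v], x))
        hits[bmu] += 1
    return hits


# Two hypothetical nodes and three samples, two of them near the first node.
print(hit_counts([[0.0], [1.0]], [[0.1], [0.2], [0.9]]))  # Counter({0: 2, 1: 1})
```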

SOM may be considered a nonlinear generalization of [[Principal components analysis]] (PCA).<ref>{{cite book |last=Yin |first=Hujun |chapter=Learning Nonlinear Principal Manifolds by Self-Organising Maps |title={{harvnb|Gorban|Kégl|Wunsch|Zinovyev|2008}}}}</ref> It has been shown, using both artificial and real geophysical data, that SOM has many advantages<ref>{{cite journal | last1 = Liu | first1 = Yonggang | last2 = Weisberg | first2 = Robert H | year = 2005 | title = Patterns of Ocean Current Variability on the West Florida Shelf Using the Self-Organizing Map | journal = Journal of Geophysical Research | volume = 110 | issue = C6 | page = C06003 | doi = 10.1029/2004JC002786 | bibcode = 2005JGRC..110.6003L | doi-access = free }}</ref><ref>{{cite journal | last1 = Liu | first1 = Yonggang | last2 = Weisberg | first2 = Robert H. | last3 = Mooers | first3 = Christopher N. K. | year = 2006 | title = Performance Evaluation of the Self-Organizing Map for Feature Extraction | journal = Journal of Geophysical Research | volume = 111 | issue = C5 | page = C05018 | doi = 10.1029/2005jc003117 | bibcode = 2006JGRC..111.5018L | doi-access = free }}</ref> over the conventional [[feature extraction]] methods such as Empirical Orthogonal Functions (EOF) or PCA. Additionally, a 2025 preprint argues that clustering and PCA reflect different facets of the same local feedback circuit of the human brain, with the SOM providing the shared learning rules that guide both processes; in this view, clustering and PCA synergize via the SOM.<ref>Liu, C., Bowen, E. F. W., & Granger, R. (2025). A formal relation between two disparate mathematical algorithms is ascertained from biological circuit analyses. bioRxiv. https://doi.org/10.1101/2025.03.28.645962</ref>

Originally, SOM was not formulated as a solution to an optimisation problem. Nevertheless, there have been several attempts to modify the definition of SOM and to formulate an optimisation problem which gives similar results.<ref>{{cite book |last=Heskes |first=Tom |chapter=Energy Functions for Self-Organizing Maps |editor-last=Oja |editor-first=Erkki |editor2-last=Kaski |editor2-first=Samuel |title=Kohonen Maps |publisher=Elsevier |date=1999 |pages=303–315 |doi=10.1016/B978-044450270-4/50024-3 |isbn=978-044450270-4}}</ref> For example, [[Elastic map]]s use the mechanical metaphor of elasticity to approximate [[Nonlinear dimensionality reduction#Principal curves and manifolds|principal manifolds]]:<ref>{{cite book |editor-link=Alexander Nikolaevich Gorban |editor-last=Gorban |editor-first=Alexander N.  |editor2-last=Kégl |editor2-first=Balázs |editor3-last=Wunsch |editor3-first=Donald C. |editor4-last=Zinovyev |editor4-first=Andrei |url=https://www.researchgate.net/publication/271642170  |title=Principal Manifolds for Data Visualization and Dimension Reduction |series=Lecture Notes in Computer Science and Engineering  |volume=58 |publisher=Springer |date=2008 |isbn=978-3-540-73749-0}}</ref> the analogy is an elastic membrane and plate.

==Examples==

* Banking system financial analysis<ref>{{Cite journal|last1= Franch|first1=F.|date=2014|title=Correspondent Banking in Euro: bank clustering via self-organizing maps|url=https://unsworks.unsw.edu.au/bitstreams/c51e3a02-496b-469f-a063-0ebdff9b6cc5/download |journal=J. Of Financial Market Infrastructures|series=|volume=2|issue=4|pages=3–20 |issn= 2049-5404|doi=10.21314/JFMI.2014.030}}</ref><ref>{{Cite journal|last1=Ha|first1=Man|last2=Gan|first2=Christopher|last3= Nguyen |first3= Cuong|last4= Anthony |first4=Patricia|date=13 October 2021|title=Self-Organising (Kohonen) Maps for the Vietnam Banking Industry|journal=J. Risk Financial Manag.|series=|volume=14|issue=10|page=485 | doi= 10.3390/jrfm14100485|doi-access=free |hdl=10419/258589|hdl-access=free}}</ref>
* Financial investment<ref>{{Cite journal|last1= Li |first1= S. T.|last2=Kuo|first2= S. C. |date=February 2008|title=Knowledge discovery in financial investment for forecasting and trading strategy through wavelet-based SOM networks |url= https://doi.org/10.1016/j.eswa.2006.10.039 |journal= Expert Systems with Applications |series=|volume=34|issue=2|pages= 935–951| doi= 10.1016/j.eswa.2006.10.039|url-access= subscription}}</ref><ref>{{Cite journal|last1=Hsu|first1= Chih-Ming|date=October 2011|title= A hybrid procedure for stock price prediction by integrating self-organizing map and genetic programming |url= https://doi.org/10.1016/j.eswa.2011.04.210|journal= Expert Systems with Applications |series=|volume=38|issue=11| doi= 10.1016/j.eswa.2011.04.210|url-access=subscription}}</ref>
* Project prioritization and selection<ref>{{cite journal |last1=Zheng |first1=G. |last2=Vaishnavi |first2=V. |date=2011 |url=http://www.slideshare.net/jgzheng/multidimensional-perceptual-map |title=A Multidimensional Perceptual Map Approach to Project Prioritization and Selection |journal=AIS Transactions on Human-Computer Interaction |volume=3 |issue=2 |pages=82–103|doi=10.17705/1thci.00028 |doi-access=free }}</ref>
* Seismic facies analysis for oil and gas exploration<ref>{{cite book | last1 = Taner | first1 = M. T. | last2 = Walls | first2 = J. D. | last3 = Smith | first3 = M. | last4 = Taylor | first4 = G. | last5 = Carr | first5 = M. B. | last6 = Dumas | first6 = D. | year = 2001 | title = SEG Technical Program Expanded Abstracts 2001| volume = 2001 | pages = 1552–1555 |doi= 10.1190/1.1816406| chapter = Reservoir characterization by calibration of self-organized map clusters | s2cid = 59155082 }}</ref>
* [[Failure mode and effects analysis]]<ref>{{cite journal|last1=Chang|first1=Wui Lee  |last2=Pang|first2=Lie Meng |last3=Tay |first3=Kai Meng|date=March 2017|title=Application of Self-Organizing Map to Failure Modes and Effects Analysis Methodology |journal=Neurocomputing |volume=249 |pages=314–320 |doi=10.1016/j.neucom.2016.04.073|url=http://ir.unimas.my/15892/7/Application%20of%20self-organizing%20map%20to%20failure%20modes%20%28abstract%29.pdf }}</ref>
* Finding representative data in large datasets
** Representative species for ecological communities<ref>{{Cite journal|last1=Park|first1=Young-Seuk|last2=Tison|first2=Juliette|last3=Lek|first3=Sovan|last4=Giraudel|first4=Jean-Luc|last5=Coste|first5=Michel|last6=Delmas|first6=François|date=2006-11-01|title=Application of a self-organizing map to select representative species in multivariate analysis: A case study determining diatom distribution patterns across France|url=https://www.sciencedirect.com/science/article/pii/S1574954106000525|journal=Ecological Informatics|series=4th International Conference on Ecological Informatics|language=en|volume=1|issue=3|pages=247–257|doi=10.1016/j.ecoinf.2006.03.005|bibcode=2006EcInf...1..247P |issn=1574-9541|url-access=subscription}}</ref>
** Representative days for energy system models<ref>{{Cite journal|last1=Yilmaz|first1=Hasan Ümitcan|last2=Fouché|first2=Edouard|last3=Dengiz|first3=Thomas|last4=Krauß|first4=Lucas|last5=Keles|first5=Dogan|last6=Fichtner|first6=Wolf|date=2019-04-01|title=Reducing energy time series for energy system models via self-organizing maps|url=https://www.degruyter.com/document/doi/10.1515/itit-2019-0025/html|journal=It - Information Technology|language=en|volume=61|issue=2–3|pages=125–133|doi=10.1515/itit-2019-0025|s2cid=203160544 |issn=2196-7032|url-access=subscription}}</ref>

== Alternative approaches ==

* The '''[[generative topographic map]]''' (GTM) is a potential alternative to SOMs. In the sense that a GTM explicitly requires a smooth and continuous mapping from the input space to the map space, it is topology preserving. However, in a practical sense, this measure of topological preservation is lacking.<ref>{{cite journal |last=Kaski |first=Samuel |title=Data Exploration Using Self-Organizing Maps |journal=Acta Polytechnica Scandinavica |series=Mathematics, Computing and Management in Engineering Series |volume=82 |year=1997 |publisher=Finnish Academy of Technology |location=Espoo, Finland |isbn=978-952-5148-13-8}}</ref>
* The '''[[growing self-organizing map]]''' (GSOM) is a growing variant of the self-organizing map. The GSOM was developed to address the issue of identifying a suitable map size in the SOM. It starts with a minimal number of nodes (usually four) and grows new nodes on the boundary based on a heuristic. By using a value called the ''spread factor'', the data analyst has the ability to control the growth of the GSOM.<ref>{{cite journal |last1=Alahakoon |first1=D. |last2=Halgamuge |first2=S.K. |last3=Sirinivasan |first3=B. |year=2000 |title=Dynamic Self Organizing Maps With Controlled Growth for Knowledge Discovery |journal=IEEE Transactions on Neural Networks |volume=11 |issue=3 |pages=601–614 |pmid=18249788 |doi=10.1109/72.846732}}</ref>
* The '''conformal map''' approach uses conformal mapping to interpolate each training sample between grid nodes in a continuous surface. A one-to-one smooth mapping is possible in this approach.<ref>{{cite journal | last1=Liou | first1=C.-Y. | last2=Tai | first2=W.-P. | title=Conformality in the self-organization network |journal=Artificial Intelligence |volume=116 | issue=1–2 |pages=265–286 |date=2000 |doi=10.1016/S0004-3702(99)00093-4  | doi-access= }}</ref><ref>{{cite journal | last1=Liou | first1=C.-Y. | last2=Kuo | first2=Y.-T. | title=Conformal Self-organizing Map for a Genus Zero Manifold |journal=The Visual Computer |volume=21 |issue=5 |pages=340–353 |date=2005 |doi=10.1007/s00371-005-0290-6 | s2cid=8677589 }}</ref>
* The '''time adaptive self-organizing map''' (TASOM) network is an extension of the basic SOM. The TASOM employs adaptive learning rates and neighborhood functions. It also includes a scaling parameter to make the network invariant to scaling, translation and rotation of the input space. The TASOM and its variants have been used in several applications including adaptive clustering, multilevel thresholding, input space approximation, and active contour modeling.<ref>{{cite journal |first1=Hamed |last1=Shah-Hosseini |first2=Reza |last2=Safabakhsh |title=TASOM: A New Time Adaptive Self-Organizing Map |journal=IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics |volume=33 |number=2 |date=April 2003 |pages=271–282 |doi=10.1109/tsmcb.2003.810442|pmid=18238177 }}</ref> Moreover, a Binary Tree TASOM (BTASOM) has been proposed; it resembles a natural binary tree whose nodes are TASOM networks, and the number of its levels and the number of its nodes adapt to its environment.<ref>{{cite journal |first=Hamed |last=Shah-Hosseini |title=Binary Tree Time Adaptive Self-Organizing Map |journal=Neurocomputing |volume=74 |number=11 |date=May 2011 |pages=1823–1839 |doi=10.1016/j.neucom.2010.07.037}}</ref>
* The '''[[elastic map]]''' approach borrows from [[spline interpolation]] the idea of minimizing the [[elastic energy]]. In learning, it minimizes the sum of the quadratic bending and stretching energies together with the [[least squares]] [[approximation error]].<ref>{{cite journal |first1=A.N. |last1=Gorban |first2=A. |last2=Zinovyev |arxiv=1001.1122 |title=Principal manifolds and graphs in practice: from molecular biology to dynamical systems |journal=[[International Journal of Neural Systems]] |volume=20 |issue=3 |date=2010 |pages=219–232 |doi=10.1142/S0129065710002383|pmid=20556849 |s2cid=2170982 }}</ref>
* The '''oriented and scalable map''' (OS-Map) generalises the neighborhood function and the winner selection.<ref>{{cite journal | last1 = Hua | first1 = H | year = 2016 | title = Image and geometry processing with Oriented and Scalable Map | journal = Neural Networks | volume = 77 | pages = 1–6 | doi = 10.1016/j.neunet.2016.01.009 | pmid = 26897100 }}</ref> The homogeneous Gaussian neighborhood function is replaced with the matrix exponential. Thus one can specify the orientation either in the map space or in the data space. SOM has a fixed scale (=1), so that the maps "optimally describe the domain of observation". The OS-Map also allows a map to cover the domain twice or ''n'' times, which leads to the notion of scaling: the OS-Map regards the scale as a statistical description of how many best-matching nodes an input has in the map.

== See also ==
* [[Deep learning]]
* [[Hybrid Kohonen self-organizing map]]
* [[Learning vector quantization]]
* [[Liquid state machine]]
* [[Neocognitron]]
* [[Neural gas]]
* [[Sparse coding]]
* [[Sparse distributed memory]]
* [[Topological data analysis]]

== Further reading ==

* {{Cite journal |last=Kohonen |first=Teuvo |date=January 2013 |title=Essentials of the self-organizing map |url=https://linkinghub.elsevier.com/retrieve/pii/S0893608012002596 |journal=Neural Networks |language=en |volume=37 |pages=52–65 |doi=10.1016/j.neunet.2012.09.018|pmid=23067803 |s2cid=17289060 |url-access=subscription }}
* {{Cite book |last=Kohonen |first=Teuvo |title=Self-organizing maps: with 22 tables |date=2001 |publisher=Springer |isbn=978-3-540-67921-9 |edition=3 |series=Springer Series in Information Sciences |location=Berlin Heidelberg}}
* {{Cite journal |last=Kohonen |first=Teuvo |date=1988 |title=Self-Organization and Associative Memory |url=https://link.springer.com/book/10.1007/978-3-662-00784-6 |journal=Springer Series in Information Sciences |volume=8 |language=en |doi=10.1007/978-3-662-00784-6 |isbn=978-3-540-18314-3 |issn=0720-678X|url-access=subscription }}
* Kaski, Samuel, Jari Kangas, and Teuvo Kohonen. "[http://cis.legacy.ics.tkk.fi/research/som-bibl/vol1_4.pdf Bibliography of self-organizing map (SOM) papers: 1981–1997]." ''Neural computing surveys'' 1.3&4 (1998): 1-176.
* Oja, Merja, Samuel Kaski, and Teuvo Kohonen. "[https://www.researchgate.net/profile/Samuel-Kaski/publication/2605370_Bibliography_of_Self-Organizing_Map_SOM_Papers_1998-2001_Addendum/links/00b7d51cdba36cca11000000/Bibliography-of-Self-Organizing-Map-SOM-Papers-1998-2001-Addendum.pdf Bibliography of self-organizing map (SOM) papers: 1998–2001 addendum]." ''Neural computing surveys'' 3.1 (2003): 1-156.

== References ==
{{Reflist}}{{Reflist|group=Note}}

==External links==
*{{Commonscat inline}}

{{Authority control}}

{{DEFAULTSORT:Self-Organizing Map}}
[[Category:Self-organization]]
[[Category:Artificial neural networks]]
[[Category:Dimension reduction]]
[[Category:Cluster analysis algorithms]]
[[Category:Finnish inventions]]
[[Category:Unsupervised learning]]