=== Algorithm ===
# Randomize the node weight vectors in a map
# For <math>s = 0, 1, 2, \dots, \lambda</math>
## Randomly pick an input vector <math>\mathbf{D}(t)</math>
## Find the node in the map closest to the input vector. This node is the '''best matching unit''' (BMU). Denote it by <math>u</math>
## For each node <math>v</math>, update its vector by pulling it closer to the input vector: <math display="block">\mathbf{W}_{v}(s + 1) = \mathbf{W}_{v}(s) + \theta(u, v, s) \cdot \alpha(s) \cdot (\mathbf{D}(t) - \mathbf{W}_{v}(s))</math>

The variable names mean the following, with vectors in bold:
* <math>s</math> is the current iteration
* <math>\lambda</math> is the iteration limit
* <math>t</math> is the index of the target input data vector in the input data set <math>\mathbf{D}</math>
* <math>\mathbf{D}(t)</math> is a target input data vector
* <math>v</math> is the index of the node in the map
* <math>\mathbf{W}_v</math> is the current weight vector of node <math>v</math>
* <math>u</math> is the index of the best matching unit (BMU) in the map
* <math>\theta(u, v, s)</math> is the neighbourhood function
* <math>\alpha(s)</math> is the learning rate schedule

The key design choices are the shape of the SOM, the neighbourhood function, and the learning rate schedule. The idea of the neighbourhood function is that the BMU is updated the most, its immediate neighbours are updated a little less, and so on. The idea of the learning rate schedule is that the updates are large at the start and gradually shrink toward zero.

For example, if we want to learn a SOM using a square grid, we can index it by <math>(i, j)</math> where <math>i, j \in \{1, \dots, N\}</math>. The neighbourhood function can make it so that the BMU updates in full, its nearest neighbours update by half, their neighbours by half again, and so on:<math display="block">\theta((i, j), (i', j'), s) = \frac{1}{2^{|i-i'| + |j-j'|}} = \begin{cases} 1 & \text{if }i=i', j = j' \\ 1/2 & \text{if }|i-i'| + |j-j'| = 1 \\ 1/4 & \text{if }|i-i'| + |j-j'| = 2 \\ \cdots & \cdots \end{cases} </math>And we can use a simple linear learning rate schedule <math>\alpha(s) = 1 - s/\lambda</math>.

Notice in particular that the update rate does ''not'' depend on where the node lies in the Euclidean space, only on where it lies in the SOM itself. For example, the nodes <math>(1,1), (1,2)</math> are close on the SOM, so they will always update in similar ways, even when they are far apart in the Euclidean space. In contrast, even if the nodes <math>(1,1), (1,100)</math> end up overlapping each other (such as if the SOM looks like a folded towel), they still do not update in similar ways.
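The following is a minimal Python sketch of the algorithm above, using the example <math>1/2^{|i-i'| + |j-j'|}</math> neighbourhood function and the linear schedule <math>\alpha(s) = 1 - s/\lambda</math> on a square grid. The function name <code>train_som</code> and the choice to initialize weights from a standard normal distribution are illustrative assumptions; the algorithm only requires that the initial weights be randomized.

<syntaxhighlight lang="python">
import numpy as np

def train_som(data, N, n_iters, rng=None):
    """Train an N x N self-organizing map on `data` (shape: n_samples x dim).

    Sketch of the algorithm above: theta is the example
    1/2**(Manhattan distance) neighbourhood, alpha(s) = 1 - s/lambda.
    """
    rng = np.random.default_rng(rng)
    dim = data.shape[1]
    # Step 1: randomize the node weight vectors in the map
    # (standard-normal initialization is an arbitrary choice).
    W = rng.standard_normal((N, N, dim))

    # Grid coordinates of every node, used for Manhattan distances.
    ii, jj = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")

    for s in range(n_iters):
        alpha = 1.0 - s / n_iters            # linear learning rate schedule
        # Step 2a: randomly pick an input vector D(t).
        x = data[rng.integers(len(data))]
        # Step 2b: find the best matching unit (BMU) u = (u_i, u_j).
        dists = np.linalg.norm(W - x, axis=2)
        u_i, u_j = np.unravel_index(np.argmin(dists), (N, N))
        # Step 2c: pull every node toward x, weighted by theta * alpha.
        manhattan = np.abs(ii - u_i) + np.abs(jj - u_j)
        theta = 0.5 ** manhattan             # example neighbourhood function
        W += (theta * alpha)[:, :, None] * (x - W)
    return W
</syntaxhighlight>

For instance, <code>train_som(np.random.rand(1000, 3), N=10, n_iters=5000)</code> would map 1000 random 3-dimensional points onto a 10&times;10 grid. Because <code>theta</code> depends only on grid indices, two nodes adjacent on the grid always receive similar updates regardless of where their weight vectors sit in the Euclidean space, which is the property discussed above.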