==Relationship to unsupervised learning, stability, and generalization==
Because of the simple nature of Hebbian learning, based only on the coincidence of pre- and post-synaptic activity, it may not be intuitively clear why this form of plasticity leads to meaningful learning. However, it can be shown that Hebbian plasticity does pick up the statistical properties of the input in a way that can be categorized as unsupervised learning.

This can be shown mathematically in a simplified example. Let us work under the simplifying assumption of a single rate-based neuron of rate <math>y(t)</math>, whose inputs have rates <math>x_1(t), \ldots, x_N(t)</math>. The response of the neuron <math>y(t)</math> is usually described as a linear combination of its inputs, <math>\sum_i w_ix_i</math>, followed by a [[Activation function|response function]] <math>f</math>:

:<math>y = f\left(\sum_{i=1}^N w_i x_i \right).</math>

As defined in the previous sections, Hebbian plasticity describes the evolution in time of the synaptic weights <math>w_i</math>:

:<math>\frac{dw_i}{dt} = \eta x_i y.</math>

Assuming, for simplicity, an identity response function <math>f(a)=a</math>, we can write

:<math>\frac{dw_i}{dt} = \eta x_i \sum_{j=1}^N w_j x_j</math>

or in [[Matrix (mathematics)|matrix]] form:

:<math>\frac{d\mathbf{w}}{dt} = \eta \mathbf{x}\mathbf{x}^T\mathbf{w}.</math>

As in the previous sections, if training proceeds by epochs, the update can be averaged (denoted <math>\langle \dots \rangle</math>) over the discrete or continuous (time) training set of <math>\mathbf{x}</math>:

:<math>\frac{d\mathbf{w}}{dt} = \langle \eta \mathbf{x}\mathbf{x}^T\mathbf{w} \rangle = \eta \langle \mathbf{x}\mathbf{x}^T\rangle\mathbf{w} = \eta C \mathbf{w},</math>

where <math>C = \langle\, \mathbf{x}\mathbf{x}^T \rangle</math> is the [[correlation matrix]] of the input under the additional assumption that <math>\langle\mathbf{x}\rangle = 0</math> (i.e. the average of the inputs is zero). This is a system of <math>N</math> coupled linear differential equations. Since <math>C</math> is [[Symmetric matrix|symmetric]], it is also [[diagonalizable matrix|diagonalizable]], and the solution can be found, by working in the basis of its eigenvectors, to be of the form

:<math>\mathbf{w}(t) = k_1e^{\eta\alpha_1 t}\mathbf{c}_1 + k_2e^{\eta\alpha_2 t}\mathbf{c}_2 + \ldots + k_Ne^{\eta\alpha_N t}\mathbf{c}_N,</math>

where <math>k_i</math> are arbitrary constants, <math>\mathbf{c}_i</math> are the eigenvectors of <math>C</math> and <math>\alpha_i</math> their corresponding eigenvalues. Since a correlation matrix is always a [[positive-definite matrix|positive semi-definite matrix]], its eigenvalues are all non-negative (and positive along every direction in which the input has variance), so the above solution diverges exponentially in time. This is an intrinsic problem: this version of Hebb's rule is unstable, and in any network with a dominant signal the synaptic weights will increase or decrease exponentially. Intuitively, this is because whenever the presynaptic neuron excites the postsynaptic neuron, the weight between them is reinforced, causing an even stronger excitation in the future, and so forth, in a self-reinforcing way.
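The instability can be illustrated numerically. The following short simulation is a minimal NumPy sketch, not part of the derivation above; the input dimension, learning rate, number of steps, and the randomly generated correlation structure are arbitrary choices for illustration. It applies the plain Hebbian update to a linear rate neuron driven by zero-mean correlated inputs: the weight norm grows without bound, while the weight direction aligns with the leading eigenvector of <math>C</math>.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

N = 5           # number of presynaptic inputs (arbitrary choice)
eta = 0.005     # learning rate (arbitrary choice)
steps = 1000

# Build a random correlation structure C_true and a matching zero-mean input generator.
A = rng.normal(size=(N, N))
C_true = A @ A.T                                   # symmetric, positive semi-definite
L = np.linalg.cholesky(C_true + 1e-9 * np.eye(N))

w = rng.normal(scale=0.1, size=N)                  # initial synaptic weights
for _ in range(steps):
    x = L @ rng.normal(size=N)                     # zero-mean input with covariance ~ C_true
    y = w @ x                                      # linear response, f(a) = a
    w += eta * y * x                               # plain Hebbian rule: dw_i/dt = eta * x_i * y

# The weight direction approaches the eigenvector of the largest eigenvalue of C_true.
eigvals, eigvecs = np.linalg.eigh(C_true)
c_star = eigvecs[:, -1]
print("weight norm:", np.linalg.norm(w))                                 # grows without bound
print("|cos| with leading eigenvector:", abs(w @ c_star) / np.linalg.norm(w))
</syntaxhighlight>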
One might think the solution is to limit the firing rate of the postsynaptic neuron by adding a non-linear, saturating response function <math>f</math>, but in fact, it can be shown that for ''any'' neuron model, Hebb's rule is unstable.<ref>{{cite web|url=http://www.cnel.ufl.edu/courses/EEL6814/chapter6.pdf |title=Neural and Adaptive Systems: Fundamentals Through Simulations |access-date=2016-03-16 |last=Euliano |first=Neil R. |date=1999-12-21 |publisher=Wiley |archive-url=https://web.archive.org/web/20151225094329/http://www.cnel.ufl.edu/courses/EEL6814/chapter6.pdf |archive-date=2015-12-25}}</ref> Therefore, network models of neurons usually employ other learning theories such as [[BCM theory]], [[Oja's rule]],<ref>{{cite web|url=http://nba.uth.tmc.edu/homepage/shouval/Hebb_PCA.ppt |title=The Physics of the Brain |access-date=2007-11-14 |last=Shouval |first=Harel |date=2005-01-03 |work=The Synaptic basis for Learning and Memory: A theoretical approach |publisher=The University of Texas Health Science Center at Houston |archive-url=https://web.archive.org/web/20070610134104/http://nba.uth.tmc.edu/homepage/shouval/Hebb_PCA.ppt |archive-date=2007-06-10}}</ref> or the [[generalized Hebbian algorithm]].

Regardless, even for the unstable solution above, one can see that, when sufficient time has passed, one of the terms dominates over the others, and

:<math>\mathbf{w}(t) \approx e^{\eta\alpha^* t}\mathbf{c}^*,</math>

where <math>\alpha^*</math> is the ''largest'' eigenvalue of <math>C</math>. At this time, the postsynaptic neuron performs the following operation:

:<math>y \approx e^{\eta\alpha^* t}\mathbf{c}^* \cdot \mathbf{x}.</math>

Because, again, <math>\mathbf{c}^*</math> is the eigenvector corresponding to the largest eigenvalue of the correlation matrix of the <math>x_i</math>s, this corresponds exactly to computing the first [[principal component]] of the input.

This mechanism can be extended to performing a full PCA ([[principal component analysis]]) of the input by adding further postsynaptic neurons, provided the postsynaptic neurons are prevented from all picking up the same principal component, for example by adding [[lateral inhibition]] in the postsynaptic layer. We have thus connected Hebbian learning to PCA, which is an elementary form of unsupervised learning, in the sense that the network can pick up useful statistical aspects of the input, and "describe" them in a distilled way in its output.<ref name=":3" />
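As a point of comparison with the stabilized rules mentioned above, the following sketch (again a minimal NumPy illustration with arbitrarily chosen parameters, not a reference implementation) replaces the plain Hebbian update with [[Oja's rule]], whose extra decay term <math>-\eta y^2 w_i</math> keeps the weight norm bounded; the weight vector then converges to the first principal component of the zero-mean input.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)

N = 5           # number of presynaptic inputs (arbitrary choice)
eta = 0.005     # learning rate (arbitrary choice)
steps = 20000

# Same kind of randomly generated, zero-mean correlated input as before.
A = rng.normal(size=(N, N))
C_true = A @ A.T
L = np.linalg.cholesky(C_true + 1e-9 * np.eye(N))

w = rng.normal(scale=0.1, size=N)
for _ in range(steps):
    x = L @ rng.normal(size=N)                     # zero-mean input with covariance ~ C_true
    y = w @ x
    w += eta * y * (x - y * w)                     # Oja's rule: Hebbian term plus normalizing decay

eigvals, eigvecs = np.linalg.eigh(C_true)
c_star = eigvecs[:, -1]                            # first principal component direction
print("weight norm:", np.linalg.norm(w))                                  # stays close to 1
print("|cos| with first principal component:", abs(w @ c_star) / np.linalg.norm(w))
</syntaxhighlight>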