=== Continuous variables ===
Modern Hopfield networks or dense associative memories are best understood in continuous variables and continuous time.<ref name=":3" /><ref name=":4" /> Consider the network architecture, shown in Fig. 1, and the equations for the evolution of the neurons' states<ref name=":4" />{{NumBlk2|:|<math display="block" id="dynamical equations">\begin{cases} \tau_f \frac{d x_i}{dt} = \sum\limits_{\mu=1}^{N_h} \xi_{i \mu} f_\mu - x_i + I_i\\ \tau_h \frac{d h_\mu}{dt} = \sum\limits_{i=1}^{N_f} \xi_{\mu i} g_i - h_\mu \end{cases}</math>|1}}where the currents of the feature neurons are denoted by <math display="inline">x_i</math>, and the currents of the memory neurons are denoted by <math>h_\mu</math> (<math>h</math> stands for hidden neurons). There are no synaptic connections among the feature neurons or among the memory neurons. The matrix <math>\xi_{\mu i}</math> denotes the strength of the synapse from feature neuron <math>i</math> to memory neuron <math>\mu</math>. The synapses are assumed to be symmetric, so that the same value characterizes the physically distinct synapse from memory neuron <math>\mu</math> back to feature neuron <math>i</math>. The outputs of the memory neurons and the feature neurons are denoted by <math>f_\mu</math> and <math>g_i</math>, which are non-linear functions of the corresponding currents. In general these outputs can depend on the currents of all the neurons in that layer, so that <math>f_\mu = f(\{h_\mu\})</math> and <math display="inline">g_i = g(\{x_i\})</math>. It is convenient to define these activation functions as derivatives of the Lagrangian functions for the two groups of neurons{{NumBlk2|:|<math display="block" id="Lagrangian_def">f_\mu = \frac{\partial L_h}{\partial h_\mu},\ \ \ \ \text{and}\ \ \ \ g_i = \frac{\partial L_x}{\partial x_i}</math>|2}}In this way the specific form of the equations for the neurons' states is completely defined once the Lagrangian functions are specified. Finally, the time constants for the two groups of neurons are denoted by <math>\tau_f</math> and <math>\tau_h</math>, and <math>I_i</math> is the input current to the network, which can be driven by the presented data.

[[File:Effective theory of Modern Hopfield Networks.png|thumb|827x827px|Fig. 2: Effective theory on the feature neurons for various common choices of the Lagrangian functions. Model A reduces to the models studied in<ref name=":1" /><ref name=":2" /> depending on the choice of the activation function, model B reduces to the model studied in,<ref name=":3" /> and model C reduces to the model studied in.<ref name=":4" /> F is a "[[Smoothness|smooth]] enough" function.<ref name=":1" />]]

General systems of non-linear differential equations can exhibit many complicated behaviors that depend on the choice of the non-linearities and the initial conditions. For Hopfield networks, however, this is not the case: the dynamical trajectories always converge to a fixed-point attractor state. This property is achieved because the equations are specifically engineered so that they have an underlying energy function<ref name=":4" /> {{NumBlk2|:|<math display="block" id="energy">E(t) = \Big[\sum\limits_{i=1}^{N_f} (x_i-I_i) g_i - L_x \Big] + \Big[\sum\limits_{\mu=1}^{N_h} h_\mu f_\mu - L_h \Big] - \sum\limits_{\mu, i} f_\mu \xi_{\mu i} g_i</math>|3}}The terms grouped into square brackets represent a Legendre transform of the Lagrangian functions with respect to the currents of the neurons.
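The convergence of these dynamics can be illustrated numerically. The following is a minimal sketch, not taken from the cited references, that integrates the system ({{EquationNote|1}}) with a forward-Euler scheme for one illustrative choice of the Lagrangians, <math>L_x = \tfrac{1}{2}\sum_i x_i^2</math> (so <math>g_i = x_i</math>) and <math>L_h = \log\sum_\mu e^{h_\mu}</math> (so <math>f</math> is a softmax), while tracking the energy ({{EquationNote|3}}); the network sizes, time step, and random synapses are arbitrary assumptions.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
N_f, N_h = 8, 4                        # numbers of feature and memory neurons (assumed)
xi = rng.standard_normal((N_h, N_f))   # symmetric synapses xi[mu, i] (illustrative)
I = np.zeros(N_f)                      # external input currents
tau_f, tau_h, dt = 1.0, 0.5, 0.01      # time constants and Euler step (assumed)

def softmax(h):
    e = np.exp(h - h.max())
    return e / e.sum()

def energy(x, h):
    # Eq. (3) with L_x = sum(x^2)/2 (so g = x) and L_h = logsumexp(h) (so f = softmax)
    g, f = x, softmax(h)
    L_x = 0.5 * np.sum(x ** 2)
    L_h = h.max() + np.log(np.sum(np.exp(h - h.max())))
    return ((x - I) @ g - L_x) + (h @ f - L_h) - f @ xi @ g

x = rng.standard_normal(N_f)           # initial feature currents
h = rng.standard_normal(N_h)           # initial hidden currents
for step in range(2000):
    f, g = softmax(h), x
    x = x + (dt / tau_f) * (xi.T @ f - x + I)   # first equation of (1)
    h = h + (dt / tau_h) * (xi @ g - h)         # second equation of (1)
    if step % 500 == 0:
        print(step, energy(x, h))      # energy (3) decreases toward a fixed point
</syntaxhighlight>

Both Lagrangians in this sketch have positive semi-definite Hessians, so by the result below the printed energy is non-increasing (up to discretization error of the Euler step) as the state approaches a fixed point.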
If the Hessian matrices of the Lagrangian functions are positive semi-definite, the energy function is guaranteed to decrease on the dynamical trajectory<ref name=":4" /> {{NumBlk2|:|<math display="block" id="energy_decrease">\frac{dE(t)}{dt}= - \tau_f \sum\limits_{i,j=1}^{N_f} \frac{d x_i}{dt} \frac{\partial^2 L_x}{\partial x_i \partial x_j} \frac{d x_j}{dt} - \tau_h \sum\limits_{\mu,\nu = 1}^{N_h} \frac{d h_\mu}{dt} \frac{\partial^2 L_h}{\partial h_\mu \partial h_\nu} \frac{d h_\nu}{dt} \leq 0</math>|4}} This property makes it possible to prove that the system of dynamical equations describing the temporal evolution of the neurons' activities eventually reaches a fixed-point attractor state.

In certain situations one can assume that the dynamics of the hidden neurons equilibrates on a much faster time scale than that of the feature neurons, <math display="inline">\tau_h\ll\tau_f</math>. In this case the steady-state solution of the second equation in the system ({{EquationNote|1}}) can be used to express the currents of the hidden units through the outputs of the feature neurons. This makes it possible to reduce the general theory ({{EquationNote|1}}) to an effective theory for the feature neurons only. The resulting effective update rules and the energies for various common choices of the Lagrangian functions are shown in Fig. 2. In the case of the log-sum-exponential Lagrangian function, the update rule (if applied once) for the states of the feature neurons is the attention mechanism<ref name=":3" /> commonly used in many modern AI systems (see Ref.<ref name=":4" /> for the derivation of this result from the continuous-time formulation).
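For the log-sum-exponential case, this one-step effective update can be written out explicitly. Below is a minimal sketch, again not taken from the cited references, assuming <math>L_h = \tfrac{1}{\beta}\log\sum_\mu e^{\beta h_\mu}</math> and <math>g_i = x_i</math>: at the steady state the hidden currents are <math>h_\mu = \sum_i \xi_{\mu i} x_i</math>, and a single application of the update rule reads out the stored rows of <math>\xi</math> through a softmax, in the same way that attention selects a value vector. The inverse temperature <math>\beta</math>, the memory matrix, and the noisy query are illustrative assumptions.

<syntaxhighlight lang="python">
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def effective_update(xi, x, beta):
    """One application of the effective update rule on the feature neurons,
    assuming L_h = (1/beta) log sum_mu exp(beta h_mu) and g_i = x_i."""
    h = xi @ x                          # steady state of the hidden units
    return xi.T @ softmax(beta * h)     # attention-like readout of stored rows

rng = np.random.default_rng(1)
memories = rng.standard_normal((4, 16))               # rows of xi act as stored patterns
query = memories[2] + 0.3 * rng.standard_normal(16)   # noisy version of pattern 2
retrieved = effective_update(memories, query, beta=4.0)

# Cosine similarity with the intended memory: close to 1 when the noisy
# query lands in the basin of attraction of a stored pattern.
cos = retrieved @ memories[2] / (np.linalg.norm(retrieved) * np.linalg.norm(memories[2]))
print(cos)
</syntaxhighlight>

In this form the correspondence to attention is direct: the query is the feature-neuron state, the rows of the memory matrix play the role of keys and values, and <math>\beta</math> sets the sharpness of the softmax selection.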