
==Solution of a non-linear system==

Gradient descent can also be used to solve a system of [[nonlinear equation]]s. The example below uses gradient descent to solve for three unknown variables, ''x''<sub>1</sub>, ''x''<sub>2</sub>, and ''x''<sub>3</sub>, and works through a single iteration.

Consider the nonlinear system of equations

:<math> \begin{cases}
3x_1-\cos(x_2x_3)-\tfrac{3}{2} =0 \\
4x_1^2-625x_2^2+2x_2-1 = 0  \\
\exp(-x_1x_2)+20x_3+\tfrac{10\pi-3}{3} =0
\end{cases}</math>

Let us introduce the associated function

:<math>G(\mathbf{x}) = \begin{bmatrix}
3x_1-\cos(x_2x_3)-\tfrac{3}{2} \\
4x_1^2-625x_2^2+2x_2-1 \\
\exp(-x_1x_2)+20x_3+\tfrac{10\pi-3}{3} \\
\end{bmatrix}, </math>

where

:<math> \mathbf{x} =\begin{bmatrix}
  x_1 \\
  x_2 \\
  x_3 \\
\end{bmatrix}.</math>

One might now define the objective function

:<math>\begin{align}F(\mathbf{x}) &= \frac{1}{2} G^\mathrm{T}(\mathbf{x}) G(\mathbf{x}) \\&=\frac{1}{2} \left[ \left (3x_1-\cos(x_2x_3)-\frac{3}{2} \right)^2  + \left(4x_1^2-625x_2^2+2x_2-1 \right)^2 +\right.\\
&{}\qquad\left. \left(\exp(-x_1x_2) + 20x_3 + \frac{10\pi-3}{3} \right)^2 \right],\end{align}</math>

which we will attempt to minimize. As an initial guess, let us use

:<math> \mathbf{x}^{(0)}= \mathbf{0} = \begin{bmatrix}
  0 \\
  0 \\
  0 \\ \end{bmatrix}.</math>

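By the chain rule, <math>\nabla F(\mathbf{x}) = J_G(\mathbf{x})^\mathrm{T} G(\mathbf{x})</math>, so the first gradient descent step is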

:<math>\mathbf{x}^{(1)}=\mathbf{0}-\gamma_0 \nabla F(\mathbf{0}) = \mathbf{0}-\gamma_0 J_G(\mathbf{0})^\mathrm{T} G(\mathbf{0}),</math>

where the [[Jacobian matrix]] <math>J_G</math> is given by

:<math>J_G(\mathbf{x}) = \begin{bmatrix}
  3 & \sin(x_2x_3)x_3 & \sin(x_2x_3)x_2   \\
  8x_1 & -1250x_2+2 & 0 \\
  -x_2\exp{(-x_1x_2)} & -x_1\exp(-x_1x_2) & 20\\
\end{bmatrix}.</math>
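
The system and its Jacobian can be written down directly in code. The following is a minimal sketch in plain Python; the names <code>G</code>, <code>jacobian</code> and <code>jacobian_fd</code>, the test point and the finite-difference step are illustrative assumptions and not part of the example itself. It compares the analytic Jacobian above against a central-difference approximation as a sanity check.

```python
import math

C = (10 * math.pi - 3) / 3  # constant term of the third equation


def G(x):
    """Residual vector G(x) of the nonlinear system."""
    x1, x2, x3 = x
    return [
        3 * x1 - math.cos(x2 * x3) - 1.5,
        4 * x1**2 - 625 * x2**2 + 2 * x2 - 1,
        math.exp(-x1 * x2) + 20 * x3 + C,
    ]


def jacobian(x):
    """Analytic Jacobian J_G(x): rows are equations, columns are variables."""
    x1, x2, x3 = x
    s = math.sin(x2 * x3)
    e = math.exp(-x1 * x2)
    return [
        [3.0, s * x3, s * x2],
        [8 * x1, -1250 * x2 + 2, 0.0],
        [-x2 * e, -x1 * e, 20.0],
    ]


def jacobian_fd(x, h=1e-6):
    """Central-difference approximation of J_G(x), used only as a check."""
    J = [[0.0] * 3 for _ in range(3)]
    for j in range(3):
        xp = list(x)
        xm = list(x)
        xp[j] += h
        xm[j] -= h
        gp, gm = G(xp), G(xm)
        for i in range(3):
            J[i][j] = (gp[i] - gm[i]) / (2 * h)
    return J


x_test = [0.3, -0.2, 0.5]  # arbitrary test point (assumption)
J_exact = jacobian(x_test)
J_approx = jacobian_fd(x_test)
max_err = max(
    abs(J_exact[i][j] - J_approx[i][j]) for i in range(3) for j in range(3)
)
print("largest entrywise difference:", max_err)
```

A small difference indicates that the partial derivatives in the matrix agree with the definition of <math>G</math>.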

Evaluating both at the initial guess gives

:<math>J_G(\mathbf{0}) = \begin{bmatrix}
  3 & 0 & 0\\
  0 & 2 & 0\\
  0 & 0 & 20
\end{bmatrix}, \qquad G(\mathbf{0}) = \begin{bmatrix}
  -2.5\\
  -1\\
  10.472
\end{bmatrix}.</math>

Thus

:<math>\mathbf{x}^{(1)}= \mathbf{0}-\gamma_0 \begin{bmatrix}
  -7.5\\
  -2\\
  209.44
\end{bmatrix},</math>

and

:<math>F(\mathbf{0}) = 0.5 \left( (-2.5)^2 + (-1)^2 + (10.472)^2 \right) = 58.456.</math>
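
The numbers above can be reproduced with a few lines of plain Python. This is a sketch only: the helper names <code>G</code>, <code>jacobian</code>, <code>F</code> and <code>grad_F</code> are assumptions chosen for illustration, and the gradient is formed as <math>J_G^\mathrm{T} G</math>, as in the update formula.

```python
import math

C = (10 * math.pi - 3) / 3  # constant term of the third equation


def G(x):
    x1, x2, x3 = x
    return [
        3 * x1 - math.cos(x2 * x3) - 1.5,
        4 * x1**2 - 625 * x2**2 + 2 * x2 - 1,
        math.exp(-x1 * x2) + 20 * x3 + C,
    ]


def jacobian(x):
    x1, x2, x3 = x
    s = math.sin(x2 * x3)
    e = math.exp(-x1 * x2)
    return [
        [3.0, s * x3, s * x2],
        [8 * x1, -1250 * x2 + 2, 0.0],
        [-x2 * e, -x1 * e, 20.0],
    ]


def F(x):
    """Objective F(x) = 1/2 * G(x)^T G(x)."""
    return 0.5 * sum(g * g for g in G(x))


def grad_F(x):
    """Gradient of F, computed as J_G(x)^T G(x)."""
    g = G(x)
    J = jacobian(x)
    return [sum(J[i][j] * g[i] for i in range(3)) for j in range(3)]


x0 = [0.0, 0.0, 0.0]
grad0 = grad_F(x0)
F0 = F(x0)
print("gradient at 0:", [round(v, 2) for v in grad0])
print("F(0):", round(F0, 3))
```

To the displayed precision, the printed gradient and objective value should match the worked figures above.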

[[File:Gradient Descent Example Nonlinear Equations.gif|thumb|right|350px|An animation showing the first 83 iterations of gradient descent applied to this example. Surfaces are [[isosurface]]s of <math>F(\mathbf{x}^{(n)})</math> at the current guess <math>\mathbf{x}^{(n)}</math>, and arrows show the direction of descent. Because the step size is small and constant, convergence is slow.]]

Now, a suitable <math>\gamma_0</math> must be found such that

:<math>F\left (\mathbf{x}^{(1)}\right ) \le F\left (\mathbf{x}^{(0)}\right ) = F(\mathbf{0}).</math>

This can be done with any of a variety of [[line search]] algorithms. One might also simply guess <math>\gamma_0=0.001,</math> which gives

:<math> \mathbf{x}^{(1)}=\begin{bmatrix}
   0.0075  \\
   0.002   \\
  -0.20944 \\
\end{bmatrix}.</math>

Evaluating the objective function at this point yields

:<math>F \left (\mathbf{x}^{(1)}\right ) = 0.5 \left ((-2.48)^2 + (-1.00)^2 + (6.28)^2 \right ) = 23.306.</math>
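
The single step with the guessed step size can be checked numerically. The sketch below assumes the same illustrative helper names as before (<code>G</code>, <code>jacobian</code>, <code>F</code>, <code>grad_F</code>), none of which come from the example itself, and applies one update with <math>\gamma_0 = 0.001</math>.

```python
import math

C = (10 * math.pi - 3) / 3  # constant term of the third equation


def G(x):
    x1, x2, x3 = x
    return [
        3 * x1 - math.cos(x2 * x3) - 1.5,
        4 * x1**2 - 625 * x2**2 + 2 * x2 - 1,
        math.exp(-x1 * x2) + 20 * x3 + C,
    ]


def jacobian(x):
    x1, x2, x3 = x
    s = math.sin(x2 * x3)
    e = math.exp(-x1 * x2)
    return [
        [3.0, s * x3, s * x2],
        [8 * x1, -1250 * x2 + 2, 0.0],
        [-x2 * e, -x1 * e, 20.0],
    ]


def F(x):
    return 0.5 * sum(g * g for g in G(x))


def grad_F(x):
    g = G(x)
    J = jacobian(x)
    return [sum(J[i][j] * g[i] for i in range(3)) for j in range(3)]


gamma0 = 0.001           # the guessed step size from the text
x0 = [0.0, 0.0, 0.0]     # initial guess
g0 = grad_F(x0)

# One gradient descent update: x1 = x0 - gamma0 * grad F(x0)
x1 = [xi - gamma0 * gi for xi, gi in zip(x0, g0)]

print("x1:", [round(v, 5) for v in x1])
print("residuals at x1:", [round(v, 2) for v in G(x1)])
print("F(x1):", round(F(x1), 3))
```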

The drop from <math>F(\mathbf{0})=58.456</math> to <math>F\left (\mathbf{x}^{(1)}\right ) =23.306</math> is a sizable decrease in the objective function. Subsequent steps with suitably chosen step sizes continue to reduce it until an approximate solution of the system is found.
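
One simple way to choose each step size so that the objective never increases is a backtracking line search. The sketch below is an illustration under stated assumptions, not the method used to produce the animation: it starts every iteration from the trial step 0.001, halves it until an Armijo sufficient-decrease condition holds, and runs a fixed number of iterations. The constants <code>c</code>, <code>shrink</code> and <code>n_iter</code>, and all function names, are hypothetical choices.

```python
import math

C = (10 * math.pi - 3) / 3  # constant term of the third equation


def G(x):
    x1, x2, x3 = x
    return [
        3 * x1 - math.cos(x2 * x3) - 1.5,
        4 * x1**2 - 625 * x2**2 + 2 * x2 - 1,
        math.exp(-x1 * x2) + 20 * x3 + C,
    ]


def jacobian(x):
    x1, x2, x3 = x
    s = math.sin(x2 * x3)
    e = math.exp(-x1 * x2)
    return [
        [3.0, s * x3, s * x2],
        [8 * x1, -1250 * x2 + 2, 0.0],
        [-x2 * e, -x1 * e, 20.0],
    ]


def F(x):
    return 0.5 * sum(g * g for g in G(x))


def grad_F(x):
    g = G(x)
    J = jacobian(x)
    return [sum(J[i][j] * g[i] for i in range(3)) for j in range(3)]


def backtracking_step(x, gamma_init=1e-3, c=1e-4, shrink=0.5, max_halvings=50):
    """Take one descent step, halving gamma until the Armijo condition holds."""
    g = grad_F(x)
    g_sq = sum(v * v for v in g)
    fx = F(x)
    gamma = gamma_init
    for _ in range(max_halvings):
        x_new = [xi - gamma * gi for xi, gi in zip(x, g)]
        # Sufficient decrease: F(x_new) <= F(x) - c * gamma * ||grad F(x)||^2
        if F(x_new) <= fx - c * gamma * g_sq:
            return x_new, gamma
        gamma *= shrink
    return list(x), 0.0  # no acceptable step found; leave x unchanged


n_iter = 200  # arbitrary iteration budget (assumption)
x = [0.0, 0.0, 0.0]
history = [F(x)]
for _ in range(n_iter):
    x, gamma = backtracking_step(x)
    history.append(F(x))

print("F after first step:", round(history[1], 3))
print("F after", n_iter, "steps:", history[-1])
print("final iterate:", x)
```

By construction the recorded objective values are non-increasing. How close the final iterate comes to a solution of the system depends on the iteration budget and the line-search constants.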