== Applications ==

=== Inflection points ===
If <math>f</math> is a [[homogeneous polynomial]] in three variables, the equation <math>f = 0</math> is the [[implicit equation]] of a [[plane projective curve]]. The [[inflection point]]s of the curve are exactly the non-singular points where the Hessian determinant is zero. It follows by [[Bézout's theorem]] that a [[cubic plane curve]] has at most 9 inflection points, since its Hessian determinant is also a polynomial of degree 3.

=== Second-derivative test ===
{{Main|Second partial derivative test}}
The Hessian matrix of a [[convex function]] is [[Positive semi-definite matrix|positive semi-definite]]. Refining this property allows us to test whether a [[Critical point (mathematics)|critical point]] <math>x</math> is a local maximum, a local minimum, or a saddle point, as follows:
* If the Hessian is [[Positive-definite matrix|positive-definite]] at <math>x,</math> then <math>f</math> attains an isolated local minimum at <math>x.</math>
* If the Hessian is [[Positive-definite matrix#Negative-definite, semidefinite and indefinite matrices|negative-definite]] at <math>x,</math> then <math>f</math> attains an isolated local maximum at <math>x.</math>
* If the Hessian has both positive and negative [[eigenvalue]]s, then <math>x</math> is a [[saddle point]] for <math>f.</math>
* Otherwise the test is inconclusive.

This implies that at a local minimum the Hessian is positive-semidefinite, and at a local maximum the Hessian is negative-semidefinite. For positive-semidefinite and negative-semidefinite Hessians the test is inconclusive (a critical point where the Hessian is semidefinite but not definite may be a local extremum or a saddle point). However, more can be said from the point of view of [[Morse theory]].

The [[second-derivative test]] for functions of one and two variables is simpler than the general case. In one variable, the Hessian contains exactly one second derivative; if it is positive, then <math>x</math> is a local minimum, and if it is negative, then <math>x</math> is a local maximum; if it is zero, then the test is inconclusive. In two variables, the [[determinant]] can be used, because the determinant is the product of the two eigenvalues. If it is positive, then the eigenvalues are either both positive or both negative. If it is negative, then the two eigenvalues have different signs. If it is zero, then the second-derivative test is inconclusive.

Equivalently, the second-order conditions that are sufficient for a local minimum or maximum can be expressed in terms of the sequence of leading principal [[Minor (linear algebra)|minors]] (determinants of the upper-left square sub-matrices) of the Hessian; these conditions are a special case of those given in the next section for bordered Hessians for constrained optimization, namely the case in which the number of constraints is zero. Specifically, the sufficient condition for a minimum is that all of these leading principal minors be positive, while the sufficient condition for a maximum is that the minors alternate in sign, with the <math>1 \times 1</math> minor being negative.
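Numerically, the eigenvalue form of the test described above can be applied directly to a Hessian evaluated at a critical point. The following is a minimal sketch; the function name <code>classify_critical_point</code> and the tolerance <code>tol</code> are chosen here for illustration and are not part of any standard library:

<syntaxhighlight lang="python">
import numpy as np

def classify_critical_point(hessian, tol=1e-8):
    """Classify a critical point from the eigenvalues of its Hessian.

    Illustrative sketch: ``hessian`` is assumed to be the symmetric Hessian
    matrix already evaluated at a critical point of f.
    """
    eigenvalues = np.linalg.eigvalsh(hessian)
    if np.all(eigenvalues > tol):
        return "isolated local minimum"   # positive-definite Hessian
    if np.all(eigenvalues < -tol):
        return "isolated local maximum"   # negative-definite Hessian
    if np.any(eigenvalues > tol) and np.any(eigenvalues < -tol):
        return "saddle point"             # indefinite Hessian
    return "inconclusive"                 # semidefinite but not definite

# Example: f(x, y) = x**2 - y**2 has a critical point at the origin,
# where the Hessian is diag(2, -2).
print(classify_critical_point(np.array([[2.0, 0.0], [0.0, -2.0]])))  # saddle point
</syntaxhighlight>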
=== Critical points ===
If the [[gradient]] (the vector of the partial derivatives) of a function <math>f</math> is zero at some point <math>\mathbf{x},</math> then <math>f</math> has a {{em|[[Critical point (mathematics)|critical point]]}} (or {{em|[[stationary point]]}}) at <math>\mathbf{x}.</math> The [[determinant]] of the Hessian at <math>\mathbf{x}</math> is called, in some contexts, a [[discriminant]]. If this determinant is zero then <math>\mathbf{x}</math> is called a {{em|degenerate critical point}} of <math>f,</math> or a {{em|non-Morse critical point}} of <math>f.</math> Otherwise it is non-degenerate, and called a {{em|Morse critical point}} of <math>f.</math>

The Hessian matrix plays an important role in [[Morse theory]] and [[catastrophe theory]], because its [[Kernel of a matrix|kernel]] and [[eigenvalue]]s allow classification of the critical points.<ref>{{Cite book|url=https://books.google.com/books?id=geruGMKT9_UC&pg=PA248|title=Advanced Calculus: A Geometric View|last=Callahan|first=James J.|date=2010|publisher=Springer Science & Business Media|isbn=978-1-4419-7332-0|page=248|language=en}}</ref><ref>{{Cite book|url=https://books.google.com/books?id=Tcn3CAAAQBAJ&pg=PA178|title=Recent Developments in General Relativity|editor-last=Casciaro|editor-first=B.|editor-last2=Fortunato|editor-first2=D.|editor-last3=Francaviglia|editor-first3=M.|editor-last4=Masiello|editor-first4=A.|date=2011|publisher=Springer Science & Business Media|isbn=9788847021136|page=178|language=en}}</ref><ref>{{cite book|author1=Domenico P. L. Castrigiano|author2=Sandra A. Hayes|title=Catastrophe theory|year=2004|publisher=Westview Press|isbn=978-0-8133-4126-2|page=18}}</ref>

The determinant of the Hessian matrix, when evaluated at a critical point of a function, is equal to the [[Gaussian curvature]] of the graph of the function considered as a manifold. The eigenvalues of the Hessian at that point are the principal curvatures of the function, and the eigenvectors are the principal directions of curvature. (See {{section link|Gaussian curvature|Relation to principal curvatures}}.)

=== Use in optimization ===
Hessian matrices are used in large-scale [[Mathematical optimization|optimization]] problems within [[Newton's method in optimization|Newton]]-type methods because they are the coefficient of the quadratic term of a local [[Taylor expansion]] of a function. That is,
<math display=block>y = f(\mathbf{x} + \Delta\mathbf{x})\approx f(\mathbf{x}) + \nabla f(\mathbf{x})^\mathsf{T} \Delta\mathbf{x} + \frac{1}{2} \, \Delta\mathbf{x}^\mathsf{T} \mathbf{H}(\mathbf{x}) \, \Delta\mathbf{x}</math>
where <math>\nabla f</math> is the [[gradient]] <math>\left(\frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n}\right).</math>
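For a Newton-type step, <math>\Delta\mathbf{x}</math> is chosen to minimize this local quadratic model, which amounts to solving the linear system <math>\mathbf{H}(\mathbf{x}) \, \Delta\mathbf{x} = -\nabla f(\mathbf{x}).</math> The sketch below shows one such undamped step; the names <code>newton_step</code>, <code>grad</code> and <code>hess</code> are illustrative rather than any library's API, and practical implementations add safeguards such as line searches, damping, or modifications of an indefinite Hessian:

<syntaxhighlight lang="python">
import numpy as np

def newton_step(grad, hess, x):
    """One undamped Newton step: solve H(x) d = -grad(x) and move to x + d.

    ``grad`` and ``hess`` are assumed to be user-supplied callables returning
    the gradient vector and the Hessian matrix of f at a point.
    """
    d = np.linalg.solve(hess(x), -grad(x))
    return x + d

# Example: f(x, y) = (x - 1)**2 + 10 * (y + 2)**2 is quadratic, so a single
# Newton step from any starting point reaches the minimizer (1, -2).
grad = lambda x: np.array([2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)])
hess = lambda x: np.array([[2.0, 0.0], [0.0, 20.0]])
print(newton_step(grad, hess, np.zeros(2)))  # [ 1. -2.]
</syntaxhighlight>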
Computing and storing the full Hessian matrix takes [[Big theta|<math>\Theta\left(n^2\right)</math>]] memory, which is infeasible for high-dimensional functions such as the [[loss function]]s of [[Artificial neural network|neural nets]], [[conditional random field]]s, and other [[statistical model]]s with large numbers of parameters. For such situations, [[Truncated Newton method|truncated-Newton]] and [[Quasi-Newton method|quasi-Newton]] algorithms have been developed. The latter family of algorithms uses approximations to the Hessian; one of the most popular quasi-Newton algorithms is [[Broyden–Fletcher–Goldfarb–Shanno algorithm|BFGS]].<ref>{{cite book|last1=Nocedal|first1=Jorge|author-link1=Jorge Nocedal|last2=Wright|first2=Stephen|year=2000|title=Numerical Optimization|isbn=978-0-387-98793-4|publisher=Springer Verlag}}</ref>

Such approximations may use the fact that an optimization algorithm uses the Hessian only as a [[linear operator]], that is, only through Hessian–vector products <math>\mathbf{H}(\mathbf{x})\mathbf{v},</math> and proceed by first noticing that the Hessian also appears in the local expansion of the gradient:
<math display=block>\nabla f (\mathbf{x} + \Delta\mathbf{x}) = \nabla f (\mathbf{x}) + \mathbf{H}(\mathbf{x}) \, \Delta\mathbf{x} + \mathcal{O}(\|\Delta\mathbf{x}\|^2)</math>
Letting <math>\Delta \mathbf{x} = r \mathbf{v}</math> for some scalar <math>r,</math> this gives
<math display=block>\mathbf{H}(\mathbf{x}) \, \Delta\mathbf{x} = \mathbf{H}(\mathbf{x})r\mathbf{v} = r\mathbf{H}(\mathbf{x})\mathbf{v} = \nabla f (\mathbf{x} + r\mathbf{v}) - \nabla f (\mathbf{x}) + \mathcal{O}(r^2),</math>
that is,
<math display=block>\mathbf{H}(\mathbf{x})\mathbf{v} = \frac{1}{r} \left[\nabla f(\mathbf{x} + r \mathbf{v}) - \nabla f(\mathbf{x})\right] + \mathcal{O}(r)</math>
so if the gradient is already computed, the approximate Hessian–vector product can be computed with a number of scalar operations that is linear in the size of the gradient. (While simple to program, this approximation scheme is not numerically stable, since <math>r</math> has to be made small to suppress the <math>\mathcal{O}(r)</math> truncation error, while making it too small causes a loss of precision in the first term.<ref>{{cite journal|last=Pearlmutter|first=Barak A.|title=Fast exact multiplication by the Hessian|journal=Neural Computation|volume=6|issue=1|year=1994|url=http://www.bcl.hamilton.ie/~barak/papers/nc-hessian.pdf|doi=10.1162/neco.1994.6.1.147|pages=147–160|s2cid=1251969 }}</ref>)
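Because only the product <math>\mathbf{H}(\mathbf{x})\mathbf{v}</math> is needed, the finite-difference scheme above is short to implement. A minimal sketch, assuming a user-supplied gradient callable (the names <code>hessian_vector_product</code>, <code>grad</code> and the default step <code>r</code> are illustrative, not a library API); the choice of <math>r</math> reflects the trade-off between truncation and round-off error discussed above:

<syntaxhighlight lang="python">
import numpy as np

def hessian_vector_product(grad, x, v, r=1e-6):
    """Approximate H(x) @ v by a forward difference of the gradient:
    H(x) v = (grad(x + r v) - grad(x)) / r + O(r).

    ``grad`` is assumed to be a user-supplied callable returning the gradient
    of f; ``r`` balances the O(r) truncation error against round-off error.
    """
    return (grad(x + r * v) - grad(x)) / r

# Example: f(x) = 0.5 * x^T A x has gradient A x and constant Hessian A,
# so the difference quotient recovers A @ v up to rounding.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
grad = lambda x: A @ x
print(hessian_vector_product(grad, np.array([0.5, -1.0]), np.array([1.0, 0.0])))  # ~[3. 1.]
</syntaxhighlight>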
Notably, in the context of randomized search heuristics, the covariance matrix of an [[evolution strategy]] adapts to the inverse of the Hessian matrix, [[up to]] a scalar factor and small random fluctuations. This result has been formally proven for a single-parent strategy and a static model, in the limit of increasing population size, relying on the quadratic approximation.<ref>{{cite journal | doi = 10.1016/j.tcs.2019.09.002 | first = O.M. | last = Shir | author2 = A. Yehudayoff | title = On the covariance-Hessian relation in evolution strategies | journal = Theoretical Computer Science | volume = 801 | pages = 157–174 | publisher = Elsevier | year = 2020 | doi-access = free | arxiv = 1806.03674 }}</ref>

=== Other applications ===
The Hessian matrix is commonly used to express image operators in [[image processing]] and [[computer vision]] (see the [[Laplacian of Gaussian]] (LoG) blob detector, [[Blob detection#The determinant of the Hessian|the determinant of Hessian (DoH) blob detector]] and [[scale space]]); a sketch of the determinant-of-Hessian detector is given below. It can be used in [[normal mode]] analysis to calculate the different molecular frequencies in [[infrared spectroscopy]].<ref>{{Cite journal|last1=Mott|first1=Adam J.|last2=Rez|first2=Peter|date=December 24, 2014|title=Calculation of the infrared spectra of proteins|url=http://link.springer.com/10.1007/s00249-014-1005-6|journal=European Biophysics Journal|language=en|volume=44|issue=3|pages=103–112|doi=10.1007/s00249-014-1005-6|pmid=25538002 |s2cid=2945423 |issn=0175-7571}}</ref> It can also be used in local sensitivity analysis and statistical diagnostics.<ref>{{cite journal|last1=Liu|first1=Shuangzhe |last2=Leiva|first2=Victor|last3=Zhuang|first3=Dan|last4=Ma|first4=Tiefeng |last5=Figueroa-Zúñiga|first5=Jorge I.|date=March 2022|title=Matrix differential calculus with applications in the multivariate linear model and its diagnostics|journal=Journal of Multivariate Analysis |volume=188|pages=104849|doi=10.1016/j.jmva.2021.104849|doi-access=free}}</ref>
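As an illustration of the determinant-of-Hessian (DoH) blob detector mentioned above, the following sketch computes a scale-normalized DoH response of a two-dimensional image from Gaussian-derivative filters. The function name and the <math>\sigma^4</math> normalization follow common practice in the scale-space literature but are written here for illustration rather than taken from any particular library:

<syntaxhighlight lang="python">
import numpy as np
from scipy.ndimage import gaussian_filter

def determinant_of_hessian(image, sigma=2.0):
    """Scale-normalized determinant-of-Hessian response of a 2-D image.

    Second derivatives of the Gaussian-smoothed image are obtained by giving
    ``gaussian_filter`` a derivative order per axis (axis 0 = rows, axis 1 =
    columns); bright or dark blobs show up as local extrema of the response.
    """
    Lxx = gaussian_filter(image, sigma, order=(0, 2))  # second derivative along columns
    Lyy = gaussian_filter(image, sigma, order=(2, 0))  # second derivative along rows
    Lxy = gaussian_filter(image, sigma, order=(1, 1))  # mixed second derivative
    return sigma**4 * (Lxx * Lyy - Lxy**2)             # sigma**4 gives scale normalization

# Example: a single bright Gaussian blob produces a strong positive response
# near its center.
y, x = np.mgrid[0:64, 0:64]
image = np.exp(-((x - 32.0) ** 2 + (y - 32.0) ** 2) / (2.0 * 4.0 ** 2))
response = determinant_of_hessian(image, sigma=4.0)
print(np.unravel_index(np.argmax(response), response.shape))  # near (32, 32)
</syntaxhighlight>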