=== Use in optimization ===
Hessian matrices are used in large-scale [[Mathematical optimization|optimization]] problems within [[Newton's method in optimization|Newton]]-type methods because they are the coefficient of the quadratic term of a local [[Taylor expansion]] of a function. That is,
<math display=block>y = f(\mathbf{x} + \Delta\mathbf{x}) \approx f(\mathbf{x}) + \nabla f(\mathbf{x})^\mathsf{T} \Delta\mathbf{x} + \frac{1}{2} \, \Delta\mathbf{x}^\mathsf{T} \mathbf{H}(\mathbf{x}) \, \Delta\mathbf{x}</math>
where <math>\nabla f</math> is the [[gradient]] <math>\left(\frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n}\right).</math>

Computing and storing the full Hessian matrix takes [[Big theta|<math>\Theta\left(n^2\right)</math>]] memory, which is infeasible for high-dimensional functions such as the [[loss function]]s of [[Artificial neural network|neural nets]], [[conditional random field]]s, and other [[statistical model]]s with large numbers of parameters. For such situations, [[Truncated Newton method|truncated-Newton]] and [[Quasi-Newton method|quasi-Newton]] algorithms have been developed. The latter family of algorithms uses approximations to the Hessian; one of the most popular quasi-Newton algorithms is [[Broyden–Fletcher–Goldfarb–Shanno algorithm|BFGS]].<ref>{{cite book|last1=Nocedal|first1=Jorge|author-link1=Jorge Nocedal|last2=Wright|first2=Stephen|year=2000|title=Numerical Optimization|isbn=978-0-387-98793-4|publisher=Springer Verlag}}</ref>

Such approximations may use the fact that an optimization algorithm uses the Hessian only as a [[linear operator]] <math>\mathbf{H}(\mathbf{v}),</math> and proceed by first noticing that the Hessian also appears in the local expansion of the gradient:
<math display=block>\nabla f (\mathbf{x} + \Delta\mathbf{x}) = \nabla f (\mathbf{x}) + \mathbf{H}(\mathbf{x}) \, \Delta\mathbf{x} + \mathcal{O}(\|\Delta\mathbf{x}\|^2)</math>
Letting <math>\Delta \mathbf{x} = r \mathbf{v}</math> for some scalar <math>r,</math> this gives
<math display=block>\mathbf{H}(\mathbf{x}) \, \Delta\mathbf{x} = \mathbf{H}(\mathbf{x}) r \mathbf{v} = r \mathbf{H}(\mathbf{x}) \mathbf{v} = \nabla f (\mathbf{x} + r\mathbf{v}) - \nabla f (\mathbf{x}) + \mathcal{O}(r^2),</math>
that is,
<math display=block>\mathbf{H}(\mathbf{x}) \mathbf{v} = \frac{1}{r} \left[\nabla f(\mathbf{x} + r \mathbf{v}) - \nabla f(\mathbf{x})\right] + \mathcal{O}(r)</math>
so if the gradient is already computed, the approximate Hessian–vector product can be computed by a linear (in the size of the gradient) number of scalar operations. (While simple to program, this approximation scheme is not numerically stable, since <math>r</math> must be made small to limit the error from the <math>\mathcal{O}(r)</math> term, but decreasing it loses precision in the first term.<ref>{{cite journal|last=Pearlmutter|first=Barak A.|title=Fast exact multiplication by the Hessian|journal=Neural Computation|volume=6|issue=1|year=1994|url=http://www.bcl.hamilton.ie/~barak/papers/nc-hessian.pdf|doi=10.1162/neco.1994.6.1.147|pages=147–160|s2cid=1251969}}</ref>)

Notably, in the context of randomized search heuristics, the [[evolution strategy]]'s covariance matrix adapts to the inverse of the Hessian matrix, [[up to]] a scalar factor and small random fluctuations. This result has been formally proven for a single-parent strategy and a static model, as the population size increases, relying on the quadratic approximation.<ref>{{cite journal|doi=10.1016/j.tcs.2019.09.002|first=O.M.|last=Shir|author2=A. Yehudayoff|title=On the covariance-Hessian relation in evolution strategies|journal=Theoretical Computer Science|volume=801|pages=157–174|publisher=Elsevier|year=2020|doi-access=free|arxiv=1806.03674}}</ref>
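As an illustration of the finite-difference scheme above, the following short Python sketch (not drawn from the cited references; the quadratic test function and all names are chosen here purely for illustration) approximates the Hessian–vector product <math>\mathbf{H}(\mathbf{x})\mathbf{v}</math> from two gradient evaluations and compares it with the exact product for a quadratic function, whose Hessian is constant:

<syntaxhighlight lang="python">
import numpy as np

def hessian_vector_product(grad, x, v, r=1e-5):
    # Approximate H(x) @ v from two gradient evaluations:
    #   H(x) v ≈ [grad(x + r v) - grad(x)] / r, with O(r) truncation error.
    # r too large -> truncation error dominates; r too small -> cancellation.
    return (grad(x + r * v) - grad(x)) / r

# Illustrative test problem: f(x) = 1/2 x^T A x, whose Hessian is exactly A.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
grad = lambda x: A @ x                      # gradient of the quadratic f
x = np.array([1.0, -1.0])                   # evaluation point
v = np.array([0.5, 2.0])                    # direction vector

print(hessian_vector_product(grad, x, v))   # approximately A @ v = [3.5, 4.5]
print(A @ v)                                # exact product, for comparison
</syntaxhighlight>

Only gradient evaluations are needed, so the full Hessian is never formed or stored; this is what makes such matrix–vector schemes attractive when the <math>\Theta\left(n^2\right)</math> memory cost of the full Hessian is infeasible.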