==Computing the Newton direction==
Finding the inverse of the Hessian in high dimensions to compute the Newton direction <math>h = - (f''(x_k))^{-1} f'(x_k)</math> can be an expensive operation. In such cases, instead of directly inverting the Hessian, it is better to calculate the vector <math>h</math> as the solution to the [[system of linear equations]]
:<math>[f''(x_k)] h = - f'(x_k)</math>
which may be solved by various factorizations or approximately (but to great accuracy) using [[iterative methods]]. Many of these methods are only applicable to certain types of equations; for example, the [[Cholesky factorization]] and [[Conjugate gradient method|conjugate gradient]] will only work if <math>f''(x_k)</math> is a positive definite matrix. While this may seem like a limitation, it is often a useful indicator that something has gone wrong: for example, if a minimization problem is being approached and <math>f''(x_k)</math> is not positive definite, then the iterations are converging to a [[saddle point]] and not a minimum. On the other hand, if a [[constrained optimization]] is done (for example, with [[Lagrange multipliers]]), the problem may become one of saddle point finding, in which case the Hessian will be symmetric indefinite and the update <math>x_{k+1}</math> will need to be computed with a method that works for such matrices, such as the <math>LDL^\top</math> variant of [[Cholesky factorization]] or the [[conjugate residual method]].

There also exist various [[quasi-Newton method]]s, where an approximation of the Hessian (or of its inverse directly) is built up from changes in the gradient.

If the Hessian is close to a non-[[invertible matrix]], the inverted Hessian can be numerically unstable and the solution may diverge. In this case, certain workarounds have been tried in the past, which have had varying success on particular problems. One can, for example, modify the Hessian by adding a correction matrix <math>B_k</math> so as to make <math>f''(x_k) + B_k</math> positive definite. One approach is to diagonalize the Hessian and choose <math>B_k</math> so that <math>f''(x_k) + B_k</math> has the same eigenvectors as the Hessian, but with each negative eigenvalue replaced by <math>\epsilon>0</math>. An approach exploited in the [[Levenberg–Marquardt algorithm]] (which uses an approximate Hessian) is to add a scaled identity matrix to the Hessian, <math>\mu I</math>, with the scale adjusted at every iteration as needed. For large <math>\mu</math> and small Hessian, the iterations will behave like [[gradient descent]] with step size <math>1/\mu</math>. This results in slower but more reliable convergence where the Hessian does not provide useful information.
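The following is a minimal sketch of computing the Newton direction by solving the linear system with a factorization rather than forming the explicit inverse. It assumes a dense symmetric Hessian stored as a NumPy array; the function name <code>newton_direction</code> and the choice of SciPy's Cholesky routines are illustrative, not part of any particular optimizer's API.

<syntaxhighlight lang="python">
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def newton_direction(hessian, gradient):
    """Solve f''(x_k) h = -f'(x_k) for the Newton direction h.

    A Cholesky factorization succeeds only when the Hessian is positive
    definite, so a failure here signals (as noted above) that the iterates
    may be heading toward a saddle point rather than a minimum.
    """
    c, low = cho_factor(hessian)          # raises LinAlgError if not positive definite
    return cho_solve((c, low), -gradient)

# Example: quadratic f(x) = 1/2 x^T A x - b^T x, so f''(x) = A and f'(x) = A x - b.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = np.zeros(2)
h = newton_direction(A, A @ x - b)        # one Newton step solves the quadratic exactly
</syntaxhighlight>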
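The two Hessian modifications described above can be sketched in the same way. The helper names, the tolerance <code>eps</code>, and the damping parameter <code>mu</code> are illustrative assumptions rather than a standard interface.

<syntaxhighlight lang="python">
import numpy as np

def clamp_negative_eigenvalues(hessian, eps=1e-8):
    """Form f''(x_k) + B_k with the same eigenvectors as the Hessian,
    but with every non-positive eigenvalue replaced by eps > 0."""
    eigvals, eigvecs = np.linalg.eigh(hessian)        # symmetric eigendecomposition
    eigvals = np.where(eigvals <= 0.0, eps, eigvals)
    return eigvecs @ np.diag(eigvals) @ eigvecs.T

def damped_newton_direction(hessian, gradient, mu):
    """Levenberg-Marquardt-style damping: solve (f''(x_k) + mu I) h = -f'(x_k).
    For large mu the step approaches gradient descent with step size 1/mu."""
    n = hessian.shape[0]
    return np.linalg.solve(hessian + mu * np.eye(n), -gradient)
</syntaxhighlight>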