Open main menu
Home
Random
Recent changes
Special pages
Community portal
Preferences
About Wikipedia
Disclaimers
Incubator escapee wiki
Search
User menu
Talk
Dark mode
Contributions
Create account
Log in
Editing
Levenberg–Marquardt algorithm
(section)
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
=== Choice of damping parameter === Various more or less heuristic arguments have been put forward for the best choice for the damping parameter {{tmath|\lambda}}. Theoretical arguments exist showing why some of these choices guarantee local convergence of the algorithm; however, these choices can make the global convergence of the algorithm suffer from the undesirable properties of [[gradient descent|steepest descent]], in particular, very slow convergence close to the optimum. The absolute values of any choice depend on how well-scaled the initial problem is. Marquardt recommended starting with a value {{tmath|\lambda_0}} and a factor {{tmath|\nu > 1}}. Initially setting <math>\lambda = \lambda_0</math> and computing the residual sum of squares <math>S\left (\boldsymbol\beta\right )</math> after one step from the starting point with the damping factor of <math>\lambda = \lambda_0</math> and secondly with {{tmath|\lambda_0 / \nu}}. If both of these are worse than the initial point, then the damping is increased by successive multiplication by {{tmath|\nu}} until a better point is found with a new damping factor of {{tmath|\lambda_0\nu^k}} for some {{tmath|k}}. If use of the damping factor {{tmath|\lambda / \nu}} results in a reduction in squared residual, then this is taken as the new value of {{tmath|\lambda}} (and the new optimum location is taken as that obtained with this damping factor) and the process continues; if using {{tmath|\lambda / \nu}} resulted in a worse residual, but using {{tmath|\lambda}} resulted in a better residual, then {{tmath|\lambda}} is left unchanged and the new optimum is taken as the value obtained with {{tmath|\lambda}} as damping factor. An effective strategy for the control of the damping parameter, called ''delayed gratification'', consists of increasing the parameter by a small amount for each uphill step, and decreasing by a large amount for each downhill step. The idea behind this strategy is to avoid moving downhill too fast in the beginning of optimization, therefore restricting the steps available in future iterations and therefore slowing down convergence.<ref name="Transtrum2011"/> An increase by a factor of 2 and a decrease by a factor of 3 has been shown to be effective in most cases, while for large problems more extreme values can work better, with an increase by a factor of 1.5 and a decrease by a factor of 5.<ref name="Transtrum2012"/>
Edit summary
(Briefly describe your changes)
By publishing changes, you agree to the
Terms of Use
, and you irrevocably agree to release your contribution under the
CC BY-SA 4.0 License
and the
GFDL
. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel
Editing help
(opens in new window)