=== [[Quasi-Newton method]]s ===
Other quasi-Newton methods use more elaborate secant updates to give an approximation of the Hessian matrix.

==== [[Davidon–Fletcher–Powell formula]] ====
The DFP formula finds a solution that is symmetric, positive-definite and closest to the current approximate value of the second-order derivative:
<math display="block">\mathbf{H}_{k+1} = \left(I - \gamma_k y_k s_k^\mathsf{T}\right) \mathbf{H}_k \left(I - \gamma_k s_k y_k^\mathsf{T}\right) + \gamma_k y_k y_k^\mathsf{T},</math>
where
<math display="block">y_k = \nabla\ell(x_k + s_k) - \nabla\ell(x_k),</math>
<math display="block">\gamma_k = \frac{1}{y_k^\mathsf{T} s_k},</math>
<math display="block">s_k = x_{k+1} - x_k.</math>

==== [[Broyden–Fletcher–Goldfarb–Shanno algorithm]] ====
The BFGS method also gives a solution that is symmetric and positive-definite:
<math display="block">B_{k+1} = B_k + \frac{y_k y_k^\mathsf{T}}{y_k^\mathsf{T} s_k} - \frac{B_k s_k s_k^\mathsf{T} B_k^\mathsf{T}}{s_k^\mathsf{T} B_k s_k},</math>
where
<math display="block">y_k = \nabla\ell(x_k + s_k) - \nabla\ell(x_k),</math>
<math display="block">s_k = x_{k+1} - x_k.</math>

The BFGS method is not guaranteed to converge unless the function has a quadratic [[Taylor expansion]] near an optimum. However, BFGS can have acceptable performance even for non-smooth optimization instances.

==== [[Scoring algorithm|Fisher's scoring]] ====
Another popular method is to replace the Hessian with the [[Fisher information matrix]], <math>\mathcal{I}(\theta) = \operatorname{\mathbb{E}}\left[\mathbf{H}_r\left(\widehat{\theta}\right)\right]</math>, giving the Fisher scoring algorithm. This procedure is standard in the estimation of many models, such as [[generalized linear models]].

Although popular, quasi-Newton methods may converge to a [[stationary point]] that is not necessarily a local or global maximum,<ref>See theorem 10.1 in {{cite book |first=Mordecai |last=Avriel |year=1976 |title=Nonlinear Programming: Analysis and Methods |pages=293–294 |location=Englewood Cliffs, NJ |publisher=Prentice-Hall |isbn=978-0-486-43227-4 |url=https://books.google.com/books?id=byF4Xb1QbvMC&pg=PA293 }}</ref> but rather a local minimum or a [[saddle point]]. Therefore, it is important to assess the validity of the obtained solution to the likelihood equations by verifying that the Hessian, evaluated at the solution, is both [[negative definite]] and [[well-conditioned]].<ref>{{cite book |first1=Philip E. |last1=Gill |first2=Walter |last2=Murray |first3=Margaret H. |last3=Wright |author-link3=Margaret H. Wright |year=1981 |title=Practical Optimization |location=London, UK |publisher=Academic Press |pages=[https://archive.org/details/practicaloptimiz00gill/page/n329 312]–313 |isbn=0-12-283950-1 |url=https://archive.org/details/practicaloptimiz00gill |url-access=limited }}</ref>
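The secant updates above translate directly into code. The following is a minimal illustrative sketch in Python/NumPy, not a reference implementation: it applies the BFGS update of <math>B_k</math> stated above to minimize the negative log-likelihood of normally distributed data (equivalent to maximizing <math>\ell</math>), then checks, as recommended above, that the Hessian at the solution is definite and well-conditioned. The model, the function names (<code>neg_log_lik</code>, <code>grad</code>), the starting point, and the tolerances are all assumptions chosen for illustration only.

<syntaxhighlight lang="python">
import numpy as np

# Illustrative example only: i.i.d. normal data with unknown mean mu and
# log-variance (parameterised as log sigma^2 to keep the problem unconstrained).
rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.5, size=500)

def neg_log_lik(theta):
    mu, log_var = theta
    var = np.exp(log_var)
    return 0.5 * np.sum(np.log(2 * np.pi * var) + (data - mu) ** 2 / var)

def grad(theta):
    mu, log_var = theta
    var = np.exp(log_var)
    d_mu = -np.sum(data - mu) / var
    d_log_var = 0.5 * np.sum(1 - (data - mu) ** 2 / var)
    return np.array([d_mu, d_log_var])

theta = np.zeros(2)   # arbitrary starting point
B = np.eye(2)         # initial Hessian approximation B_0
for _ in range(100):
    g = grad(theta)
    if np.linalg.norm(g) < 1e-8:
        break
    step = -np.linalg.solve(B, g)   # Newton-like step with B_k in place of the Hessian
    # Backtracking (Armijo) line search so each step decreases the objective.
    t = 1.0
    while neg_log_lik(theta + t * step) > neg_log_lik(theta) + 1e-4 * t * (g @ step):
        t *= 0.5
    s = t * step                    # s_k = x_{k+1} - x_k
    y = grad(theta + s) - g         # y_k = grad(x_k + s_k) - grad(x_k)
    theta = theta + s
    # BFGS secant update, matching the formula above (B is symmetric, so B^T = B).
    if y @ s > 1e-12:               # curvature condition keeps B positive definite
        B = B + np.outer(y, y) / (y @ s) - (B @ np.outer(s, s) @ B) / (s @ B @ s)

print(theta[0], np.exp(theta[1]))   # approx. the sample mean and (MLE) variance

# Check from the final paragraph above: the Hessian of the log-likelihood at the
# solution should be negative definite and well-conditioned; equivalently, the
# finite-difference Hessian of the negative log-likelihood should be positive
# definite with a moderate condition number.
eps, n = 1e-5, len(theta)
H = np.zeros((n, n))
for i in range(n):
    e = np.zeros(n)
    e[i] = eps
    H[:, i] = (grad(theta + e) - grad(theta - e)) / (2 * eps)
eigs = np.linalg.eigvalsh((H + H.T) / 2)
print("positive definite:", np.all(eigs > 0), "condition number:", eigs.max() / eigs.min())
</syntaxhighlight>

A DFP variant would replace the update line with the rank-two formula for <math>\mathbf{H}_{k+1}</math> given above; the two updates are duals of one another, exchanging the roles of <math>s_k</math> and <math>y_k</math>.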