
== One-dimensional line search ==
Suppose ''f'' is a one-dimensional function, <math>f:\mathbb R\to\mathbb R</math>, and assume that it is [[unimodal]], that is, it has exactly one local minimum ''x''* in a given interval [''a'',''z'']. This means that ''f'' is strictly decreasing in [''a'',''x''*] and strictly increasing in [''x''*,''z'']. There are several ways to find an (approximate) minimum point in this case.<ref name=":0">{{Cite web |last=Nemirovsky and Ben-Tal |date=2023 |title=Optimization III: Convex Optimization |url=http://www2.isye.gatech.edu/~nemirovs/OPTIIILN2023Spring.pdf}}</ref>{{Rp|location=sec.5}}

=== Zero-order methods ===
Zero-order methods use only function evaluations (i.e., a [[value oracle]]), not derivatives:<ref name=":0" />{{Rp|location=sec.5}}
* [[Ternary search]]: pick two points ''b'',''c'' such that ''a''<''b''<''c''<''z''. If f(''b'')≤f(''c''), then ''x''* must be in [''a'',''c'']; if f(''b'')≥f(''c''), then ''x''* must be in [''b'',''z'']. In both cases, we can replace the search interval with a smaller one. If we pick ''b'',''c'' very close to the interval center, then the interval shrinks to about 1/2 of its length at each iteration, but we need two function evaluations per iteration. Therefore, measured per function evaluation, the method has [[linear convergence]] with rate <math>\sqrt{0.5}\approx 0.71</math>. If we pick ''b'',''c'' such that the partition ''a'',''b'',''c'',''z'' has three equal-length intervals, then the interval shrinks to 2/3 of its length at each iteration, so the method has [[linear convergence]] with rate <math>\sqrt{2/3}\approx 0.82</math>.
* Fibonacci search: This is a variant of ternary search in which the points ''b'',''c'' are selected based on the [[Fibonacci sequence]]. At each iteration, only one new function evaluation is needed, since the other point was already evaluated as a test point in the previous iteration. Therefore, the method has linear convergence with asymptotic rate <math>1/ \varphi \approx 0.618</math>.
* [[Golden-section search]]: This is a variant in which the points ''b'',''c'' are selected based on the [[golden ratio]]. Again, only one new function evaluation is needed in each iteration, and the method has linear convergence with rate <math>1/ \varphi \approx 0.618</math>. This ratio is optimal among the zero-order methods.
Zero-order methods are very general: they do not assume differentiability or even continuity.
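
As an illustration of the interval-shrinking idea above, the following is a minimal Python sketch of golden-section search. The function name <code>golden_section_search</code>, the tolerance parameter <code>tol</code>, and the choice to return the midpoint of the final interval are assumptions made for this example, not part of the cited source.

```python
import math

# 1/phi, the factor by which the interval shrinks per iteration (about 0.618).
INV_PHI = (math.sqrt(5) - 1) / 2


def golden_section_search(f, a, z, tol=1e-8):
    """Approximate the minimizer of a unimodal function f on [a, z].

    Only function values are used. After the two initial evaluations,
    each iteration costs exactly one new evaluation.
    """
    b = z - INV_PHI * (z - a)
    c = a + INV_PHI * (z - a)
    fb, fc = f(b), f(c)
    while z - a > tol:
        if fb <= fc:
            # The minimizer lies in [a, c]; the old b is reused as the new c.
            z, c, fc = c, b, fb
            b = z - INV_PHI * (z - a)
            fb = f(b)
        else:
            # The minimizer lies in [b, z]; the old c is reused as the new b.
            a, b, fb = b, c, fc
            c = a + INV_PHI * (z - a)
            fc = f(c)
    return (a + z) / 2
```

The reuse of one test point works because <math>(1/\varphi)^2 = 1 - 1/\varphi</math>, so the surviving interior point is already at a golden-ratio position in the smaller interval. The sketch never differentiates ''f'', so it can be applied to a non-smooth unimodal function such as <code>lambda x: abs(x - 1)</code>.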

=== First-order methods ===
First-order methods assume that ''f'' is continuously differentiable, and that we can evaluate not only ''f'' but also its derivative.<ref name=":0" />{{Rp|location=sec.5}}

* The [[bisection method]] computes the derivative of ''f'' at the center of the interval, ''c'': if f'(c)=0, then this is the minimum point; if f'(''c'')>0, then the minimum must be in [''a'',''c'']; if f'(''c'')<0, then the minimum must be in [''c'',''z'']. This method has linear convergence with rate 0.5.
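
The bisection rule above can be sketched in a few lines of Python. The name <code>derivative_bisection</code>, the argument <code>df</code> (a callable returning the derivative of ''f''), and the stopping tolerance are hypothetical choices for this example.

```python
def derivative_bisection(df, a, z, tol=1e-10):
    """Approximate the minimizer of a unimodal, differentiable f on [a, z].

    df is a callable that returns the derivative of f at a point.
    The interval is halved at every iteration.
    """
    while z - a > tol:
        c = (a + z) / 2
        slope = df(c)
        if slope == 0:
            return c  # stationary point: this is the minimizer
        if slope > 0:
            z = c  # f is increasing at c, so the minimizer is in [a, c]
        else:
            a = c  # f is decreasing at c, so the minimizer is in [c, z]
    return (a + z) / 2
```

For example, with <code>df = lambda x: 2 * (x - 2)</code> (the derivative of (''x''−2)<sup>2</sup>) and the interval [0, 5], the returned value approaches 2.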

=== Curve-fitting methods ===
Curve-fitting methods try to attain [[superlinear convergence]] by assuming that ''f'' has some analytic form, e.g. a polynomial of finite degree. At each iteration, there is a set of "working points" at which we know the value of ''f'' (and possibly also its derivative). Based on these points, we can compute a polynomial that fits the known values, and find its minimum analytically. The minimum point becomes a new working point, and we proceed to the next iteration:<ref name=":0" />{{Rp|location=sec.5}}

* [[Newton's method in optimization|Newton's method]] is a special case of a curve-fitting method, in which the curve is a degree-two polynomial, constructed using the first and second derivatives of ''f'' at the current point. If the method is started close enough to a non-degenerate local minimum (that is, one with a positive second derivative), then it has [[quadratic convergence]].
* [[Regula falsi]] is another method that fits the function to a degree-two polynomial, but it uses the first derivative at two points, rather than the first and second derivative at the same point. If the method is started close enough to a non-degenerate local minimum, then it has superlinear convergence of order <math>\varphi  \approx 1.618</math>.
* ''Cubic fit'' fits to a degree-three polynomial, using both the function values and its derivative at the last two points. If the method is started close enough to a non-degenerate local minimum, then it has [[quadratic convergence]].
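
The first two fits in the list above reduce to short update formulas: minimizing the quadratic built from f' and f<nowiki>''</nowiki> at one point gives the Newton step, and minimizing the quadratic whose derivative matches f' at the last two points gives a secant step on f'. The following Python sketch shows both. The function names, the arguments <code>df</code> and <code>d2f</code> (callables for the first and second derivative), and the stopping rules are assumptions made for illustration. Neither routine is safeguarded, so both may fail when started far from the minimum.

```python
def newton_minimize(df, d2f, x, tol=1e-12, max_iter=50):
    """Newton's method: minimize the quadratic model built from f' and f'' at x.

    Assumes d2f(x) is nonzero at every iterate.
    """
    for _ in range(max_iter):
        step = df(x) / d2f(x)
        x -= step
        if abs(step) < tol:
            break
    return x


def two_point_derivative_fit(df, x0, x1, tol=1e-12, max_iter=50):
    """Fit a quadratic whose derivative matches df at the last two points.

    The minimizer of that quadratic is the secant step applied to df.
    """
    g0, g1 = df(x0), df(x1)
    for _ in range(max_iter):
        if g1 == g0:
            break  # the fitted quadratic is degenerate; stop
        x2 = x1 - g1 * (x1 - x0) / (g1 - g0)
        x0, g0 = x1, g1
        x1, g1 = x2, df(x2)
        if abs(x1 - x0) < tol:
            break
    return x1
```

As a usage example, for f(''x'') = ''x'' − ln ''x'' one can pass <code>df = lambda x: 1 - 1 / x</code> and <code>d2f = lambda x: 1 / x ** 2</code>; started at 0.5 (and at 0.5, 0.75 for the two-point variant), both iterations approach the minimizer ''x''* = 1.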

Curve-fitting methods have superlinear convergence when started close enough to the local minimum, but might diverge otherwise. ''Safeguarded curve-fitting methods'' run a linearly convergent method alongside the curve-fitting method. They check in each iteration whether the point found by the curve-fitting method is close enough to the interval maintained by the safeguard method; if it is not, then the safeguard method is used to compute the next iterate.<ref name=":0" />{{Rp|location=5.2.3.4}}
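
One possible way to realize this safeguarding idea is sketched below in Python, combining Newton steps with derivative bisection. It is a simplified illustration, not the scheme from the cited notes: the name <code>safeguarded_newton</code> is hypothetical, and the acceptance test used here (the Newton point must fall strictly inside the current bracket) is an assumed stand-in for "close enough to the interval".

```python
def safeguarded_newton(df, d2f, a, z, tol=1e-10, max_iter=200):
    """Minimize a unimodal, twice-differentiable f on [a, z].

    Newton steps are accepted only when they land strictly inside the
    current bracket [a, z]; otherwise the bracket midpoint is used.
    """
    x = (a + z) / 2
    for _ in range(max_iter):
        g = df(x)
        if g == 0:
            return x
        # Shrink the bracket using the sign of the derivative (safeguard).
        if g > 0:
            z = x
        else:
            a = x
        if z - a <= tol:
            break
        h = d2f(x)
        # Newton candidate; rejected if the quadratic model has no minimum.
        candidate = x - g / h if h > 0 else None
        if candidate is not None and a < candidate < z:
            x_new = candidate  # curve-fitting step accepted
        else:
            x_new = (a + z) / 2  # fall back to the bisection step
        if abs(x_new - x) <= tol:
            return x_new
        x = x_new
    return x
```

For <math>f(x)=\sqrt{1+x^2}</math>, the unsafeguarded Newton iteration is <math>x \mapsto -x^3</math>, which diverges whenever |''x''| > 1. With the bracket [−3, 10], the sketch rejects the first Newton step, takes one bisection step, and then accepts the subsequent Newton steps.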