== Classification of critical points and extrema ==

=== Feasibility problem ===
The ''[[satisfiability problem]]'', also called the ''feasibility problem'', is the problem of finding any [[feasible solution]] at all, without regard to objective value. It can be regarded as the special case of mathematical optimization in which the objective value is the same for every solution, so that any solution is optimal. Many optimization algorithms need to start from a feasible point. One way to obtain such a point is to [[Relaxation (approximation)|relax]] the feasibility conditions using a [[slack variable]]; with enough slack, any starting point is feasible. The slack variable is then minimized until the slack is zero or negative.

=== Existence ===
The [[extreme value theorem]] of [[Karl Weierstrass]] states that a continuous real-valued function on a compact set attains its maximum and minimum values. More generally, a lower semi-continuous function on a compact set attains its minimum; an upper semi-continuous function on a compact set attains its maximum.

=== Necessary conditions for optimality ===
[[Fermat's theorem (stationary points)|One of Fermat's theorems]] states that optima of unconstrained problems are found at [[stationary point]]s, where the first derivative or the gradient of the objective function is zero (see [[first derivative test]]). More generally, they may be found at [[Critical point (mathematics)|critical points]], where the first derivative or gradient of the objective function is zero or undefined, or on the boundary of the choice set. An equation (or set of equations) stating that the first derivative(s) equal(s) zero at an interior optimum is called a 'first-order condition' or a set of first-order conditions.

Optima of equality-constrained problems can be found by the [[Lagrange multiplier]] method. Optima of problems with equality and/or inequality constraints can be found using the '[[Karush–Kuhn–Tucker conditions]]'.
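As a minimal sketch of the Lagrange-multiplier method (the objective and constraint below are toy choices for illustration, not taken from this article): minimizing f(x, y) = x² + y² subject to x + y = 1 reduces to a small linear system in (x, y, λ), since the stationarity conditions of the Lagrangian are linear here.

```python
# Toy example: minimize f(x, y) = x^2 + y^2 subject to g(x, y) = x + y - 1 = 0.
# Stationarity of the Lagrangian L(x, y, λ) = f(x, y) - λ g(x, y) gives
#   2x - λ = 0,   2y - λ = 0,   x + y = 1.

def solve_linear(A, b):
    """Naive Gaussian elimination with partial pivoting for small systems."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, n):
            factor = M[r][i] / M[i][i]
            for c in range(i, n + 1):
                M[r][c] -= factor * M[i][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x

A = [[2.0, 0.0, -1.0],
     [0.0, 2.0, -1.0],
     [1.0, 1.0,  0.0]]
b = [0.0, 0.0, 1.0]
x, y, lam = solve_linear(A, b)
# The constrained minimum is at (x, y) = (1/2, 1/2) with multiplier λ = 1.
```

The same stationarity system generalizes to any number of equality constraints; for inequality constraints, the Karush–Kuhn–Tucker conditions add sign and complementarity requirements on the multipliers.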
=== Sufficient conditions for optimality ===
While the first derivative test identifies points that might be extrema, it does not distinguish a point that is a minimum from one that is a maximum or one that is neither. When the objective function is twice differentiable, these cases can be distinguished by checking the second derivative or the matrix of second derivatives (called the [[Hessian matrix]]) in unconstrained problems, or the matrix of second derivatives of the objective function and the constraints (called the [[Hessian matrix#Bordered Hessian|bordered Hessian]]) in constrained problems. The conditions that distinguish maxima or minima from other stationary points are called 'second-order conditions' (see '[[Second derivative test]]'). If a candidate solution satisfies the first-order conditions, then satisfaction of the second-order conditions as well is sufficient to establish at least local optimality.

=== Sensitivity and continuity of optima ===
The [[envelope theorem]] describes how the value of an optimal solution changes when an underlying [[parameter]] changes. The process of computing this change is called [[comparative statics]]. The [[maximum theorem]] of [[Claude Berge]] (1963) describes the continuity of an optimal solution as a function of underlying parameters.

=== Calculus of optimization ===
{{Main|Karush–Kuhn–Tucker conditions}}
{{See also|Critical point (mathematics)|Differential calculus|Gradient|Hessian matrix|Definite matrix|Lipschitz continuity|Rademacher's theorem|Convex function|Convex analysis}}
For unconstrained problems with twice-differentiable functions, some [[critical point (mathematics)|critical points]] can be found by finding the points where the [[gradient]] of the objective function is zero (that is, the stationary points).
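As a one-dimensional sketch (the function f(x) = x³ − 3x is a toy choice, not from this article): stationary points can be located by applying Newton's method to the first derivative, then classified with the second derivative test described above.

```python
# Toy objective: f(x) = x^3 - 3x, with f'(x) = 3x^2 - 3 and f''(x) = 6x.
def fp(x):
    return 3 * x**2 - 3

def fpp(x):
    return 6 * x

def newton_on_gradient(x, steps=50):
    """Newton's method applied to f'(x) = 0 to find a stationary point."""
    for _ in range(steps):
        x -= fp(x) / fpp(x)
    return x

for start in (2.0, -2.0):
    s = newton_on_gradient(start)
    if fpp(s) > 0:
        kind = "local minimum"    # second derivative positive
    elif fpp(s) < 0:
        kind = "local maximum"    # second derivative negative
    else:
        kind = "inconclusive"     # second derivative test is silent
    print(f"stationary point near {s:+.4f}: {kind}")
```

Here the iteration from x = 2 converges to the stationary point x = 1, a local minimum since f''(1) = 6 > 0, and the iteration from x = −2 converges to x = −1, a local maximum since f''(−1) = −6 < 0.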
More generally, a zero [[subgradient]] certifies that a local minimum has been found for [[convex optimization|minimization problems with convex]] [[convex function|functions]] and other [[Rademacher's theorem|locally]] [[Lipschitz function]]s, which arise in the minimization of neural-network loss functions. Positive-negative momentum estimation can help the iteration escape local minima and converge toward the global minimum of the objective function.<ref>{{Cite journal |last1=Abdulkadirov |first1=R. |last2=Lyakhov |first2=P. |last3=Bergerman |first3=M. |last4=Reznikov |first4=D. |date=February 2024 |title=Satellite image recognition using ensemble neural networks and difference gradient positive-negative momentum |url=https://linkinghub.elsevier.com/retrieve/pii/S0960077923013346 |journal=Chaos, Solitons & Fractals |language=en |volume=179 |pages=114432 |doi=10.1016/j.chaos.2023.114432|bibcode=2024CSF...17914432A }}</ref> Further, critical points can be classified using the [[positive definite matrix|definiteness]] of the [[Hessian matrix]]: if the Hessian is ''positive'' definite at a critical point, then the point is a local minimum; if the Hessian matrix is negative definite, then the point is a local maximum; finally, if indefinite, then the point is some kind of [[saddle point]].

Constrained problems can often be transformed into unconstrained problems with the help of [[Lagrange multiplier]]s. [[Lagrangian relaxation]] can also provide approximate solutions to difficult constrained problems.

When the objective function is a [[convex function]], any local minimum is also a global minimum. There exist efficient numerical techniques for minimizing convex functions, such as [[interior-point method]]s.

=== Global convergence ===
More generally, if the objective function is not a quadratic function, then many optimization methods use other techniques to ensure that some subsequence of iterations converges to an optimal solution.
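A common safeguard of this kind is a sufficient-decrease (Armijo) backtracking rule. The sketch below (toy strictly convex objective chosen here, not from this article) uses it inside plain gradient descent, so that every accepted step is guaranteed to reduce the objective, which is what forces the iterates to converge.

```python
# Toy strictly convex objective; its unique minimizer is (1, -2).
def f(v):
    x, y = v
    return (x - 1.0) ** 2 + 10.0 * (y + 2.0) ** 2

def grad(v):
    x, y = v
    return (2.0 * (x - 1.0), 20.0 * (y + 2.0))

def descend(v, iters=200):
    """Gradient descent; a backtracking rule enforces sufficient decrease
    at every step, which guarantees monotone progress on this objective."""
    for _ in range(iters):
        g = grad(v)
        gg = g[0] ** 2 + g[1] ** 2
        t = 1.0
        # Armijo condition: accept step length t only if f drops by
        # at least 0.5 * t * |grad|^2; otherwise halve t and retry.
        while f((v[0] - t * g[0], v[1] - t * g[1])) > f(v) - 0.5 * t * gg:
            t *= 0.5
        v = (v[0] - t * g[0], v[1] - t * g[1])
    return v

x_star, y_star = descend((5.0, 5.0))
# x_star, y_star approach the minimizer (1, -2).
```

The backtracking loop here is the simplest instance of the line-search idea: the step length along the descent direction is chosen by a one-dimensional search rather than fixed in advance.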
The first and still popular method for ensuring convergence relies on [[line search]]es, which optimize a function along one dimension. A second and increasingly popular method for ensuring convergence uses [[trust region]]s. Both line searches and trust regions are used in modern methods of [[subgradient method|non-differentiable optimization]]. Usually, a global optimizer is much slower than advanced local optimizers (such as [[BFGS method|BFGS]]), so often an efficient global optimizer can be constructed by starting the local optimizer from different starting points.
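The multistart strategy just described can be sketched as follows (the one-dimensional objective and the fixed-step gradient-descent local optimizer are toy choices standing in for an advanced method such as BFGS, not from this article): run the local optimizer from several starting points and keep the best local minimum found.

```python
# Toy multimodal objective with two local minima; the deeper one is global.
def f(x):
    return x**4 - 4 * x**2 + x

def df(x):
    return 4 * x**3 - 8 * x + 1

def gradient_descent(x, lr=0.01, iters=500):
    """A deliberately simple local optimizer standing in for e.g. BFGS."""
    for _ in range(iters):
        x -= lr * df(x)
    return x

# Multistart: each run converges to the local minimum of its own basin;
# taking the best candidate approximates the global minimum.
candidates = [gradient_descent(x0) for x0 in (-3.0, -1.0, 0.5, 3.0)]
best = min(candidates, key=f)
# best lies near x ≈ -1.47, the global minimum; starts at 0.5 and 3.0
# are trapped by the shallower local minimum near x ≈ 1.32.
```

This illustrates why the combination works: the local optimizer is fast but only explores one basin of attraction, while the spread of starting points supplies the global coverage.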