== Single constraint ==
[[Image:LagrangeMultipliers2D.svg|thumb|right|300px|Figure 1: The red curve shows the constraint <span style="color: DarkRed">{{math|1=''g''(''x'', ''y'') = ''c''}}</span>. The blue curves are contours of <span style="color:DarkSlateBlue">{{math|''f''(''x'', ''y'')}}</span>. The point where the red constraint tangentially touches a blue contour is the maximum of <span style="color:DarkSlateBlue">{{math|''f''(''x'', ''y'')}}</span> along the constraint, since {{nobr|{{math| ''d''{{sub|1}} > ''d''{{sub|2}} }}.}}]]

For the case of only one constraint and only two choice variables (as exemplified in Figure 1), consider the [[optimization problem]]
<math display="block">\begin{aligned} \underset{x,y}{\text{maximize}} \quad& f(x,y) \\ \text{subject to}\quad& g(x,y) = 0. \end{aligned}</math>
(Sometimes an additive constant is shown separately rather than being included in <math>g</math>, in which case the constraint is written <math>g(x,y) = c,</math> as in Figure 1.) We assume that both <math>f</math> and <math>g</math> have continuous first [[partial derivative]]s. We introduce a new variable <math>\lambda</math> called a '''Lagrange multiplier''' (or '''Lagrange undetermined multiplier''') and study the '''Lagrange function''' (or '''Lagrangian''' or '''Lagrangian expression''') defined by
<math display="block">\mathcal{L}(x,y,\lambda) = f(x,y) + \lambda\cdot g(x,y),</math>
where the <math>\lambda</math> term may be either added or subtracted. If <math>f(x_0, y_0)</math> is a maximum of <math>f(x,y)</math> for the original constrained problem and <math>\nabla g(x_0,y_0) \ne 0,</math> then there exists <math>\lambda_0</math> such that <math>(x_0, y_0, \lambda_0)</math> is a ''[[stationary point]]'' for the Lagrange function (stationary points are those points where the first partial derivatives of <math>\mathcal{L}</math> are zero). The assumption <math>\nabla g \ne 0</math> is called constraint qualification. However, not all stationary points yield a solution of the original problem, as the method of Lagrange multipliers yields only a [[necessary condition]] for optimality in constrained problems.<ref>{{cite book |last=Bertsekas |first=Dimitri P. |author-link=Dimitri P. Bertsekas |year=1999 |title=Nonlinear Programming |edition=Second |publisher=Athena Scientific |location=Cambridge, MA |isbn=1-886529-00-0 }}</ref><ref>{{springer |id=Lagrange_multipliers |title=Lagrange multipliers |first=I.B. |last=Vapnyarskii }}</ref><ref>{{cite book |last=Lasdon |first=Leon S. |year=2002 |orig-date=1970 |title=Optimization Theory for Large Systems |publisher=Dover |location=Mineola, New York, NY |edition=reprint |mr=1888251 |isbn=0-486-41999-1 }}</ref><ref>{{cite book |last1=Hiriart-Urruty |first1=Jean-Baptiste |last2=Lemaréchal |first2=Claude |author2-link=Claude Lemaréchal |year=1993 |chapter=Chapter XII: Abstract duality for practitioners |title=Convex analysis and minimization algorithms |id=Volume II: Advanced theory and bundle methods |series=Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences] |volume=306 |publisher=Springer-Verlag |place=Berlin, DE |isbn=3-540-56852-2 |mr=1295240 |pages=136–193 (and Bibliographical comments pp. 334–335)}}</ref><ref>{{cite conference |last=Lemaréchal |first=Claude |author-link=Claude Lemaréchal |date=15–19 May 2000 |title=Lagrangian relaxation |editor1-last=Jünger |editor1-first=Michael |editor2-last=Naddef |editor2-first=Denis |publication-date=2001 |book-title=Computational combinatorial optimization: Papers from the Spring School held in Schloß Dagstuhl |conference=Spring School held in Schloß Dagstuhl, {{nobr|May 15–19, 2000}} |publisher=Springer-Verlag |isbn=3-540-42877-1 |series=Lecture Notes in Computer Science |volume=2241 |place=Berlin, DE |pages=112–156 |doi=10.1007/3-540-45586-8_4 |mr=1900016 |s2cid=9048698}}</ref>
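For instance, consider maximizing <math>f(x,y) = x + y</math> subject to <math>g(x,y) = x^2 + y^2 - 1 = 0</math> (a simple running example, distinct from the problem of Figure 1). The Lagrange function is
<math display="block">\mathcal{L}(x,y,\lambda) = x + y + \lambda\left(x^2 + y^2 - 1\right),</math>
and its stationary points satisfy the three equations
<math display="block">1 + 2\lambda x = 0, \qquad 1 + 2\lambda y = 0, \qquad x^2 + y^2 - 1 = 0.</math>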
Sufficient conditions for a minimum or maximum [[Bordered Hessian|also exist]], but if a particular [[candidate solution]] satisfies the sufficient conditions, it is only guaranteed that that solution is the best one ''locally'' – that is, it is better than any permissible nearby points. The ''global'' optimum can be found by comparing the values of the original objective function at the points satisfying the necessary and locally sufficient conditions.

The method of Lagrange multipliers relies on the intuition that at a maximum, {{math|''f''(''x'', ''y'')}} cannot be increasing in the direction of any neighboring point that also has {{math|1=''g'' = 0}}. If it were, we could walk along {{math|1=''g'' = 0}} to get higher, meaning that the starting point wasn't actually the maximum. Viewed in this way, it is an exact analogue to testing whether the derivative of an unconstrained function is {{math|0}}; that is, we are verifying that the directional derivative is {{math|0}} in any relevant (viable) direction.

We can visualize [[Contour line|contour]]s of {{mvar|f}} given by {{math|1=''f''(''x'', ''y'') = ''d''}} for various values of {{mvar|d}}, and the contour of {{mvar|g}} given by {{math|1=''g''(''x'', ''y'') = ''c''}}. Suppose we walk along the contour line with {{nobr|{{math|1=''g'' = ''c''}}.}} We are interested in finding points where {{mvar|f}} almost does not change as we walk, since these points might be maxima. There are two ways this could happen:
# We could touch a contour line of {{mvar|f}}, since by definition {{mvar|f}} does not change as we walk along its contour lines. This would mean that the tangents to the contour lines of {{mvar|f}} and {{mvar|g}} are parallel here.
# We have reached a "level" part of {{mvar|f}}, meaning that {{mvar|f}} does not change in any direction.

To check the first possibility (we touch a contour line of {{mvar|f}}), notice that since the [[gradient]] of a function is perpendicular to the contour lines, the tangents to the contour lines of {{mvar|f}} and {{mvar|g}} are parallel if and only if the gradients of {{mvar|f}} and {{mvar|g}} are parallel. Thus we want points {{math|(''x'', ''y'')}} where {{nobr|{{math|1=''g''(''x'', ''y'') = ''c''}}}} and
<math display="block">\nabla_{x,y} f = \lambda \, \nabla_{x,y} g,</math>
for some <math>\lambda</math>, where
<math display="block">\nabla_{x,y} f = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right), \qquad \nabla_{x,y} g = \left( \frac{\partial g}{\partial x}, \frac{\partial g}{\partial y} \right)</math>
are the respective gradients. The constant <math>\lambda</math> is required because, although the two gradient vectors are parallel, their magnitudes are generally not equal. This constant is called the Lagrange multiplier. (In some conventions <math>\lambda</math> is preceded by a minus sign.)
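In the running example, <math>\nabla_{x,y} f = (1, 1)</math> and <math>\nabla_{x,y} g = (2x, 2y)</math>, so the condition <math>\nabla_{x,y} f = \lambda \, \nabla_{x,y} g</math> reads <math>1 = 2\lambda x</math> and <math>1 = 2\lambda y</math>, forcing <math>x = y</math>. Combined with the constraint <math>x^2 + y^2 = 1</math>, this yields the two candidate points <math>\pm\left(\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right)</math>, of which <math>\left(\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right)</math> gives the constrained maximum <math>f = \sqrt{2}</math>.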
Notice that this method also covers the second possibility, that {{mvar|f}} is level: if {{mvar|f}} is level, then its gradient is zero, and setting <math>\lambda = 0</math> is a solution regardless of <math>\nabla_{x,y} g</math>.

To incorporate these conditions into one equation, we introduce an auxiliary function
<math display="block">\mathcal{L}(x,y,\lambda) \equiv f(x,y) + \lambda\cdot g(x,y),</math>
and solve
<math display="block">\nabla_{x,y,\lambda} \mathcal{L}(x, y, \lambda) = 0.</math>
Note that this amounts to solving three equations in three unknowns. This is the method of Lagrange multipliers. Note that <math>\nabla_{\lambda} \mathcal{L}(x, y, \lambda) = 0</math> implies <math>g(x,y) = 0,</math> as the partial derivative of <math>\mathcal{L}</math> with respect to <math>\lambda</math> is <math>g(x,y).</math> To summarize,
<math display="block">\nabla_{x,y,\lambda} \mathcal{L}(x, y, \lambda) = 0 \iff \begin{cases} \nabla_{x,y} f(x, y) = -\lambda \, \nabla_{x,y} g(x, y) \\ g(x,y) = 0 \end{cases}</math>
The method generalizes readily to functions of <math>n</math> variables:
<math display="block">\nabla_{x_1, \dots, x_n, \lambda} \mathcal{L}(x_1, \dots, x_n, \lambda) = 0,</math>
which amounts to solving {{math|''n'' + 1}} equations in {{math|''n'' + 1}} unknowns.

The constrained extrema of {{mvar|f}} are ''[[Critical point (mathematics)|critical points]]'' of the Lagrangian <math>\mathcal{L}</math>, but they are not necessarily ''local extrema'' of <math>\mathcal{L}</math> (see {{slink||Example 2}} below). One may [[Hamiltonian mechanics#As a reformulation of Lagrangian mechanics|reformulate the Lagrangian]] as a [[Hamiltonian (control theory)|Hamiltonian]], in which case the solutions are local minima for the Hamiltonian. This is done in [[optimal control]] theory, in the form of [[Pontryagin's maximum principle]].

The fact that solutions of the method of Lagrange multipliers are not necessarily extrema of the Lagrangian also poses difficulties for numerical optimization. This can be addressed by minimizing the ''magnitude'' of the gradient of the Lagrangian, as these minima are the same as the zeros of the magnitude, as illustrated in [[#Example 5|Example 5: Numerical optimization]].
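The resulting system of equations can be solved symbolically or numerically. As a minimal computational sketch (assuming the Python library SymPy is available; the variable names are illustrative), the three stationarity equations of the running example can be solved as follows:
<syntaxhighlight lang="python">
import sympy as sp

x, y, lam = sp.symbols("x y lambda", real=True)
f = x + y
g = x**2 + y**2 - 1  # constraint g(x, y) = 0

# Lagrange function L = f + lambda * g
L = f + lam * g

# Stationary points: all three first partial derivatives of L vanish,
# i.e. three equations in the three unknowns x, y, lambda.
equations = [sp.diff(L, v) for v in (x, y, lam)]

for sol in sp.solve(equations, [x, y, lam], dict=True):
    print(sol, " f =", f.subs(sol))

# Of the two stationary points, (x, y) = (sqrt(2)/2, sqrt(2)/2) is the
# constrained maximum (f = sqrt(2)); the other is the constrained minimum.
</syntaxhighlight>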