===The resulting algorithm===
The above algorithm gives the most straightforward explanation of the conjugate gradient method. As stated, the algorithm seemingly requires storage of all previous search directions and residual vectors, as well as many matrix–vector multiplications, and thus can be computationally expensive. However, a closer analysis of the algorithm shows that <math>\mathbf{r}_i</math> is orthogonal to <math>\mathbf{r}_j</math>, i.e. <math>\mathbf{r}_i^\mathsf{T} \mathbf{r}_j=0</math>, for <math>i \neq j</math>, and that <math>\mathbf{p}_i</math> is <math>\mathbf{A}</math>-orthogonal to <math>\mathbf{p}_j</math>, i.e. <math>\mathbf{p}_i^\mathsf{T} \mathbf{A} \mathbf{p}_j=0</math>, for <math>i \neq j</math>. This can be interpreted as follows: as the algorithm progresses, the <math>\mathbf{p}_i</math> and <math>\mathbf{r}_i</math> span the same [[Krylov subspace]], where the <math>\mathbf{r}_i</math> form an orthogonal basis with respect to the standard inner product and the <math>\mathbf{p}_i</math> form an orthogonal basis with respect to the inner product induced by <math>\mathbf{A}</math>. Therefore, <math>\mathbf{x}_k</math> can be regarded as the projection of <math>\mathbf{x}</math> onto the Krylov subspace. That is, if the CG method starts with <math>\mathbf{x}_0 = 0</math>, then<ref>{{Cite journal |last1=Paquette |first1=Elliot |last2=Trogdon |first2=Thomas |date=March 2023 |title=Universality for the Conjugate Gradient and MINRES Algorithms on Sample Covariance Matrices |url=https://onlinelibrary.wiley.com/doi/10.1002/cpa.22081 |journal=Communications on Pure and Applied Mathematics |language=en |volume=76 |issue=5 |pages=1085–1136 |doi=10.1002/cpa.22081 |issn=0010-3640 |arxiv=2007.00640 }}</ref><math display="block">x_k = \mathrm{argmin}_{y \in \mathbb{R}^n} {\left\{(x-y)^{\top} A(x-y): y \in \operatorname{span}\left\{b, A b, \ldots, A^{k-1} b\right\}\right\}}</math>

The algorithm is detailed below for solving <math>\mathbf{A} \mathbf{x}= \mathbf{b}</math>, where <math>\mathbf{A}</math> is a real, symmetric, positive-definite matrix. The input vector <math>\mathbf{x}_0</math> can be an approximate initial solution or <math>\mathbf{0}</math>. It is a different formulation of the exact procedure described above.

:<math>\begin{align}
& \mathbf{r}_0 := \mathbf{b} - \mathbf{A x}_0 \\
& \hbox{if } \mathbf{r}_{0} \text{ is sufficiently small, then return } \mathbf{x}_{0} \text{ as the result}\\
& \mathbf{p}_0 := \mathbf{r}_0 \\
& k := 0 \\
& \text{repeat} \\
& \qquad \alpha_k := \frac{\mathbf{r}_k^\mathsf{T} \mathbf{r}_k}{\mathbf{p}_k^\mathsf{T} \mathbf{A p}_k} \\
& \qquad \mathbf{x}_{k+1} := \mathbf{x}_k + \alpha_k \mathbf{p}_k \\
& \qquad \mathbf{r}_{k+1} := \mathbf{r}_k - \alpha_k \mathbf{A p}_k \\
& \qquad \hbox{if } \mathbf{r}_{k+1} \text{ is sufficiently small, then exit loop} \\
& \qquad \beta_k := \frac{\mathbf{r}_{k+1}^\mathsf{T} \mathbf{r}_{k+1}}{\mathbf{r}_k^\mathsf{T} \mathbf{r}_k} \\
& \qquad \mathbf{p}_{k+1} := \mathbf{r}_{k+1} + \beta_k \mathbf{p}_k \\
& \qquad k := k + 1 \\
& \text{end repeat} \\
& \text{return } \mathbf{x}_{k+1} \text{ as the result}
\end{align}</math>

This is the most commonly used algorithm. The same formula for <math>\beta_k</math> is also used in the Fletcher–Reeves [[nonlinear conjugate gradient method]].

====Restarts====
We note that <math>\mathbf{x}_{1}</math> is computed by the [[Gradient descent#Solution of a linear system|gradient descent]] method applied to <math>\mathbf{x}_{0}</math>. Setting <math>\beta_{k}=0</math> would similarly make <math>\mathbf{x}_{k+1}</math> computed by the [[Gradient descent#Solution of a linear system|gradient descent]] method from <math>\mathbf{x}_{k}</math>, i.e., it can be used as a simple implementation of a restart of the conjugate gradient iterations.<ref name="BP" /> Restarts could slow down convergence, but may improve stability if the conjugate gradient method misbehaves, e.g., due to [[round-off error]].
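The following Julia sketch is illustrative only; the function name, the restart period <code>restart_every</code>, and the other keyword parameters are hypothetical choices, not part of the cited references. It shows one way a periodic restart could be realized: every <code>restart_every</code> iterations the search direction is reset to the current residual, which is equivalent to taking <math>\beta_k = 0</math> at that step.

<syntaxhighlight lang="julia">
# Illustrative sketch of conjugate gradient with periodic restarts: every
# `restart_every` iterations the search direction is reset to the current
# residual, i.e. β = 0 at that step, giving a plain gradient-descent step.
function restarted_conjugate_gradient(A, b, x0; tol=1e-8, restart_every=50, maxiter=10_000)
    x = copy(x0)
    r = b - A * x
    p = copy(r)
    rs_old = r' * r
    for k in 1:maxiter
        sqrt(rs_old) <= tol && break
        Ap = A * p
        alpha = rs_old / (p' * Ap)
        x .+= alpha .* p
        r .-= alpha .* Ap
        rs_new = r' * r
        if k % restart_every == 0
            p .= r                              # restart: discard the previous direction (β = 0)
        else
            p .= r .+ (rs_new / rs_old) .* p    # standard conjugate gradient update
        end
        rs_old = rs_new
    end
    return x
end
</syntaxhighlight>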
====Explicit residual calculation====
The formulas <math>\mathbf{x}_{k+1} := \mathbf{x}_k + \alpha_k \mathbf{p}_k</math> and <math>\mathbf{r}_k := \mathbf{b} - \mathbf{A x}_k</math>, which both hold in exact arithmetic, make the formulas <math>\mathbf{r}_{k+1} := \mathbf{r}_k - \alpha_k \mathbf{A p}_k</math> and <math>\mathbf{r}_{k+1} := \mathbf{b} - \mathbf{A x}_{k+1}</math> mathematically equivalent. The former is used in the algorithm to avoid an extra multiplication by <math>\mathbf{A}</math>, since the vector <math>\mathbf{A p}_k</math> is already computed to evaluate <math>\alpha_k</math>. The latter may be more accurate, since it substitutes the explicit calculation <math>\mathbf{r}_{k+1} := \mathbf{b} - \mathbf{A x}_{k+1}</math> for the implicit one defined by the recursion, which is subject to [[round-off error]] accumulation; it is thus recommended for an occasional evaluation.<ref>{{cite book |first=Jonathan R |last=Shewchuk |title=An Introduction to the Conjugate Gradient Method Without the Agonizing Pain |year=1994 |url=http://www.cs.cmu.edu/~quake-papers/painless-conjugate-gradient.pdf }}</ref>

A norm of the residual is typically used as the stopping criterion. The norm of the explicit residual <math>\mathbf{r}_{k+1} := \mathbf{b} - \mathbf{A x}_{k+1}</math> provides a guaranteed level of accuracy both in exact arithmetic and in the presence of [[rounding errors]], where convergence naturally stagnates. In contrast, the implicit residual <math>\mathbf{r}_{k+1} := \mathbf{r}_k - \alpha_k \mathbf{A p}_k</math> is known to keep decreasing in amplitude well below the level of [[rounding errors]] and thus cannot be used to determine the stagnation of convergence.
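The following Julia sketch is illustrative only; the function name and the parameter <code>recompute_every</code> are hypothetical. It shows one way to combine the cheap recursive residual update with an occasional explicit evaluation of <math>\mathbf{b} - \mathbf{A x}_{k+1}</math>, which is then used for the stopping test.

<syntaxhighlight lang="julia">
# Illustrative sketch: the recursion drives the iteration, while an explicitly
# computed residual b - A*x is formed every `recompute_every` steps and used
# for the (relative) stopping test.
function cg_explicit_residual_check(A, b, x0; tol=1e-8, recompute_every=50, maxiter=10_000)
    x = copy(x0)
    r = b - A * x                            # explicit residual at the initial guess
    p = copy(r)
    rs = r' * r
    bnorm = sqrt(b' * b)
    for k in 1:maxiter
        Ap = A * p
        alpha = rs / (p' * Ap)
        x .+= alpha .* p
        r .-= alpha .* Ap                    # implicit residual via the recursion
        if k % recompute_every == 0
            r_explicit = b - A * x           # occasional explicit evaluation
            if sqrt(r_explicit' * r_explicit) <= tol * bnorm
                return x                     # stop based on the explicit residual
            end
        end
        rs_new = r' * r
        p .= r .+ (rs_new / rs) .* p
        rs = rs_new
    end
    return x
end
</syntaxhighlight>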
====Computation of alpha and beta====
In the algorithm, <math>\alpha_k</math> is chosen such that <math>\mathbf{r}_{k+1}</math> is orthogonal to <math>\mathbf{r}_{k}</math>. The denominator is simplified from
:<math>\alpha_k = \frac{\mathbf{r}_{k}^\mathsf{T} \mathbf{r}_{k}}{\mathbf{r}_{k}^\mathsf{T} \mathbf{A} \mathbf{p}_k} = \frac{\mathbf{r}_k^\mathsf{T} \mathbf{r}_k}{\mathbf{p}_k^\mathsf{T} \mathbf{A p}_k}</math>
since <math>\mathbf{r}_{k+1} = \mathbf{p}_{k+1}-\beta_{k}\mathbf{p}_{k}</math> and the search directions are mutually <math>\mathbf{A}</math>-orthogonal. The coefficient <math>\beta_k</math> is chosen such that <math>\mathbf{p}_{k+1}</math> is conjugate to <math>\mathbf{p}_{k}</math>. Before simplification, <math>\beta_k</math> is
:<math>\beta_k = - \frac{\mathbf{r}_{k+1}^\mathsf{T} \mathbf{A} \mathbf{p}_k}{\mathbf{p}_k^\mathsf{T} \mathbf{A} \mathbf{p}_k}.</math>
Using
:<math>\mathbf{r}_{k+1} = \mathbf{r}_{k} - \alpha_{k} \mathbf{A} \mathbf{p}_{k},</math>
or equivalently
:<math>\mathbf{A} \mathbf{p}_{k} = \frac{1}{\alpha_{k}} (\mathbf{r}_{k} - \mathbf{r}_{k+1}),</math>
the numerator of <math>\beta_k</math> is rewritten as
:<math>\mathbf{r}_{k+1}^\mathsf{T} \mathbf{A} \mathbf{p}_k = \frac{1}{\alpha_k} \mathbf{r}_{k+1}^\mathsf{T} (\mathbf{r}_k - \mathbf{r}_{k+1}) = - \frac{1}{\alpha_k} \mathbf{r}_{k+1}^\mathsf{T} \mathbf{r}_{k+1}</math>
because <math>\mathbf{r}_{k+1}</math> and <math>\mathbf{r}_{k}</math> are orthogonal by design. The denominator is rewritten as
:<math>\mathbf{p}_k^\mathsf{T} \mathbf{A} \mathbf{p}_k = (\mathbf{r}_k + \beta_{k-1} \mathbf{p}_{k-1})^\mathsf{T} \mathbf{A} \mathbf{p}_k = \frac{1}{\alpha_k} \mathbf{r}_k^\mathsf{T} (\mathbf{r}_k - \mathbf{r}_{k+1}) = \frac{1}{\alpha_k} \mathbf{r}_k^\mathsf{T} \mathbf{r}_k</math>
using that the search directions <math>\mathbf{p}_k</math> are conjugate and again that the residuals are orthogonal. This gives the <math>\beta_k</math> used in the algorithm after cancelling <math>\alpha_k</math>.

====Example code in [[Julia (programming language)|Julia]]====
<syntaxhighlight lang="julia" line="1" start="1">
"""
    conjugate_gradient!(A, b, x)

Return the solution to `A * x = b` using the conjugate gradient method.
"""
function conjugate_gradient!(
    A::AbstractMatrix, b::AbstractVector, x::AbstractVector; tol=eps(eltype(b))
)
    # Initialize residual vector
    residual = b - A * x
    # Initialize search direction vector
    search_direction = copy(residual)
    # Compute initial residual norm
    norm(v) = sqrt(sum(v.^2))
    old_resid_norm = norm(residual)

    # Iterate until convergence
    while old_resid_norm > tol
        A_search_direction = A * search_direction
        step_size = old_resid_norm^2 / (search_direction' * A_search_direction)
        # Update solution
        @. x = x + step_size * search_direction
        # Update residual
        @. residual = residual - step_size * A_search_direction
        new_resid_norm = norm(residual)
        # Update search direction vector
        @. search_direction = residual + (new_resid_norm / old_resid_norm)^2 * search_direction
        # Update residual norm for the next iteration
        old_resid_norm = new_resid_norm
    end

    return x
end
</syntaxhighlight>

====Example code in [[MATLAB]]====
<syntaxhighlight lang="matlab" line="1" start="1">
function x = conjugate_gradient(A, b, x0, tol)
    % Return the solution to `A * x = b` using the conjugate gradient method.
    % Reminder: A should be symmetric and positive definite.
    if nargin < 4
        tol = eps;
    end
    r = b - A * x0;
    p = r;
    rsold = r' * r;
    x = x0;

    while sqrt(rsold) > tol
        Ap = A * p;
        alpha = rsold / (p' * Ap);
        x = x + alpha * p;
        r = r - alpha * Ap;
        rsnew = r' * r;
        p = r + (rsnew / rsold) * p;
        rsold = rsnew;
    end
end
</syntaxhighlight>
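A minimal usage sketch for the Julia function <code>conjugate_gradient!</code> defined above; the test matrix, tolerance, and random seed are illustrative choices and not part of the original example.

<syntaxhighlight lang="julia">
using LinearAlgebra, Random

Random.seed!(0)
n = 100
M = randn(n, n)
A = M' * M + n * I      # a symmetric positive-definite test matrix
b = randn(n)
x = zeros(n)            # initial guess x0 = 0

conjugate_gradient!(A, b, x; tol = 1e-10 * norm(b))
@show norm(A * x - b)   # residual norm of the computed solution
</syntaxhighlight>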