==Convergence properties==
The conjugate gradient method can theoretically be viewed as a direct method: in the absence of [[round-off error]] it produces the exact solution after a finite number of iterations, which is no larger than the size of the matrix. In practice, the exact solution is never obtained, since the conjugate gradient method is unstable with respect to even small perturbations; for example, most search directions are not conjugate in practice, owing to the degenerative nature of generating the Krylov subspaces.
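
The finite-termination property can be observed on a small, well-conditioned system, where round-off plays little role. The following Python sketch is illustrative only: the helper <code>conjugate_gradient</code>, the matrix size, the eigenvalues 1, ..., 6 and the random seed are assumptions made for this example.

```python
import numpy as np

def conjugate_gradient(A, b, num_iters):
    """Plain (unpreconditioned) CG started from x0 = 0; returns all iterates."""
    x = np.zeros_like(b)
    r = b - A @ x                 # initial residual
    p = r.copy()                  # first search direction
    rs_old = r @ r
    iterates = [x.copy()]
    for _ in range(num_iters):
        if rs_old == 0.0:         # residual vanished exactly
            break
        Ap = A @ p
        alpha = rs_old / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
        iterates.append(x.copy())
    return iterates

rng = np.random.default_rng(0)    # seed chosen arbitrarily for this sketch
n = 6
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(np.arange(1.0, n + 1.0)) @ Q.T   # SPD, eigenvalues 1..6
A = (A + A.T) / 2                                # remove round-off asymmetry
x_true = rng.standard_normal(n)
b = A @ x_true

iterates = conjugate_gradient(A, b, n)
errors = [np.linalg.norm(x - x_true) for x in iterates]
for k, err in enumerate(errors):
    print(k, err)
```

On such a small problem the error after <math>n</math> steps is expected to be at the level of round-off; for large or ill-conditioned matrices the loss of conjugacy described above prevents this.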

As an [[iterative method]], the conjugate gradient method monotonically (in the energy norm) improves approximations <math>\mathbf{x}_{k}</math> to the exact solution and may reach the required tolerance after a relatively small (compared to the problem size) number of iterations. The improvement is typically linear and its speed is determined by the [[condition number]] <math>\kappa(A)</math> of the system matrix <math>A</math>: the larger <math>\kappa(A)</math> is, the slower the improvement.<ref name=saad1996iterative>{{cite book|last=Saad|first=Yousef|title=Iterative methods for sparse linear systems|year=2003|publisher=Society for Industrial and Applied Mathematics|location=Philadelphia, Pa.|isbn=978-0-89871-534-7|pages=[https://archive.org/details/iterativemethods0000saad/page/195 195]|edition=2nd|url=https://archive.org/details/iterativemethods0000saad/page/195}}</ref>

However, an interesting case arises when the eigenvalues of a large symmetric matrix are spaced logarithmically. For example, let <math>A = Q D Q^T</math>, where <math>Q</math> is a random orthogonal matrix and <math>D</math> is a diagonal matrix with eigenvalues ranging from <math>\lambda_n = 1</math> to <math>\lambda_1 = 10^6</math>, spaced logarithmically. Despite the finite termination property of the conjugate gradient method, by which the exact solution should theoretically be reached in at most <math>n</math> steps, the method may stagnate. In such a scenario, even after many more iterations (e.g., ten times the matrix size), the error may decrease only modestly (e.g., to <math>10^{-5}</math>). Moreover, the iterative error may oscillate significantly, making it unreliable as a stopping condition. This poor convergence is not explained by the condition number alone (e.g., <math>\kappa_2(A) = 10^6</math>), but rather by the eigenvalue distribution itself. When the eigenvalues are more evenly spaced or randomly distributed, such convergence issues are typically absent, highlighting that the performance of the method depends not only on <math>\kappa(A)</math> but also on how the eigenvalues are distributed.<ref>{{Cite book | author1=Holmes, M.| title=Introduction to Scientific Computing and Data Analysis, 2nd Ed | year=2023 | publisher=Springer | isbn=978-3-031-22429-4}}</ref>
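
An experiment of this kind can be set up along the following lines. This is a hedged Python sketch, not the procedure of the cited reference: the size <code>n = 100</code>, the seed, the helper names <code>spd_with_spectrum</code> and <code>cg_energy_errors</code>, and the underflow guard are all assumptions. The observed error levels depend on these choices and are therefore not quoted here.

```python
import numpy as np

def spd_with_spectrum(eigenvalues, rng):
    """Build A = Q D Q^T with a random orthogonal Q and the given eigenvalues."""
    n = len(eigenvalues)
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    A = Q @ np.diag(eigenvalues) @ Q.T
    return (A + A.T) / 2          # remove round-off asymmetry

def cg_energy_errors(A, b, x_true, num_iters):
    """Run plain CG from x0 = 0, recording the A-norm error before each step."""
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rs_old = r @ r
    rs_floor = 1e-60 * rs_old     # assumed guard against floating-point underflow
    errors = []
    for _ in range(num_iters):
        e = x - x_true
        errors.append(np.sqrt(max(e @ (A @ e), 0.0)))
        if rs_old <= rs_floor:
            break
        Ap = A @ p
        alpha = rs_old / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return np.array(errors)

rng = np.random.default_rng(1)    # seed chosen arbitrarily
n = 100
spectra = {
    "logarithmic": np.logspace(0, 6, n),     # 1 ... 10^6, log-spaced
    "uniform": np.linspace(1.0, 1e6, n),     # same condition number, evenly spaced
}
histories = {}
for name, eigs in spectra.items():
    A = spd_with_spectrum(eigs, rng)
    x_true = rng.standard_normal(n)
    history = cg_energy_errors(A, A @ x_true, x_true, 10 * n)
    histories[name] = history
    rel = history / history[0]
    print(name, "relative A-norm error at step n and at the end:",
          rel[min(n, len(rel) - 1)], rel[-1])
```

Both matrices have the same condition number, so any difference between the two error histories reflects the eigenvalue distribution rather than <math>\kappa(A)</math>.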

If <math>\kappa(A)</math> is large, [[preconditioning]] is commonly used to replace the original system <math>\mathbf{A x}-\mathbf{b} = 0</math> with <math>\mathbf{M}^{-1}(\mathbf{A x}-\mathbf{b}) = 0</math> such that <math>\kappa(\mathbf{M}^{-1}\mathbf{A})</math> is smaller than <math>\kappa(\mathbf{A})</math>; see below.
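
The effect on the condition number can be illustrated with the simplest choice, a diagonal (Jacobi) preconditioner <math>\mathbf{M} = \operatorname{diag}(\mathbf{A})</math>. In the Python sketch below, the choice of preconditioner and the test matrix, a well-conditioned tridiagonal matrix with badly scaled rows and columns, are assumptions made for this example. The eigenvalue ratio is computed for the symmetric form <math>\mathbf{M}^{-1/2}\mathbf{A}\mathbf{M}^{-1/2}</math>, which has the same eigenvalues as <math>\mathbf{M}^{-1}\mathbf{A}</math>.

```python
import numpy as np

def spectral_condition_number(S):
    """Ratio of largest to smallest eigenvalue of a symmetric positive-definite S."""
    eigs = np.linalg.eigvalsh(S)   # ascending order
    return eigs[-1] / eigs[0]

n = 50
# Well-conditioned SPD tridiagonal matrix: eigenvalues lie strictly between 1 and 3.
B = 2.0 * np.eye(n) - 0.5 * np.eye(n, k=1) - 0.5 * np.eye(n, k=-1)
# Badly scaled system matrix A = D^(1/2) B D^(1/2), with scales from 1 to 10^4.
d = np.logspace(0, 4, n)
A = np.sqrt(d)[:, None] * B * np.sqrt(d)[None, :]

m = np.diag(A)                     # Jacobi preconditioner M = diag(A), stored as a vector
A_prec = A / np.sqrt(m)[:, None] / np.sqrt(m)[None, :]   # M^(-1/2) A M^(-1/2)

kappa_A = spectral_condition_number(A)
kappa_prec = spectral_condition_number(A_prec)
print("kappa(A)       =", kappa_A)
print("kappa(M^-1 A)  =", kappa_prec)
```

Here the ill-conditioning comes entirely from the scaling, so the diagonal preconditioner removes it; for general matrices a Jacobi preconditioner may help much less.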

=== Convergence theorem ===

Define a subset of polynomials as
:<math>
  \Pi_k^* := \left\lbrace \ p \in \Pi_k \ : \ p(0)=1 \ \right\rbrace \,,
</math>
where <math> \Pi_k </math> is the set of [[Polynomial ring|polynomials]] of degree at most <math> k </math>.

Let <math> \left( \mathbf{x}_k \right)_k </math> be the iterative approximations of the exact solution <math> \mathbf{x}_* </math>, and define the errors as <math> \mathbf{e}_k := \mathbf{x}_k - \mathbf{x}_* </math>.
Then the error in the energy norm is bounded as<ref name="BP" /><ref>{{Cite book |title=Iterative solution of large sparse systems of equations |last=Hackbusch |first=W. |isbn=978-3-319-28483-5 |edition=2nd |location=Switzerland |publisher=Springer |oclc=952572240|date=2016-06-21 }}</ref>
:<math>
\begin{align}
  \left\| \mathbf{e}_k \right\|_\mathbf{A}
  &= \min_{p \in \Pi_k^*}  \left\| p(\mathbf{A}) \mathbf{e}_0 \right\|_\mathbf{A}
  \\
  &\leq \min_{p \in \Pi_k^*} \,  \max_{ \lambda \in \sigma(\mathbf{A})} | p(\lambda) | \  \left\|  \mathbf{e}_0 \right\|_\mathbf{A}
  \\
  &\leq 2 \left( \frac{ \sqrt{\kappa(\mathbf{A})}-1 }{ \sqrt{\kappa(\mathbf{A})}+1 } \right)^k \ \left\|  \mathbf{e}_0 \right\|_\mathbf{A}
  \\
  &\leq 2 \exp\left(\frac{-2k}{\sqrt{\kappa(\mathbf{A})}}\right) \ \left\|  \mathbf{e}_0 \right\|_\mathbf{A}
  \,,
\end{align}
</math>
where <math> \sigma(\mathbf{A}) </math> denotes the [[Spectrum of a matrix|spectrum]], and <math> \kappa(\mathbf{A}) </math> denotes the [[condition number]].
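
The last two bounds can be compared with the measured energy-norm error on a moderately conditioned test problem. The Python sketch below assumes a matrix with 100 evenly spaced eigenvalues in <math>[1, 100]</math>, a zero initial guess, 40 iterations and an arbitrary seed; none of these choices come from the cited sources.

```python
import numpy as np

rng = np.random.default_rng(2)             # seed chosen arbitrarily
n, num_iters = 100, 40
eigs = np.linspace(1.0, 100.0, n)          # kappa(A) = 100
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(eigs) @ Q.T
A = (A + A.T) / 2                          # remove round-off asymmetry
x_true = rng.standard_normal(n)
b = A @ x_true

def a_norm(v):
    """Energy norm ||v||_A = sqrt(v^T A v)."""
    return np.sqrt(v @ (A @ v))

# Plain CG from x0 = 0, recording ||e_k||_A at every step.
x = np.zeros(n)
r = b - A @ x
p = r.copy()
rs_old = r @ r
errors = [a_norm(x - x_true)]
for _ in range(num_iters):
    Ap = A @ p
    alpha = rs_old / (p @ Ap)
    x = x + alpha * p
    r = r - alpha * Ap
    rs_new = r @ r
    p = r + (rs_new / rs_old) * p
    rs_old = rs_new
    errors.append(a_norm(x - x_true))
errors = np.array(errors)

kappa = eigs[-1] / eigs[0]
rho = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)
k = np.arange(num_iters + 1)
chebyshev_bound = 2 * rho**k * errors[0]
exponential_bound = 2 * np.exp(-2 * k / np.sqrt(kappa)) * errors[0]

for j in range(0, num_iters + 1, 10):
    print(j, errors[j], chebyshev_bound[j], exponential_bound[j])
```

The measured error is expected to lie below both bounds and to decrease monotonically in the energy norm, as long as it stays well above the round-off level.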

This shows that <math>k = \tfrac{1}{2}\sqrt{\kappa(\mathbf{A})} \log\left(\left\|  \mathbf{e}_0 \right\|_\mathbf{A} \varepsilon^{-1}\right)</math> iterations suffice to reduce the error to <math>2\varepsilon</math> for any <math>\varepsilon>0</math>.
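
As a worked instance of this estimate, the following Python sketch evaluates the iteration count and checks it against the exponential bound. The function name <code>cg_iteration_estimate</code> and the sample values are assumptions chosen for illustration.

```python
import math

def cg_iteration_estimate(kappa, initial_error, eps):
    """Smallest integer k >= 0.5 * sqrt(kappa) * log(initial_error / eps)."""
    if initial_error <= eps:
        return 0                  # already within the target
    return math.ceil(0.5 * math.sqrt(kappa) * math.log(initial_error / eps))

kappa, e0, eps = 1e4, 1.0, 1e-6   # sample values, assumed for this example
k = cg_iteration_estimate(kappa, e0, eps)

# Error level guaranteed by the exponential bound after k iterations.
guaranteed_error = 2 * math.exp(-2 * k / math.sqrt(kappa)) * e0
print(k, guaranteed_error, guaranteed_error <= 2 * eps)
```

For <math>\kappa(\mathbf{A}) = 10^4</math> and a reduction of the initial error by a factor of <math>10^{6}</math>, the estimate gives 691 iterations. This is a worst-case figure based on the bound; the actual iteration count is often smaller.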

Note the important limit as <math> \kappa(\mathbf{A}) </math> tends to <math> \infty </math>:
:<math>
  \frac{ \sqrt{\kappa(\mathbf{A})}-1 }{ \sqrt{\kappa(\mathbf{A})}+1 }
  \approx 1 - \frac{2}{\sqrt{\kappa(\mathbf{A})}}
  \quad \text{for} \quad
  \kappa(\mathbf{A}) \gg 1
  \,.
</math>
This limit shows a faster convergence rate than that of the [[Jacobi method|Jacobi]] or [[Gauss–Seidel method|Gauss–Seidel]] iterative methods, whose convergence factors scale as <math> \approx 1 - \frac{2}{\kappa(\mathbf{A})} </math>.
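
The practical meaning of the two scalings can be made concrete by counting how many iterations each per-step factor needs for a fixed error reduction. The Python sketch below is a rough comparison under stated assumptions: it uses <math>\kappa(\mathbf{A}) = 10^4</math> and a target reduction of <math>10^{-6}</math>, ignores the constant factor 2 in the conjugate gradient bound, and takes the scaling <math>1 - 2/\kappa(\mathbf{A})</math> quoted above at face value.

```python
import math

def iterations_for_reduction(factor, reduction):
    """Approximate smallest k with factor**k <= reduction, for 0 < factor < 1."""
    return math.ceil(math.log(reduction) / math.log(factor))

kappa = 1e4                                   # sample condition number (assumed)
cg_factor = (math.sqrt(kappa) - 1) / (math.sqrt(kappa) + 1)
stationary_factor = 1 - 2 / kappa             # scaling quoted for Jacobi / Gauss-Seidel

cg_iters = iterations_for_reduction(cg_factor, 1e-6)
stationary_iters = iterations_for_reduction(stationary_factor, 1e-6)
print("CG bound factor   :", cg_factor, "->", cg_iters, "iterations")
print("1 - 2/kappa factor:", stationary_factor, "->", stationary_iters, "iterations")
```

The iteration counts differ by roughly a factor of <math>\sqrt{\kappa(\mathbf{A})}</math>, which is the gain expressed by the limit above.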

No [[round-off error]] is assumed in the convergence theorem, but the convergence bound is commonly valid in practice as theoretically explained<ref name="AG" /> by [[Anne Greenbaum]].

=== Practical convergence ===

If initialized randomly, the first stage of iterations is often the fastest, as the error is eliminated within the Krylov subspace that initially reflects a smaller effective condition number. The second stage of convergence is typically well described by the theoretical convergence bound with <math display="inline"> \sqrt{\kappa(\mathbf{A})}</math>, but may be super-linear, depending on the distribution of the spectrum of the matrix <math>A</math> and the spectral distribution of the error.<ref name="AG" /> In the last stage, the smallest attainable accuracy is reached and the convergence stalls, or the method may even start diverging. In typical scientific computing applications in [[double-precision floating-point format]] for matrices of large sizes, the conjugate gradient method uses a stopping criterion with a tolerance that terminates the iterations during the first or second stage.
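
A stopping criterion of this kind can be sketched as follows. The specific test used here, a relative residual <math>\|\mathbf{r}_k\| \le \text{tol} \cdot \|\mathbf{b}\|</math>, is one common choice and an assumption of this example, as are the function name <code>cg_solve</code>, the default tolerance and the tridiagonal test matrix.

```python
import numpy as np

def cg_solve(A, b, rel_tol=1e-10, max_iters=None):
    """Plain CG from x0 = 0, stopped once ||r_k|| <= rel_tol * ||b||.

    Returns the approximate solution and the number of iterations performed.
    """
    n = b.shape[0]
    max_iters = n if max_iters is None else max_iters
    x = np.zeros(n)
    r = b.copy()                  # residual for x0 = 0
    p = r.copy()
    rs_old = r @ r
    threshold = (rel_tol * np.linalg.norm(b)) ** 2   # compare squared norms
    for k in range(max_iters):
        if rs_old <= threshold:
            return x, k
        Ap = A @ p
        alpha = rs_old / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x, max_iters

n = 200
# SPD tridiagonal test matrix with eigenvalues strictly between 1 and 5.
A = 3.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)

x, iterations = cg_solve(A, b)
true_residual = np.linalg.norm(b - A @ x) / np.linalg.norm(b)
print(iterations, true_residual)
```

The criterion is evaluated on the recursively updated residual, which can drift away from the true residual <math>\mathbf{b} - \mathbf{A}\mathbf{x}_k</math> once the attainable accuracy is approached; a tolerance well above that level keeps the iteration in the first or second stage.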