==== Numerical solution of the optimization problem ====

Some inverse problems have a very simple solution, for instance, when one has a set of [[unisolvent functions]], meaning a set of {{tmath|n}} functions such that evaluating them at {{tmath|n}} distinct points yields a set of [[linearly independent]] vectors. This means that, given a linear combination of these functions, the coefficients can be computed by arranging the vectors as the columns of a matrix and then inverting this matrix. The simplest example of unisolvent functions is polynomials constructed, using the [[unisolvence theorem]], so as to be unisolvent. Concretely, this is done by inverting the [[Vandermonde matrix]]. But this is a very specific situation.
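A minimal numerical sketch of this simple case, assuming the unisolvent functions are the monomials <math>1, x, \ldots, x^{n-1}</math> evaluated at <math>n</math> distinct points; the sample points and data values below are made up for illustration:

<syntaxhighlight lang="python">
import numpy as np

# n distinct evaluation points and observed values d_i = sum_j p_j * x_i**j
x = np.array([0.0, 1.0, 2.0, 3.0])   # distinct points (illustrative)
d = np.array([1.0, 3.0, 9.0, 31.0])  # observed data (illustrative)

# Vandermonde matrix: column j holds the j-th monomial evaluated at the points
V = np.vander(x, increasing=True)    # V[i, j] = x[i]**j

# Because the points are distinct, V is invertible, so the coefficients of the
# linear combination are recovered by solving the linear system V p = d.
p = np.linalg.solve(V, d)
print(p)                             # coefficients of 1, x, x^2, x^3
</syntaxhighlight>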
In general, the solution of an inverse problem requires sophisticated optimization algorithms. When the model is described by a large number of parameters (the number of unknowns involved in some diffraction tomography applications can reach one billion), solving the linear system associated with the normal equations can be cumbersome. The numerical method to be used for solving the optimization problem depends in particular on the cost of computing the solution <math>F p</math> of the forward problem. Once the appropriate algorithm for solving the forward problem has been chosen (a straightforward matrix-vector multiplication may not be adequate when matrix <math>F</math> is huge), the appropriate algorithm for carrying out the minimization can be found in textbooks dealing with numerical methods for the solution of linear systems and for the minimization of quadratic functions (see for instance Ciarlet<ref>{{cite book |last1=Ciarlet |first1=Philippe |title=Introduction à l'analyse numérique matricielle et à l'optimisation |date=1994 |publisher=Masson |location=Paris |isbn=9782225688935}}</ref> or Nocedal<ref>{{cite book |last1=Nocedal |first1=Jorge |title=Numerical optimization |date=2006 |publisher=Springer}}</ref>). Also, the user may wish to add physical constraints to the models: in this case, they have to be familiar with [[Constrained optimization|constrained optimization methods]], a subject in itself. In all cases, computing the gradient of the objective function is often a key element in the solution of the optimization problem.

As mentioned above, information about the spatial distribution of a distributed parameter can be introduced through the parametrization. One can also think of adapting this parametrization during the optimization.<ref>{{cite journal |last1=Ben Ameur |first1=Hend |last2=Chavent |first2=Guy |last3=Jaffré |first3=Jérôme |title=Refinement and coarsening indicators for adaptive parametrization: application to the estimation of hydraulic transmissivities |journal=Inverse Problems |date=2002 |volume=18 |issue=3 |pages=775–794 |url=https://hal.inria.fr/docs/00/07/22/95/PDF/RR-4292.pdf |doi=10.1088/0266-5611/18/3/317 |bibcode=2002InvPr..18..775B |s2cid=250892174 }}</ref>

Should the objective function be based on a norm other than the Euclidean norm, we have to leave the realm of quadratic optimization. As a result, the optimization problem becomes more difficult. In particular, when the <math>L^1</math> norm is used for quantifying the data misfit, the objective function is no longer differentiable: its gradient no longer makes sense. Dedicated methods (see for instance Lemaréchal<ref>{{cite book |last1=Lemaréchal |first1=Claude |title=Optimization, Handbooks in Operations Research and Management Science |date=1989 |publisher=Elsevier |pages=529–572}}</ref>) from non-differentiable optimization come in.

Once the optimal model is computed, we have to address the question: "Can we trust this model?" The question can be formulated as follows: how large is the set of models that match the data "nearly as well" as this model? In the case of quadratic objective functions, this set is contained in a hyper-ellipsoid, a subset of <math>R^M</math> (<math>M</math> is the number of unknowns), whose size depends on what we mean by "nearly as well", that is, on the noise level. The direction of the largest axis of this ellipsoid ([[Eigenvalues and eigenvectors|eigenvector]] associated with the smallest eigenvalue of matrix <math>F^T F</math>) is the direction of poorly determined components: if we follow this direction, we can apply a strong perturbation to the model without significantly changing the value of the objective function and thus end up with a significantly different quasi-optimal model. We clearly see that the answer to the question "can we trust this model?" is governed by the noise level and by the eigenvalues of the [[Hessian matrix|Hessian]] of the objective function or, equivalently, in the case where no regularization has been integrated, by the [[singular value]]s of matrix <math>F</math>. Of course, the use of regularization (or other kinds of prior information) reduces the size of the set of almost optimal solutions and, in turn, increases the confidence we can put in the computed solution.
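A small synthetic sketch of this analysis, in which the operator <math>F</math>, the noise level, and the perturbation size are made up for the example: the least-squares model is computed first, and the singular value decomposition of <math>F</math> then exposes a poorly determined direction along which a large model perturbation leaves the data misfit essentially at the noise level.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

# Synthetic forward operator F whose third column is almost negligible,
# so the third model component is poorly constrained by the data.
F = np.array([[1.0,  0.0, 0.0],
              [0.0,  1.0, 0.0],
              [1.0,  1.0, 0.0],
              [1.0, -1.0, 0.0],
              [0.0,  0.0, 1e-4]])
p_true = np.array([1.0, 2.0, 3.0])
noise_level = 1e-3
d = F @ p_true + noise_level * rng.standard_normal(F.shape[0])

# Least-squares model: zero gradient of the quadratic misfit, i.e. the
# solution of the normal equations F^T F p = F^T d.
p_hat, *_ = np.linalg.lstsq(F, d, rcond=None)

# The right singular vectors of F are the axes of the ellipsoid of
# quasi-optimal models; the smallest singular value gives its longest axis.
U, s, Vt = np.linalg.svd(F)
v_poor = Vt[-1]                      # poorly determined direction

def misfit(p):
    return np.linalg.norm(F @ p - d) ** 2

print("singular values of F:", s)
print("misfit at the optimum:      ", misfit(p_hat))
print("misfit after a perturbation:", misfit(p_hat + 10.0 * v_poor))
# The perturbed model is far from p_hat, yet its misfit barely grows:
# the component along v_poor cannot be trusted without regularization.
</syntaxhighlight>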
Recent contributions from the field of [[algorithmic information theory]] have proposed a more general approach to such problems, including a noteworthy conceptual framework for extracting generative rules from complex [[Dynamical system|dynamical systems]].<ref>{{cite journal |last1=Zenil |first1=Hector |last2=Kiani |first2=Narsis A. |last3=Zea |first3=Allan A. |last4=Tegnér |first4=Jesper |title=Causal deconvolution by algorithmic generative models |journal=Nature Machine Intelligence |volume=1 |issue=1 |year=2019 |pages=58–66 |doi=10.1038/s42256-018-0005-0 }}</ref> In particular, Zenil et al. (2019) introduced a framework called Algorithmic Information Dynamics (AID),<ref>{{cite journal |last=Zenil |first=Hector |title=Algorithmic Information Dynamics |journal=Scholarpedia |date=25 July 2020 |volume=15 |issue=7 |doi=10.4249/scholarpedia.53143 |doi-access=free |bibcode=2020SchpJ..1553143Z |hdl=10754/666314 |hdl-access=free }}</ref> which quantifies the algorithmic complexity of system components through controlled perturbation analysis.<ref>{{cite book |last1=Zenil |first1=Hector |last2=Kiani |first2=Narsis A. |last3=Tegner |first3=Jesper |title=Algorithmic Information Dynamics: A Computational Approach to Causality with Applications to Living Systems |publisher=Cambridge University Press |year=2023 |doi=10.1017/9781108596619 |isbn=978-1-108-59661-9 }}</ref> This method enables the reconstruction of generative rules in discrete dynamical systems, such as [[Cellular automaton|cellular automata]], without relying on explicit governing equations.<ref>{{cite journal |last1=Zenil |first1=Hector |last2=Kiani |first2=N. A. |last3=Marabita |first3=F. |last4=Deng |first4=Y. |last5=Elias |first5=S. |last6=Schmidt |first6=A. |last7=Ball |first7=G. |last8=Tegnér |first8=J. |title=An Algorithmic Information Calculus for Causal Discovery and Reprogramming Systems |journal=iScience |year=2019 }}</ref> By analyzing the algorithmic responses of system states to localized changes, AID provides a novel lens for identifying causal relationships and estimating the reprogrammability of a system toward desired behaviors. This approach offers a useful alternative when classical inverse methods struggle with instability or intractability in highly discrete or nonlinear domains.
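As a rough illustration of the perturbation idea only (not of the estimators used by AID itself, which rely on the Coding Theorem and Block Decomposition methods rather than on lossless compression), the sketch below flips single cells of an elementary cellular automaton's initial state and records how the compressed length of its space-time evolution reacts; the rule number, grid sizes, helper functions, and the use of <code>zlib</code> as a crude complexity proxy are illustrative choices, not part of the cited framework.

<syntaxhighlight lang="python">
import zlib
import numpy as np

def evolve(rule, initial, steps):
    """Run an elementary cellular automaton and return its space-time diagram."""
    table = [(rule >> i) & 1 for i in range(8)]   # neighbourhood -> next state
    rows = [np.array(initial, dtype=np.uint8)]
    for _ in range(steps):
        prev = rows[-1]
        left, right = np.roll(prev, 1), np.roll(prev, -1)
        idx = 4 * left + 2 * prev + right
        rows.append(np.array([table[i] for i in idx], dtype=np.uint8))
    return np.array(rows)

def complexity(diagram):
    """Crude stand-in for algorithmic complexity: length of the compressed bytes."""
    return len(zlib.compress(diagram.tobytes(), 9))

width, steps, rule = 64, 64, 110                  # illustrative parameters
base = np.zeros(width, dtype=np.uint8)
base[width // 2] = 1

reference = complexity(evolve(rule, base, steps))
for cell in range(0, width, 8):
    perturbed = base.copy()
    perturbed[cell] ^= 1                          # flip one cell of the initial state
    delta = complexity(evolve(rule, perturbed, steps)) - reference
    print(f"flipping cell {cell:2d} shifts the compressed length by {delta:+d} bytes")
</syntaxhighlight>

Cells whose perturbation shifts the proxy strongly are, in this toy setting, the ones that most influence the system's generative behaviour.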