=== Mathematical and computational aspects ===
Inverse problems are typically ill-posed, as opposed to the [[well-posed problem]]s usually met in mathematical modeling. Of the three conditions for a [[well-posed problem]] suggested by [[Jacques Hadamard]] (existence, uniqueness, and stability of the solution or solutions), the condition of stability is most often violated. In the sense of [[functional analysis]], the inverse problem is represented by a mapping between [[metric space]]s. While inverse problems are often formulated in infinite-dimensional spaces, limitations to a finite number of measurements, and the practical consideration of recovering only a finite number of unknown parameters, may lead to the problems being recast in discrete form. In this case the inverse problem will typically be ''[[condition number|ill-conditioned]]''. In these cases, [[regularization (mathematics)|regularization]] may be used to introduce mild assumptions on the solution and prevent [[overfitting]]. Many instances of regularized inverse problems can be interpreted as special cases of [[Bayesian inference]].<ref>{{cite book| chapter-url=http://www.ipgp.fr/~tarantola/Files/Professional/Books/InverseProblemTheory.pdf| title=Inverse Problem Theory and Methods for Model Parameter Estimation| pages=i–xii| first=Albert|last=Tarantola| publisher=SIAM| doi=10.1137/1.9780898717921.fm| chapter=Front Matter| year=2005| isbn=978-0-89871-572-9}}</ref>

==== Numerical solution of the optimization problem ====
Some inverse problems have a very simple solution, for instance, when one has a set of [[unisolvent functions]], meaning a set of {{tmath|n}} functions such that evaluating them at {{tmath|n}} distinct points yields a set of [[linearly independent]] vectors. This means that given a linear combination of these functions, the coefficients can be computed by arranging the vectors as the columns of a matrix and then inverting this matrix. The simplest example of unisolvent functions is a set of polynomials constructed, using the [[unisolvence theorem]], so as to be unisolvent. Concretely, this is done by inverting the [[Vandermonde matrix]]. But this is a very specific situation.

In general, the solution of an inverse problem requires sophisticated optimization algorithms. When the model is described by a large number of parameters (the number of unknowns involved in some diffraction tomography applications can reach one billion), solving the linear system associated with the normal equations can be cumbersome. The numerical method to be used for solving the optimization problem depends in particular on the cost required for computing the solution <math>F p</math> of the forward problem. Once the appropriate algorithm for solving the forward problem has been chosen (a straightforward matrix–vector multiplication may not be adequate when matrix <math>F</math> is huge), the appropriate algorithm for carrying out the minimization can be found in textbooks dealing with numerical methods for the solution of linear systems and for the minimization of quadratic functions (see for instance Ciarlet<ref>{{cite book |last1=Ciarlet |first1=Philippe |title=Introduction à l'analyse numérique matricielle et à l'optimisation |date=1994 |publisher=Masson |location=Paris |isbn=9782225688935}}</ref> or Nocedal<ref>{{cite book |last1=Nocedal |first1=Jorge |title=Numerical optimization |date=2006 |publisher=Springer}}</ref>).
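To make the unisolvence argument concrete, the following minimal sketch (a hypothetical illustration, not taken from the cited references; the evaluation points and coefficients are arbitrary choices) recovers the coefficients of a quadratic polynomial from its values at three distinct points by solving the associated Vandermonde system, using only NumPy:

<syntaxhighlight lang="python">
import numpy as np

# The monomials 1, x, x^2 are unisolvent: evaluated at three distinct
# points they yield linearly independent columns (a Vandermonde matrix).
x = np.array([0.0, 1.0, 2.0])           # distinct evaluation points
V = np.vander(x, 3, increasing=True)    # columns are x**0, x**1, x**2

p_true = np.array([1.0, -2.0, 0.5])     # hidden coefficients (the "model")
d = V @ p_true                          # observed data d = F p

p = np.linalg.solve(V, d)               # invert the Vandermonde system
assert np.allclose(p, p_true)
</syntaxhighlight>

The recovery fails as soon as two evaluation points coincide, since the Vandermonde matrix then becomes singular.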
Also, the user may wish to add physical constraints to the models: in this case, they have to be familiar with [[Constrained optimization|constrained optimization methods]], a subject in itself. In all cases, computing the gradient of the objective function is often a key element for the solution of the optimization problem. As mentioned above, information about the spatial distribution of a distributed parameter can be introduced through the parametrization. One can also think of adapting this parametrization during the optimization.<ref>{{cite journal |last1=Ben Ameur |first1=Hend |last2=Chavent |first2=Guy |last3=Jaffré |first3=Jérôme |title=Refinement and coarsening indicators for adaptive parametrization: application to the estimation of hydraulic transmissivities |journal=Inverse Problems |date=2002 |volume=18 |issue=3 |pages=775–794 |url=https://hal.inria.fr/docs/00/07/22/95/PDF/RR-4292.pdf|doi=10.1088/0266-5611/18/3/317 |bibcode=2002InvPr..18..775B |s2cid=250892174 }}</ref>

Should the objective function be based on a norm other than the Euclidean norm, we have to leave the area of quadratic optimization. As a result, the optimization problem becomes more difficult. In particular, when the <math>L^1</math> norm is used for quantifying the data misfit, the objective function is no longer differentiable: its gradient no longer makes sense. Dedicated methods (see for instance Lemaréchal<ref>{{cite book |last1=Lemaréchal |first1=Claude |title=Optimization |series=Handbooks in Operations Research and Management Science |date=1989 |publisher=Elsevier |pages=529–572}}</ref>) from non-differentiable optimization come in.

Once the optimal model is computed, we have to address the question: "Can we trust this model?" The question can be formulated as follows: how large is the set of models that match the data "nearly as well" as this model? In the case of quadratic objective functions, this set is contained in a hyper-ellipsoid, a subset of <math>R^M</math> (<math>M</math> is the number of unknowns), whose size depends on what we mean by "nearly as well", that is, on the noise level. The direction of the largest axis of this ellipsoid ([[Eigenvalues and eigenvectors|eigenvector]] associated with the smallest eigenvalue of matrix <math>F^T F</math>) is the direction of poorly determined components: if we follow this direction, we can bring a strong perturbation to the model without significantly changing the value of the objective function and thus end up with a significantly different quasi-optimal model. We clearly see that the answer to the question "can we trust this model?" is governed by the noise level and by the eigenvalues of the [[Hessian matrix|Hessian]] of the objective function or, equivalently, in the case where no regularization has been integrated, by the [[singular value]]s of matrix <math>F</math>. Of course, the use of regularization (or other kinds of prior information) reduces the size of the set of almost optimal solutions and, in turn, increases the confidence we can put in the computed solution.

Recent contributions from the field of [[algorithmic information theory]] have proposed a more general approach to such problems, including a noteworthy conceptual framework for extracting generative rules from complex [[Dynamical system|dynamical systems]].<ref>{{cite journal |last1=Zenil |first1=Hector |last2=Kiani |first2=Narsis A. |last3=Zea |first3=Allan A. |last4=Tegnér |first4=Jesper |title=Causal deconvolution by algorithmic generative models |journal=Nature Machine Intelligence |volume=1 |issue=1 |year=2019 |pages=58–66 |doi=10.1038/s42256-018-0005-0 }}</ref> In particular, Zenil et al. (2019) introduced a framework called Algorithmic Information Dynamics (AID)<ref>{{cite journal | last=Zenil | first=Hector | title=Algorithmic Information Dynamics | journal=Scholarpedia | date=25 July 2020 | volume=15 | issue=7 | doi=10.4249/scholarpedia.53143 | doi-access=free | bibcode=2020SchpJ..1553143Z | hdl=10754/666314 | hdl-access=free }}</ref> which quantifies the algorithmic complexity of system components through controlled perturbation analysis.<ref>{{cite book | last1=Zenil | first1=Hector | last2=Kiani | first2=Narsis A. | last3=Tegner | first3=Jesper | title=Algorithmic Information Dynamics: A Computational Approach to Causality with Applications to Living Systems | publisher=Cambridge University Press | year=2023 | doi=10.1017/9781108596619 | isbn=978-1-108-59661-9 }}</ref> This method enables the reconstruction of generative rules in discrete dynamical systems, such as [[Cellular automaton|cellular automata]], without relying on explicit governing equations.<ref>{{cite journal |last1=Zenil |first1=Hector |last2=Kiani |first2=N. A. |last3=Marabita |first3=F. |last4=Deng |first4=Y. |last5=Elias |first5=S. |last6=Schmidt |first6=A. |last7=Ball |first7=G. |last8=Tegnér |first8=J. |title=An Algorithmic Information Calculus for Causal Discovery and Reprogramming Systems |journal=iScience |year=2019 |doi=10.1016/j.isci.2019.07.043 }}</ref> By analyzing the algorithmic responses of system states to localized changes, AID provides a novel lens for identifying causal relationships and estimating the reprogrammability of a system toward desired behaviors. This approach offers a useful alternative when classical inverse methods struggle with instability or intractability in highly discrete or nonlinear domains.

==== Stability, regularization and model discretization in infinite dimension ====
We focus here on the recovery of a distributed parameter. When looking for distributed parameters, we have to discretize these unknown functions; doing so, we reduce the dimension of the problem to something finite. But now the question is: is there any link between the solution we compute and that of the initial problem? And another question: what do we mean by the solution of the initial problem? Since a finite number of data does not allow the determination of an infinity of unknowns, the original data misfit functional has to be regularized to ensure the uniqueness of the solution. In many cases, reducing the unknowns to a finite-dimensional space will provide an adequate regularization: the computed solution will look like a discrete version of the solution we were looking for. For example, a naïve discretization will often work for solving the [[deconvolution]] problem: it will work as long as we do not allow missing frequencies to show up in the numerical solution. But often, regularization has to be integrated explicitly in the objective function.
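As a hypothetical illustration of the remark on [[deconvolution]] above (the kernel, grid size and threshold below are arbitrary choices, not taken from the cited references), a naive discrete deconvolution succeeds precisely because the spectral division is restricted to the frequencies that the kernel actually transmits:

<syntaxhighlight lang="python">
import numpy as np

n = 128
p = np.zeros(n)
p[30], p[70] = 1.0, -0.5                      # unknown spiky model

idx = np.arange(n) - n // 2
k = np.exp(-0.5 * (idx / 2.0) ** 2)
k /= k.sum()                                  # smooth (Gaussian) blurring kernel
K = np.fft.fft(np.fft.ifftshift(k))           # kernel spectrum
d = np.real(np.fft.ifft(np.fft.fft(p) * K))   # data: circular convolution

# Deconvolve only where the kernel transmits energy; dividing by the
# near-zero ("missing") frequencies would amplify them without bound.
mask = np.abs(K) > 1e-8 * np.abs(K).max()
P = np.zeros(n, dtype=complex)
P[mask] = np.fft.fft(d)[mask] / K[mask]
p_rec = np.real(np.fft.ifft(P))               # a discrete version of the model
</syntaxhighlight>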
In order to understand what may happen, we have to keep in mind that solving such a linear inverse problem amounts to solving a Fredholm integral equation of the first kind:
<math display="block">d(x) = \int_\Omega K(x,y) p(y) \, dy</math>
where <math>K</math> is the kernel, <math>x</math> and <math>y</math> are vectors of <math>R^2</math>, and <math>\Omega</math> is a domain in <math>R^2</math>. This holds for a 2D application; for a 3D application, we consider <math> x,y \in R^3</math>. Note that here the model parameters <math>p</math> consist of a function and that the response of a model also consists of a function, denoted by <math>d(x)</math>. This equation is an extension to infinite dimension of the matrix equation <math>d=Fp</math> given in the case of discrete problems.

For sufficiently smooth <math>K</math>, the operator defined above is [[compact operator|compact]] on reasonable [[Banach space]]s such as the [[Lp space|<math>L^2</math> space]]. The [[Compact operator|Riesz theory]] of compact operators states that the set of singular values of such an operator contains zero (hence the existence of a null-space), is finite or at most countable and, in the latter case, constitutes a sequence that goes to zero. In the case of a symmetric kernel, we have an infinity of eigenvalues and the associated eigenvectors constitute a [[orthonormal basis|Hilbert basis]] of <math>L^2</math>. Thus any solution of this equation is determined up to an additive function in the null-space and, in the case of infinitely many singular values, the solution (which involves the reciprocals of arbitrarily small singular values) is unstable: two ingredients that make the solution of this integral equation a typical ill-posed problem. However, we can define a solution through the [[Generalized inverse|pseudo-inverse]] of the forward map (again up to an arbitrary additive function). When the forward map is compact, the classical [[Tikhonov regularization]] will work if we use it for integrating prior information stating that the <math>L^2</math> norm of the solution should be as small as possible: this will make the inverse problem well-posed. Yet, as in the finite-dimensional case, we have to question the confidence we can put in the computed solution. Again, basically, the information lies in the eigenvalues of the Hessian operator. Should subspaces containing eigenvectors associated with small eigenvalues be explored for computing the solution, then the solution can hardly be trusted: some of its components will be poorly determined. The smallest eigenvalue is equal to the weight introduced in Tikhonov regularization.

Irregular kernels may yield a forward map that is not compact, and even [[Unbounded operator|unbounded]], if we naively equip the space of models with the <math>L^2</math> norm. In such cases, the Hessian is not a bounded operator and the notion of eigenvalue no longer makes sense.
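The situation just described can be reproduced numerically in a minimal sketch (the Gaussian kernel, grid and regularization weight below are arbitrary choices for illustration, not taken from the cited references): discretizing a smooth kernel yields a matrix whose singular values decay towards zero, and Tikhonov regularization replaces the unstable reciprocals <math>1/\sigma_i</math> by the bounded filter factors <math>\sigma_i/(\sigma_i^2+\alpha)</math>:

<syntaxhighlight lang="python">
import numpy as np

# Quadrature discretization of d(x) = ∫ K(x,y) p(y) dy with a smooth kernel.
n = 200
t = np.linspace(0.0, 1.0, n)
F = np.exp(-((t[:, None] - t[None, :]) ** 2) / 0.01) * (t[1] - t[0])

U, s, Vt = np.linalg.svd(F)
print(s[0] / s[-1])            # enormous ratio: singular values decay to ~0

p_true = np.sin(3 * np.pi * t)
rng = np.random.default_rng(0)
d = F @ p_true + 1e-6 * rng.standard_normal(n)   # slightly noisy data

alpha = 1e-8                   # Tikhonov weight (prior: small L2 norm)
filt = s / (s ** 2 + alpha)    # bounded filter factors, unlike 1/s
p_reg = Vt.T @ (filt * (U.T @ d))
</syntaxhighlight>

Setting <code>alpha = 0</code> in this sketch exhibits the instability: the components of the naive solution along the smallest singular values are dominated by amplified noise.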
A mathematical analysis is required to make it a [[bounded operator]] and design a well-posed problem: an illustration can be found in Delprat-Jannaud and Lailly.<ref>{{cite journal |last1=Delprat-Jannaud |first1=Florence |last2=Lailly |first2=Patrick |title=Ill-posed and well-posed formulations of the reflection travel time tomography problem|journal=Journal of Geophysical Research |date=1993 |volume=98 |issue=B4 |pages=6589–6605 |doi=10.1029/92JB02441 |bibcode=1993JGR....98.6589D }}</ref> Again, we have to question the confidence we can put in the computed solution and we have to generalize the notion of eigenvalue to get the answer.<ref>{{cite journal |last1=Delprat-Jannaud |first1=Florence |last2=Lailly |first2=Patrick |title=What information on the Earth model do reflection traveltimes provide |journal=Journal of Geophysical Research |date=1992 |volume=97 |issue=B13 |pages=19827–19844|doi=10.1029/92JB01739 |bibcode=1992JGR....9719827D }}</ref>

Analysis of the spectrum of the Hessian operator is thus a key element in determining how reliable the computed solution is. However, such an analysis is usually a very heavy task. This has led several authors to investigate alternative approaches in the case where we are not interested in all the components of the unknown function but only in sub-unknowns that are the images of the unknown function by a linear operator. These approaches are referred to as the "Backus and Gilbert method",<ref>{{cite journal |last1=Backus |first1=George |last2=Gilbert |first2=Freeman |title=The Resolving Power of Gross Earth Data |journal=Geophysical Journal of the Royal Astronomical Society |date=1968 |volume=16 |issue=2 |pages=169–205 |doi=10.1111/j.1365-246X.1968.tb00216.x |bibcode=1968GeoJ...16..169B |doi-access=free }}</ref> [[Jacques-Louis Lions|Lions]]'s sentinels approach,<ref>{{cite journal |last1=Lions |first1=Jacques Louis |title=Sur les sentinelles des systèmes distribués |journal=C. R. Acad. Sci. Paris |date=1988 |series=I Math |volume=307 |pages=819–823}}</ref> and the SOLA method;<ref>{{cite journal |last1=Pijpers |first1=Frank |last2=Thompson |first2=Michael |title=The SOLA method for helioseismic inversion |journal=Astronomy and Astrophysics |date=1994 |volume=281 |pages=231–240|bibcode=1994A&A...281..231P }}</ref> these approaches turned out to be strongly related to one another, as explained in Chavent.<ref>{{cite book |last1=Chavent |first1=Guy |chapter=Least-Squares, Sentinels and Substractive Optimally Localized Average |title=Équations aux dérivées partielles et applications |date=1998 |publisher=Gauthier Villars |location=Paris |pages=345–356 |chapter-url=https://hal.inria.fr/inria-00073357/document}}</ref>

Finally, the concept of [[Optical resolution|limited resolution]], often invoked by physicists, is nothing but a specific view of the fact that some poorly determined components may corrupt the solution. But, generally speaking, these poorly determined components of the model are not necessarily associated with high frequencies.
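In the same hypothetical setting as the sketch above, the poorly determined components can be exhibited directly: a large model perturbation along a right singular vector associated with a tiny singular value is almost invisible in the data:

<syntaxhighlight lang="python">
import numpy as np

n = 200
t = np.linspace(0.0, 1.0, n)
F = np.exp(-((t[:, None] - t[None, :]) ** 2) / 0.01) * (t[1] - t[0])

U, s, Vt = np.linalg.svd(F)
v_bad = Vt[-1]                 # model direction with the smallest singular value

# A model perturbation of norm 10 whose image in data space is negligible:
print(np.linalg.norm(F @ (10.0 * v_bad)))    # equals 10 * s[-1], essentially 0
</syntaxhighlight>

For this particular smooth kernel the poorly determined directions happen to be highly oscillatory, but, as noted above, that need not be the case in general.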