==<span id="Linear inverse problem"></span>Linear inverse problems== In the case of a linear forward map and when we deal with a finite number of model parameters, the forward map can be written as a [[linear system]] <math display="block"> d = Fp</math> where <math>F</math> is the [[matrix (mathematics)|matrix]] that characterizes the forward map. The linear system can be solved by means of both regularization and Bayesian methods.<ref>{{cite journal | last1 = Huang | first1 = Yunfei. | display-authors = etal | year = 2019 | title = Traction force microscopy with optimized regularization and automated Bayesian parameter selection for comparing cells | journal = Scientific Reports | volume = 9 | number = 1| page = 537 | doi = 10.1038/s41598-018-36896-x | pmid = 30679578 | doi-access = free | pmc = 6345967 | arxiv = 1810.05848 | bibcode = 2019NatSR...9..539H }}</ref> ===An elementary example: Earth's gravitational field=== Only a few physical systems are actually linear with respect to the model parameters. One such system from geophysics is that of the [[Gravity of Earth|Earth's gravitational field]]. The Earth's gravitational field is determined by the density distribution of the Earth in the subsurface. Because the [[lithology]] of the Earth changes quite significantly, we are able to observe minute differences in the Earth's gravitational field on the surface of the Earth. From our understanding of gravity (Newton's Law of Gravitation), we know that the mathematical expression for gravity is: <math display="block">d= \frac{G p}{r^2};</math> here <math>d</math> is a measure of the local gravitational acceleration, <math>G</math> is the [[Gravitational acceleration|universal gravitational constant]], <math>p</math> is the local mass (which is related to density) of the rock in the subsurface and <math>r</math> is the distance from the mass to the observation point. By discretizing the above expression, we are able to relate the discrete data observations on the surface of the Earth to the discrete model parameters (density) in the subsurface that we wish to know more about. For example, consider the case where we have measurements carried out at 5 locations on the surface of the Earth. In this case, our data vector, <math>d</math> is a column vector of dimension (5×1): its <math>i</math>-th component is associated with the <math>i</math>-th observation location. We also know that we only have five unknown masses <math>p_j</math> in the subsurface (unrealistic but used to demonstrate the concept) with known location: we denote by <math>r_{ij}</math> the distance between the <math>i</math>-th observation location and the <math>j</math>-th mass. 
Thus, we can construct the linear system relating the five unknown masses to the five data points as follows: <math display="block">d = F p, </math> <math display="block">d = \begin{bmatrix} d_1 \\ d_2 \\ d_3 \\ d_4 \\ d_5 \end{bmatrix}, \quad p = \begin{bmatrix} p_1 \\ p_2 \\ p_3 \\ p_4 \\ p_5 \end{bmatrix},</math> <math display="block">F = \begin{bmatrix} \frac{G}{r_{11}^2} & \frac{G}{r_{12}^2} & \frac{G}{r_{13}^2} & \frac{G}{r_{14}^2} & \frac{G}{r_{15}^2} \\ \frac{G}{r_{21}^2} & \frac{G}{r_{22}^2} & \frac{G}{r_{23}^2} & \frac{G}{r_{24}^2} & \frac{G}{r_{25}^2} \\ \frac{G}{r_{31}^2} & \frac{G}{r_{32}^2} & \frac{G}{r_{33}^2} & \frac{G}{r_{34}^2} & \frac{G}{r_{35}^2} \\ \frac{G}{r_{41}^2} & \frac{G}{r_{42}^2} & \frac{G}{r_{43}^2} & \frac{G}{r_{44}^2} & \frac{G}{r_{45}^2} \\ \frac{G}{r_{51}^2} & \frac{G}{r_{52}^2} & \frac{G}{r_{53}^2} & \frac{G}{r_{54}^2} & \frac{G}{r_{55}^2} \end{bmatrix}</math>

To solve for the model parameters that fit our data, we might be able to invert the matrix <math>F</math> to directly convert the measurements into our model parameters. For example: <math display="block">p = F^{-1} d_\text{obs} </math> A system with five equations and five unknowns is a very specific situation: our example was designed to end up with this specificity. In general, the numbers of data and unknowns are different, so that matrix <math>F</math> is not square. However, even a square matrix can have no inverse: matrix <math>F</math> can be [[Rank (linear algebra)|rank]] deficient (i.e. has zero eigenvalues) and the solution of the system <math>p = F^{-1} d_\text{obs} </math> is not unique. In that case, the solution of the inverse problem is underdetermined. This is a first difficulty. Over-determined systems (more equations than unknowns) have other issues. Also, noise may corrupt our observations, possibly taking <math>d</math> outside the space <math>F(P)</math> of possible responses to model parameters, so that a solution of the system <math>p = F^{-1} d_\text{obs} </math> may not exist. This is another difficulty.

==== Tools to overcome the first difficulty ====

The first difficulty reflects a crucial problem: our observations do not contain enough information and additional data are required. Additional data can come from physical '''prior information''' on the parameter values, on their spatial distribution or, more generally, on their mutual dependence. It can also come from other experiments: for instance, we may think of integrating data recorded by gravimeters and seismographs for a better estimation of densities. The integration of this additional information is basically a problem of [[statistics]]. This is the discipline that can answer the question: How do we combine quantities of different natures? We will be more precise in the section "Bayesian approach" below.

Concerning distributed parameters, prior information about their spatial distribution often consists of information about some derivatives of these distributed parameters. Also, it is common practice, although somewhat artificial, to look for the "simplest" model that reasonably matches the data. This is usually achieved by [[Penalty method|penalizing]] the [[Lp space|<math>L^1</math> norm]] of the gradient (or the [[total variation]]) of the parameters (this approach is also referred to as the maximization of the entropy). One can also make the model simple through a parametrization that introduces degrees of freedom only when necessary.
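To make these difficulties concrete, here is a minimal numerical sketch of the five-mass example, written in Python with purely hypothetical observation and mass coordinates (the geometry is invented for illustration and carries no geophysical meaning). It assembles the matrix <math>F</math> exactly as written above and inspects its condition number: a very large value means that some combinations of masses are nearly invisible in the data, which is precisely the situation where the prior information discussed above becomes essential.

<syntaxhighlight lang="python">
import numpy as np

G = 6.674e-11  # universal gravitational constant (SI units)

# Hypothetical geometry: 5 observation points on the surface (z = 0)
# and 5 buried point masses at 40 m depth.
obs = np.array([(x, 0.0) for x in (0.0, 25.0, 50.0, 75.0, 100.0)])
masses = np.array([(x, -40.0) for x in (10.0, 30.0, 50.0, 70.0, 90.0)])

# r[i, j] is the distance between observation point i and mass j,
# so that F[i, j] = G / r_ij^2, as in the matrix written above.
r = np.linalg.norm(obs[:, None, :] - masses[None, :, :], axis=2)
F = G / r**2

# A large condition number means that inverting F amplifies
# data noise dramatically: the first difficulty in practice.
print("condition number of F:", np.linalg.cond(F))
</syntaxhighlight>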
Additional information may also be integrated through inequality constraints on the model parameters or some functions of them. Such constraints are important to avoid unrealistic values for the parameters (negative values for instance). In this case, the space spanned by model parameters will no longer be a vector space but a '''subset of admissible models''' denoted by <math>P_\text{adm}</math> in the sequel.

==== Tools to overcome the second difficulty ====

As mentioned above, noise may be such that our measurements are not the image of any model, so that we cannot look for a model that produces the data but rather look for [[model selection|the best (or optimal) model]]: that is, the one that best matches the data. This leads us to minimize an [[objective function]], namely a [[functional (mathematics)|functional]] that quantifies how big the residuals are or how far the predicted data are from the observed data. Of course, when we have perfect data (i.e. no noise) then the recovered model should fit the observed data perfectly. A standard objective function, <math>\varphi</math>, is of the form: <math>\varphi(p) = \|F p-d_\text{obs} \|^2 </math> where <math>\| \cdot \| </math> is the Euclidean norm of the residuals (it will be the [[Lp space|<math>L^2</math> norm]] when the measurements are functions instead of samples). This approach amounts to making use of [[Least squares|ordinary least squares]], an approach widely used in statistics. However, the Euclidean norm is known to be very sensitive to outliers: to avoid this difficulty we may think of using other distances, for instance the <math>L^1</math> norm, in replacement of the <math>L^2</math> norm.

==== Bayesian approach ====

Very similar to the least-squares approach is the probabilistic approach: if we know the statistics of the noise that contaminates the data, we can think of seeking the most likely model <math>p</math>, which is the model that matches the [[Maximum likelihood estimation|maximum likelihood criterion]]. If the noise is [[Normal distribution|Gaussian]], the maximum likelihood criterion appears as a least-squares criterion, the Euclidean scalar product in data space being replaced by a scalar product involving the [[covariance]] of the noise. Also, should prior information on model parameters be available, we could think of using [[Bayesian inference]] to formulate the solution of the inverse problem. This approach is described in detail in Tarantola's book.<ref>{{cite book |last1=Tarantola |first1=Albert |title=Inverse problem theory |url=https://archive.org/details/inverseproblemth0000tara |url-access=registration |date=1987 |publisher=Elsevier |isbn=9780444599674 |edition=1st}}</ref>

==== Numerical solution of our elementary example ====

Here we make use of the Euclidean norm to quantify the data misfits. As we deal with a linear inverse problem, the objective function is quadratic. For its minimization, it is classical to compute its gradient, using the same rationale as we would to minimize a function of only one variable. At the optimal model <math>p_\text{opt}</math>, this gradient vanishes, which can be written as: <math display="block">\nabla_p \varphi = 2 (F^\mathrm{T} F p_\text{opt} - F^\mathrm{T} d_\text{obs}) = 0 </math> where ''F''<sup>T</sup> denotes the [[matrix transpose]] of ''F''.
This equation simplifies to: <math display="block">F^\mathrm{T} F p_\text{opt} = F^\mathrm{T} d_\text{obs} </math> This expression is known as the [[Normal equations|normal equation]] and gives us a possible solution to the inverse problem. In our example, matrix <math>F^\mathrm{T} F</math> generally turns out to be full rank, so that the equation above makes sense and determines the model parameters uniquely: we do not need to integrate additional information to end up with a unique solution.

=== Mathematical and computational aspects ===

Inverse problems are typically ill-posed, as opposed to the [[well-posed problem]]s usually met in mathematical modeling. Of the three conditions for a well-posed problem suggested by [[Jacques Hadamard]] (existence, uniqueness, and stability of the solution or solutions) the condition of stability is most often violated. In the sense of [[functional analysis]], the inverse problem is represented by a mapping between [[metric space]]s. While inverse problems are often formulated in infinite-dimensional spaces, limitations to a finite number of measurements, and the practical consideration of recovering only a finite number of unknown parameters, may lead to the problems being recast in discrete form. In this case the inverse problem will typically be ''[[condition number|ill-conditioned]]''. In these cases, [[regularization (mathematics)|regularization]] may be used to introduce mild assumptions on the solution and prevent [[overfitting]]. Many instances of regularized inverse problems can be interpreted as special cases of [[Bayesian inference]].<ref>{{cite book| chapter-url=http://www.ipgp.fr/~tarantola/Files/Professional/Books/InverseProblemTheory.pdf| title=Inverse Problem Theory and Methods for Model Parameter Estimation| pages=i–xii| first=Albert|last=Tarantola| publisher=SIAM| doi=10.1137/1.9780898717921.fm| chapter=Front Matter| year=2005| isbn=978-0-89871-572-9}}</ref>

==== Numerical solution of the optimization problem ====

Some inverse problems have a very simple solution, for instance, when one has a set of [[unisolvent functions]], meaning a set of {{tmath|n}} functions such that evaluating them at {{tmath|n}} distinct points yields a set of [[linearly independent]] vectors. This means that given a linear combination of these functions, the coefficients can be computed by arranging the vectors as the columns of a matrix and then inverting this matrix. The simplest example of unisolvent functions is polynomials constructed, using the [[unisolvence theorem]], so as to be unisolvent. Concretely, this is done by inverting the [[Vandermonde matrix]]. But this is a very specific situation.

In general, the solution of an inverse problem requires sophisticated optimization algorithms. When the model is described by a large number of parameters (the number of unknowns involved in some diffraction tomography applications can reach one billion), solving the linear system associated with the normal equations can be cumbersome. The numerical method to be used for solving the optimization problem depends in particular on the cost required for computing the solution <math>F p</math> of the forward problem.
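For a problem as small as our five-mass example, the normal equations can be solved by a direct method. As a sketch of what is done when <math>F</math> is too large to store or factorize, the snippet below (with a random stand-in forward map, for illustration only) solves the normal equations with a matrix-free conjugate gradient iteration: each iteration requires only one application of <math>F</math> and one of <math>F^\mathrm{T}</math>, that is, one solution of the forward problem and one of its adjoint.

<syntaxhighlight lang="python">
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(0)
F = rng.normal(size=(200, 100))    # random stand-in forward map
d_obs = F @ rng.normal(size=100)   # synthetic, noise-free data

# Conjugate gradient on the normal equations F^T F p = F^T d_obs.
# F^T F is never formed explicitly: each iteration only applies
# F and F^T, which is what keeps huge problems tractable.
normal_op = LinearOperator((100, 100), matvec=lambda p: F.T @ (F @ p))
p_opt, info = cg(normal_op, F.T @ d_obs)

print("converged:", info == 0)
print("data misfit:", np.linalg.norm(F @ p_opt - d_obs))
</syntaxhighlight>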
Once the appropriate algorithm for solving the forward problem has been chosen (a straightforward matrix-vector multiplication may not be adequate when matrix <math>F</math> is huge), the appropriate algorithm for carrying out the minimization can be found in textbooks dealing with numerical methods for the solution of linear systems and for the minimization of quadratic functions (see for instance Ciarlet<ref>{{cite book |last1=Ciarlet |first1=Philippe |title=Introduction à l'analyse numérique matricielle et à l'optimisation |date=1994 |publisher=Masson |location=Paris |isbn=9782225688935}}</ref> or Nocedal<ref>{{cite book |last1=Nocedal |first1=Jorge |title=Numerical optimization |date=2006 |publisher=Springer}}</ref>). Also, the user may wish to add physical constraints to the models: in this case, they have to be familiar with [[Constrained optimization|constrained optimization methods]], a subject in itself. In all cases, computing the gradient of the objective function is often a key element for the solution of the optimization problem. As mentioned above, information about the spatial distribution of a distributed parameter can be introduced through the parametrization. One can also think of adapting this parametrization during the optimization.<ref>{{cite journal |last1=Ben Ameur |first1=Hend |last2=Chavent |first2=Guy |last3=Jaffré |first3=Jérôme |title=Refinement and coarsening indicators for adaptive parametrization: application to the estimation of hydraulic transmissivities |journal=Inverse Problems |date=2002 |volume=18 |issue=3 |pages=775–794 |url=https://hal.inria.fr/docs/00/07/22/95/PDF/RR-4292.pdf |doi=10.1088/0266-5611/18/3/317 |bibcode=2002InvPr..18..775B |s2cid=250892174 }}</ref>

Should the objective function be based on a norm other than the Euclidean norm, we have to leave the area of quadratic optimization. As a result, the optimization problem becomes more difficult. In particular, when the <math>L^1</math> norm is used for quantifying the data misfit, the objective function is no longer differentiable: its gradient does not make sense any longer. Dedicated methods from non-differentiable optimization (see for instance Lemaréchal<ref>{{cite book |last1=Lemaréchal |first1=Claude |title=Optimization, Handbooks in Operations Research and Management Science |date=1989 |publisher=Elsevier |pages=529–572}}</ref>) are then required.

Once the optimal model is computed, we have to address the question: "Can we trust this model?" The question can be formulated as follows: how large is the set of models that match the data "nearly as well" as this model? In the case of quadratic objective functions, this set is contained in a hyper-ellipsoid, a subset of <math>R^M</math> (<math>M</math> is the number of unknowns), whose size depends on what we mean by "nearly as well", that is, on the noise level. The direction of the largest axis of this ellipsoid ([[Eigenvalues and eigenvectors|eigenvector]] associated with the smallest eigenvalue of matrix <math>F^T F</math>) is the direction of poorly determined components: if we follow this direction, we can bring a strong perturbation to the model without significantly changing the value of the objective function and thus end up with a significantly different quasi-optimal model.
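This phenomenon can be demonstrated numerically. In the sketch below (synthetic data, with two columns of <math>F</math> made nearly dependent on purpose), a large step along the right singular vector associated with the smallest singular value of <math>F</math> leaves the data misfit almost unchanged while producing a very different quasi-optimal model.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
F = rng.normal(size=(60, 40))
F[:, -1] = F[:, -2] + 1e-6 * rng.normal(size=60)  # nearly dependent columns
d_obs = F @ rng.normal(size=40)
p_opt = np.linalg.lstsq(F, d_obs, rcond=None)[0]

# Direction of the largest axis of the "nearly as well" ellipsoid:
# the right singular vector with the smallest singular value.
_, _, Vt = np.linalg.svd(F)
p_alt = p_opt + 1e3 * Vt[-1]

# The misfit barely changes although the model has moved a lot.
print("misfit of p_opt:", np.linalg.norm(F @ p_opt - d_obs))
print("misfit of p_alt:", np.linalg.norm(F @ p_alt - d_obs))
print("model change:", np.linalg.norm(p_alt - p_opt))
</syntaxhighlight>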
We clearly see that the answer to the question "can we trust this model?" is governed by the noise level and by the eigenvalues of the [[Hessian matrix|Hessian]] of the objective function or, equivalently, in the case where no regularization has been integrated, by the [[singular value]]s of matrix <math>F</math>. Of course, the use of regularization (or other kinds of prior information) reduces the size of the set of almost optimal solutions and, in turn, increases the confidence we can put in the computed solution.

Recent contributions from the field of [[algorithmic information theory]] have proposed a more general approach to such problems, including a noteworthy conceptual framework for extracting generative rules from complex [[Dynamical system|dynamical systems]].<ref>{{cite journal |last1=Zenil |first1=Hector |last2=Kiani |first2=Narsis A. |last3=Zea |first3=Allan A. |last4=Tegnér |first4=Jesper |title=Causal deconvolution by algorithmic generative models |journal=Nature Machine Intelligence |volume=1 |issue=1 |year=2019 |pages=58–66 |doi=10.1038/s42256-018-0005-0 }}</ref> In particular, Zenil et al. (2019) introduced a framework called Algorithmic Information Dynamics (AID),<ref>{{cite journal | last=Zenil | first=Hector | title=Algorithmic Information Dynamics | journal=Scholarpedia | date=25 July 2020 | volume=15 | issue=7 | doi=10.4249/scholarpedia.53143 | doi-access=free | bibcode=2020SchpJ..1553143Z | hdl=10754/666314 | hdl-access=free }}</ref> which quantifies the algorithmic complexity of system components through controlled perturbation analysis.<ref>{{cite book | last1=Zenil | first1=Hector | last2=Kiani | first2=Narsis A. | last3=Tegner | first3=Jesper | title=Algorithmic Information Dynamics: A Computational Approach to Causality with Applications to Living Systems | publisher=Cambridge University Press | year=2023 | doi=10.1017/9781108596619 | isbn=978-1-108-59661-9 }}</ref> This method enables the reconstruction of generative rules in discrete dynamical systems, such as [[Cellular automaton|cellular automata]], without relying on explicit governing equations.<ref>{{cite journal |last1=Zenil |first1=Hector |last2=Kiani |first2=N. A. |last3=Marabita |first3=F. |last4=Deng |first4=Y. |last5=Elias |first5=S. |last6=Schmidt |first6=A. |last7=Ball |first7=G. |last8=Tegnér |first8=J. |title=An Algorithmic Information Calculus for Causal Discovery and Reprogramming Systems |journal=iScience |year=2019 |doi=10.1016/j.isci.2019.07.043 }}</ref> By analyzing the algorithmic responses of system states to localized changes, AID provides a novel lens for identifying causal relationships and estimating the reprogrammability of a system toward desired behaviors. This approach offers a useful alternative when classical inverse methods struggle with instability or intractability in highly discrete or nonlinear domains.

==== Stability, regularization and model discretization in infinite dimension ====

We focus here on the recovery of a distributed parameter. When looking for distributed parameters, we have to discretize these unknown functions. Doing so, we reduce the dimension of the problem to something finite. But now the question is: is there any link between the solution we compute and the solution of the initial problem? And another question: what do we mean by the solution of the initial problem?
Since a finite number of data does not allow the determination of an infinity of unknowns, the original data misfit functional has to be regularized to ensure the uniqueness of the solution. Many times, reducing the unknowns to a finite-dimensional space will provide an adequate regularization: the computed solution will look like a discrete version of the solution we were looking for. For example, a naïve discretization will often work for solving the [[deconvolution]] problem: it will work as long as we do not allow missing frequencies to show up in the numerical solution. But many times, regularization has to be integrated explicitly in the objective function.

In order to understand what may happen, we have to keep in mind that solving such a linear inverse problem amounts to solving a [[Fredholm integral equation]] of the first kind: <math display="block">d(x) = \int_\Omega K(x,y) p(y) dy</math> where <math>K</math> is the kernel, <math>x</math> and <math>y</math> are vectors of <math>R^2</math>, and <math>\Omega</math> is a domain in <math>R^2</math>. This holds for a 2D application. For a 3D application, we consider <math> x,y \in R^3</math>. Note that here the model parameters <math>p</math> consist of a function and that the response of a model also consists of a function denoted by <math>d(x)</math>. This equation is an extension to infinite dimension of the matrix equation <math>d=Fp</math> given in the case of discrete problems.

For sufficiently smooth <math>K</math> the operator defined above is [[compact operator|compact]] on reasonable [[Banach space]]s such as [[Lp space|<math>L^2</math>]]. [[Compact operator|F. Riesz's theory]] states that the set of singular values of such an operator contains zero (hence the existence of a null-space), is finite or at most countable, and, in the latter case, constitutes a sequence that goes to zero. In the case of a symmetric kernel, we have an infinity of eigenvalues and the associated eigenvectors constitute a Hilbert (orthonormal) basis of <math>L^2</math>. Thus any solution of this equation is determined up to an additive function in the null-space and, in the case of an infinity of singular values, the solution (which involves the reciprocal of arbitrarily small eigenvalues) is unstable: two ingredients that make the solution of this integral equation a typical ill-posed problem! However, we can define a solution through the [[Generalized inverse|pseudo-inverse]] of the forward map (again up to an arbitrary additive function). When the forward map is compact, the classical [[Tikhonov regularization]] will work if we use it for integrating prior information stating that the <math>L^2</math> norm of the solution should be as small as possible: this will make the inverse problem well-posed. Yet, as in the finite-dimensional case, we have to question the confidence we can put in the computed solution. Again, basically, the information lies in the eigenvalues of the Hessian operator. Should subspaces containing eigenvectors associated with small eigenvalues be explored for computing the solution, then the solution can hardly be trusted: some of its components will be poorly determined. The smallest eigenvalue is equal to the weight introduced in Tikhonov regularization. Irregular kernels may yield a forward map which is not compact and even [[Unbounded operator|unbounded]] if we naively equip the space of models with the <math>L^2</math> norm. In such cases, the Hessian is not a bounded operator and the notion of eigenvalue does not make sense any longer.
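Both ingredients, decaying singular values and noise amplification, can be observed on a discretized Fredholm equation. The sketch below uses a smooth Gaussian kernel (an arbitrary choice made only for illustration): the singular values fall off rapidly, a naive inversion is destroyed by a tiny amount of noise, and Tikhonov regularization, with the weight <math>\alpha</math> playing the role of the smallest eigenvalue mentioned above, restores stability.

<syntaxhighlight lang="python">
import numpy as np

# Discretize d(x) = integral of K(x, y) p(y) dy on [0, 1].
n = 200
x = np.linspace(0.0, 1.0, n)
h = x[1] - x[0]
K = h * np.exp(-((x[:, None] - x[None, :]) ** 2) / (2 * 0.05**2))

# Smooth kernel => compact operator => singular values decay to zero.
s = np.linalg.svd(K, compute_uv=False)
print("largest / smallest singular value:", s[0], s[-1])

p_true = np.sin(3 * np.pi * x)
d = K @ p_true + 1e-6 * np.random.default_rng(2).normal(size=n)

# A naive pseudo-inverse keeps every nonzero singular value and
# therefore divides by arbitrarily small ones: the noise explodes ...
p_naive = np.linalg.lstsq(K, d, rcond=0)[0]
# ... whereas Tikhonov regularization (weight alpha on the L2 norm
# of the solution) restores stability at the price of a small bias.
alpha = 1e-6
p_tik = np.linalg.solve(K.T @ K + alpha * np.eye(n), K.T @ d)

print("error of naive inversion:", np.linalg.norm(p_naive - p_true))
print("error with Tikhonov:", np.linalg.norm(p_tik - p_true))
</syntaxhighlight>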
When the forward map is unbounded in this way, a mathematical analysis is required to make it a [[bounded operator]] and design a well-posed problem: an illustration can be found in Delprat-Jannaud and Lailly.<ref>{{cite journal |last1=Delprat-Jannaud |first1=Florence |last2=Lailly |first2=Patrick |title=Ill-posed and well-posed formulations of the reflection travel time tomography problem |journal=Journal of Geophysical Research |date=1993 |volume=98 |issue=B4 |pages=6589–6605 |doi=10.1029/92JB02441 |bibcode=1993JGR....98.6589D }}</ref> Again, we have to question the confidence we can put in the computed solution and we have to generalize the notion of eigenvalue to get the answer.<ref>{{cite journal |last1=Delprat-Jannaud |first1=Florence |last2=Lailly |first2=Patrick |title=What information on the Earth model do reflection traveltimes provide |journal=Journal of Geophysical Research |date=1992 |volume=98 |issue=B13 |pages=827–844 |doi=10.1029/92JB01739 |bibcode=1992JGR....9719827D }}</ref>

Analysis of the spectrum of the Hessian operator is thus a key element to determine how reliable the computed solution is. However, such an analysis is usually a very heavy task. This has led several authors to investigate alternative approaches in the case where we are not interested in all the components of the unknown function but only in sub-unknowns that are the images of the unknown function by a linear operator. These approaches are referred to as the "Backus and Gilbert method",<ref>{{cite journal |last1=Backus |first1=George |last2=Gilbert |first2=Freeman |title=The Resolving Power of Gross Earth Data |journal=Geophysical Journal of the Royal Astronomical Society |date=1968 |volume=16 |issue=10 |pages=169–205 |doi=10.1111/j.1365-246X.1968.tb00216.x |bibcode=1968GeoJ...16..169B |doi-access=free }}</ref> [[Jacques-Louis Lions|Lions]]'s sentinels approach,<ref>{{cite journal |last1=Lions |first1=Jacques Louis |title=Sur les sentinelles des systèmes distribués |journal=C. R. Acad. Sci. Paris |date=1988 |series=I Math |pages=819–823}}</ref> and the SOLA method:<ref>{{cite journal |last1=Pijpers |first1=Frank |last2=Thompson |first2=Michael |title=The SOLA method for helioseismic inversion |journal=Astronomy and Astrophysics |date=1993 |volume=281 |issue=12 |pages=231–240 |bibcode=1994A&A...281..231P }}</ref> these approaches turned out to be strongly related to one another, as explained in Chavent.<ref>{{cite book |last1=Chavent |first1=Guy |title=Least-Squares, Sentinels and Substractive Optimally Localized Average in Équations aux dérivées partielles et applications |date=1998 |publisher=Gauthier Villars |location=Paris |pages=345–356 |url=https://hal.inria.fr/inria-00073357/document}}</ref> Finally, the concept of [[Optical resolution|limited resolution]], often invoked by physicists, is nothing but a specific view of the fact that some poorly determined components may corrupt the solution. But, generally speaking, these poorly determined components of the model are not necessarily associated with high frequencies.

===Some classical linear inverse problems for the recovery of distributed parameters===

The problems mentioned below correspond to different versions of the Fredholm integral: each of these is associated with a specific kernel <math>K</math>.

====Deconvolution====

The goal of [[deconvolution]] is to reconstruct the original image or signal <math>p(x)</math>, which appears noisy and blurred in the data <math>d(x)</math>.<ref>{{cite book |author1=Kaipio, J. |author2=Somersalo, E. |title=Statistical and Computational Inverse Problems |location=New York, NY |publisher=Springer |year=2010}}</ref>
From a mathematical point of view, the kernel <math>K(x,y)</math> here only depends on the difference between <math>x</math> and <math>y</math>.

====Tomographic methods====

In these methods we attempt to recover a distributed parameter, the observation consisting in the measurement of the integrals of this parameter carried out along a family of lines. We denote by <math>\Gamma_x</math> the line in this family associated with measurement point <math>x</math>. The observation at <math>x</math> can thus be written as: <math display="block">d(x) = \int_{\Gamma_x} w(x,y) p(y) \, ds</math> where <math>s</math> is the arc-length along <math>{\Gamma_x}</math> and <math>w(x,y)</math> is a known weighting function. Comparing this equation with the Fredholm integral above, we notice that the kernel <math>K(x,y)</math> is a kind of [[Dirac delta function|delta function]] that peaks on the line <math>{\Gamma_x}</math>. With such a kernel, the forward map is not compact.

===== Computed tomography =====

In [[X-ray computed tomography]] the lines on which the parameter is integrated are straight lines: the [[tomographic reconstruction]] of the parameter distribution is based on the inversion of the [[Radon transform]]. Although from a theoretical point of view many linear inverse problems are well understood, problems involving the Radon transform and its generalisations still present many theoretical challenges, with questions of sufficiency of data still unresolved. Such problems include incomplete data for the x-ray transform in three dimensions and problems involving the generalisation of the x-ray transform to tensor fields. Solutions explored include [[Algebraic Reconstruction Technique]], [[filtered backprojection]], and, as computing power has increased, [[iterative reconstruction]] methods such as [[SAMV (algorithm)|iterative Sparse Asymptotic Minimum Variance]].<ref name=AbeidaZhang>{{cite journal | last1=Abeida | first1=Habti | last2=Zhang | first2=Qilin | last3=Li | first3=Jian | last4=Merabtine | first4=Nadjim | title=Iterative Sparse Asymptotic Minimum Variance Based Approaches for Array Processing | journal=IEEE Transactions on Signal Processing | volume=61 | issue=4 | year=2013 | issn=1053-587X | doi=10.1109/tsp.2012.2231676 | pages=933–944 | url=https://qilin-zhang.github.io/_pages/pdfs/SAMVpaper.pdf | arxiv=1802.03070 | bibcode=2013ITSP...61..933A | s2cid=16276001 }}</ref>

===== Diffraction tomography =====

Diffraction tomography is a classical linear inverse problem in exploration seismology: the amplitude recorded at one time for a given source-receiver pair is the sum of contributions arising from points such that the sum of the distances, measured in traveltimes, from the source and the receiver, respectively, is equal to the corresponding recording time. In 3D the parameter is not integrated along lines but over surfaces. Should the propagation velocity be constant, such points are distributed on an ellipsoid. The inverse problem consists of retrieving the distribution of diffracting points from the seismograms recorded along the survey, the velocity distribution being known.
A direct solution was originally proposed by [http://amath.colorado.edu/~beylkin/papers/BEYLKI-1983a.pdf Beylkin] and Lambaré et al.:<ref>{{cite journal |last1=Lambaré |first1=Gilles |last2=Virieux |first2=Jean |last3=Madariaga |first3=Raul |last4=Jin |first4=Side |title=Iterative asymptotic inversion in the acoustic approximation |journal=Geophysics |date=1992 |volume=57 |issue=9 |pages=1138–1154 |doi=10.1190/1.1443328 |bibcode=1992Geop...57.1138L |s2cid=55836067 }}</ref> these works were the starting points of approaches known as amplitude preserved migration (see Beylkin<ref>{{cite journal |last1=Beylkin |first1=Gregory |title=The inversion problem and applications of the generalized Radon transform |journal=Communications on Pure and Applied Mathematics |date=1984 |volume=XXXVII |issue=5 |pages=579–599 |url=http://amath.colorado.edu/faculty/beylkin/papers/BEYLKI-1984.pdf |doi=10.1002/cpa.3160370503}}</ref><ref>{{cite journal |last1=Beylkin |first1=Gregory |title=Imaging of discontinuities in the inverse scattering problem by inversion of a causal generalized Radon transform |journal=J. Math. Phys. |date=1985 |volume=26 |issue=1 |pages=99–108 |doi=10.1063/1.526755 |bibcode=1985JMP....26...99B }}</ref> and Bleistein<ref>{{cite journal |last1=Bleistein |first1=Norman |title=On the imaging of reflectors in the earth |journal=Geophysics |date=1987 |volume=52 |issue=7 |pages=931–942 |doi=10.1190/1.1442363 |bibcode=1987Geop...52..931B |s2cid=5095133 }}</ref>). Should geometrical optics techniques (i.e. [https://www.encyclopediaofmath.org/index.php/Ray_method rays]) be used for solving the wave equation, these methods turn out to be closely related to the so-called least-squares migration methods<ref>{{cite journal |last1=Nemeth |first1=Tamas |last2=Wu |first2=Chengjun |last3=Schuster |first3=Gerard |title=Least-squares migration of incomplete reflection data |journal=Geophysics |date=1999 |volume=64 |issue=1 |pages=208–221 |url=https://csim.kaust.edu.sa/web/FWI&LSM_papers/LSM/Geophysics1999Nemeth.pdf |doi=10.1190/1.1444517 |bibcode=1999Geop...64..208N }}</ref> derived from the least-squares approach (see Lailly,<ref name='Proceedings of the international conference on "Inverse Scattering, theory and applications", Tulsa, Oklahoma'>{{cite book |last1=Lailly |first1=Patrick |title=The seismic inverse problem as a sequence of before stack migrations |date=1983 |publisher=SIAM |location=Philadelphia |isbn=0-89871-190-8 |pages=206–220}}</ref> Tarantola<ref>{{Cite journal | doi=10.1190/1.1441754| title=Inversion of seismic reflection data in the acoustic approximation| journal=Geophysics| volume=49| issue=8| pages=1259–1266| year=1984| last1=Tarantola| first1=Albert | bibcode=1984Geop...49.1259T| s2cid=7596552}}</ref>).

===== {{anchor|Doppler tomography}}Doppler tomography (astrophysics) =====

If we consider a rotating stellar object, the spectral lines we can observe on a spectral profile will be shifted due to the [[Doppler effect]]. Doppler tomography aims at converting the information contained in spectral monitoring of the object into a 2D image of the emission (as a function of the radial velocity and of the phase in the periodic rotation movement) of the stellar atmosphere.
As explained by [[Tom Marsh (astronomer)|Tom Marsh]],<ref>{{cite journal |last1=Marsh |first1=Tom |title=Doppler tomography |journal=Astrophysics and Space Science |volume=296 |date=2005 |issue=1–4 |pages=403–415 |doi=10.1007/s10509-005-4859-3 |arxiv=astro-ph/0011020 |bibcode=2005Ap&SS.296..403M |s2cid=15334110 }}</ref> this linear inverse problem is tomography-like: we have to recover a distributed parameter which has been integrated along lines to produce its effects in the recordings.

==== Inverse heat conduction ====

Early publications on inverse heat conduction arose from the determination of surface heat flux during atmospheric re-entry using buried temperature sensors.<ref>{{Cite journal | author1 = Shumakov, N. V. | title = A method for the experimental study of the process of heating a solid body | journal = Soviet Physics – Technical Physics (Translated by American Institute of Physics) | volume = 2 | pages = 771 | year = 1957 }}</ref><ref>{{Cite journal | author1 = Stolz, G. Jr. | title = Numerical solutions to an inverse problem of heat conduction for simple shapes | journal = Journal of Heat Transfer | volume = 82 | pages = 20–26 | year = 1960 | doi = 10.1115/1.3679871 }}</ref> Other applications where surface heat flux is needed but surface sensors are not practical include the interiors of reciprocating engines and rocket engines, and the testing of nuclear reactor components.<ref>{{Cite book | author1 = Beck, J. V. | author2 = Blackwell, B. | author3 = St. Clair, C. R. Jr. | title = Inverse Heat Conduction: Ill-Posed Problems | location = New York | publisher = J. Wiley & Sons | year = 1985 | isbn = 0471083194 }}</ref> A variety of numerical techniques have been developed to address the ill-posedness and sensitivity to measurement error caused by damping and lagging in the temperature signal.<ref>{{cite journal | author1 = Beck, J. V. | author2 = Blackwell, B. | author3 = Haji-Sheikh, B. | title = Comparison of some inverse heat conduction methods using experimental data | journal = International Journal of Heat and Mass Transfer | volume = 39 | issue = 17 | year = 1996 | pages = 3649–3657 | doi = 10.1016/0017-9310(96)00034-8 | bibcode = 1996IJHMT..39.3649B }}</ref><ref>{{cite book | author1 = Ozisik, M. N. | author2 = Orlande, H. R. B. | title = Inverse Heat Transfer, Fundamentals and Applications | publisher = CRC Press | year = 2021 | isbn = 9780367820671 | edition = 2nd }}</ref><ref>{{cite book | title = Inverse Engineering Handbook | editor = K. A. Woodbury | publisher = CRC Press | year = 2002 | isbn = 9780849308611 }}</ref>