==Main principles==

===Related terms and techniques===
Kriging predicts the value of a function at a given point by computing a weighted average of the known values of the function in the neighborhood of the point. The method is closely related to [[regression analysis]]. Both theories derive a [[best linear unbiased estimator]] based on assumptions about [[covariance]]s, make use of the [[Gauss–Markov theorem]] to prove independence of the estimate and error, and use very similar formulae. Even so, they are useful in different frameworks: kriging is made for estimation of a single realization of a random field, while regression models are based on multiple observations of a multivariate data set.

The kriging estimation may also be seen as a [[spline (mathematics)|spline]] in a [[reproducing kernel Hilbert space]], with the reproducing kernel given by the covariance function.<ref>{{cite book |first=Grace |last=Wahba |title=Spline Models for Observational Data |publisher=SIAM |volume=59 |year=1990 |doi=10.1137/1.9781611970128 |isbn=978-0-89871-244-5 }}</ref> The difference from the classical kriging approach lies in the interpretation: while the spline is motivated by a minimum-norm interpolation based on a Hilbert-space structure, kriging is motivated by minimizing an expected squared prediction error based on a stochastic model.

Kriging with ''polynomial trend surfaces'' is mathematically identical to [[generalized least squares]] polynomial [[curve fitting]].

Kriging can also be understood as a form of [[Bayesian inference]].<ref>{{Cite book |last1=Williams |first1=C. K. I. |chapter=Prediction with Gaussian Processes: From Linear Regression to Linear Prediction and Beyond |doi=10.1007/978-94-011-5014-9_23 |title=Learning in Graphical Models |pages=599–621 |year=1998 |isbn=978-94-010-6104-9}}</ref> Kriging starts with a [[prior probability distribution|prior]] [[probability distribution|distribution]] over [[Function (mathematics)|functions]]. This prior takes the form of a Gaussian process: <math>N</math> samples from a function will be jointly [[normal distribution|normally distributed]], where the [[covariance]] between any two samples is the covariance function (or [[kernel (statistics)|kernel]]) of the Gaussian process evaluated at the spatial locations of the two points. A [[Set (mathematics)|set]] of values is then observed, each value associated with a spatial location. A new value can then be predicted at any new spatial location by combining the Gaussian prior with a Gaussian [[likelihood function]] for each of the observed values. The resulting [[Posterior probability|posterior]] distribution is also Gaussian, with a mean and covariance that can be computed directly from the observed values, their variance, and the kernel matrix derived from the prior.
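
The Gaussian-process view described above can be illustrated with a minimal NumPy sketch. It is not a reference implementation: the zero prior mean, the squared-exponential kernel, its hyperparameters, the small noise variance and the sample data are all assumptions chosen for illustration, and the function names <code>sq_exp_kernel</code> and <code>gp_posterior</code> are hypothetical.

```python
import numpy as np

def sq_exp_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between 1-D location arrays a and b (assumed kernel)."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_obs, z_obs, x_new, noise_var=1e-6):
    """Posterior mean and covariance at x_new for a zero-mean Gaussian process prior."""
    # Kernel matrix of the observations plus the (assumed) observation noise variance.
    K = sq_exp_kernel(x_obs, x_obs) + noise_var * np.eye(len(x_obs))
    K_s = sq_exp_kernel(x_obs, x_new)    # covariances between observed and new locations
    K_ss = sq_exp_kernel(x_new, x_new)   # prior covariance among the new locations
    # Solve linear systems rather than forming the explicit inverse of K.
    mean = K_s.T @ np.linalg.solve(K, z_obs)
    cov = K_ss - K_s.T @ np.linalg.solve(K, K_s)
    return mean, cov

# Made-up observations at three locations; predict at one observed and one new location.
x_obs = np.array([0.0, 1.0, 2.5])
z_obs = np.array([0.2, 0.9, -0.3])
mean, cov = gp_posterior(x_obs, z_obs, np.array([0.0, 1.5]))
```

With a very small noise variance the posterior mean nearly reproduces the observed value at an observed location and the posterior variance there is close to zero, while at the unobserved location the variance lies between zero and the prior variance.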

===Geostatistical estimator===
In geostatistical models, sampled data are interpreted as the result of a random process. The fact that these models incorporate uncertainty in their conceptualization does not mean that the phenomenon – the forest, the aquifer, the mineral deposit – has resulted from a random process, but rather it allows one to build a methodological basis for the spatial inference of quantities in unobserved locations and to quantify the uncertainty associated with the estimator.

A [[stochastic process]] is, in the context of this model, simply a way to approach the set of data collected from the samples. The first step in geostatistical modeling is to create a random process that best describes the set of observed data.

A value from location <math>x_1</math> (generic designation of a set of [[Geographic coordinate system|geographic coordinates]]) is interpreted as a realization <math>z(x_1)</math> of the [[random variable]] <math>Z(x_1)</math>. In the space <math>A</math>, where the set of samples is dispersed, there are <math>N</math> realizations of the random variables <math>Z(x_1), Z(x_2), \ldots, Z(x_N)</math>, correlated among themselves.

The set of random variables constitutes a random function, of which only one realization is known – the set <math>z(x_i)</math> of observed data. With only one realization of each random variable, it is theoretically impossible to determine any [[statistical parameter]] of the individual variables or the function. The proposed solution in the geostatistical formalism consists in ''assuming'' various degrees of ''stationarity'' in the random function, in order to make the inference of some statistical values possible.

For instance, if one assumes, based on the homogeneity of samples in area <math>A</math> where the variable is distributed, the hypothesis that the [[Moment (mathematics)#Mean|first moment]] is stationary (i.e. all random variables have the same mean), then one is assuming that the mean can be estimated by the arithmetic mean of sampled values.

The hypothesis of stationarity related to the [[Moment (mathematics)#Variance|second moment]] is defined in the following way: the correlation between two random variables depends solely on the spatial distance between them and is independent of their location. Thus, if <math>\mathbf{h} = x_2 - x_1</math> and <math>h = |\mathbf{h}|</math>, then:

: <math>C\big(Z(x_1), Z(x_2)\big) = C\big(Z(x_i), Z(x_i + \mathbf{h})\big) = C(h),</math>

: <math>\gamma\big(Z(x_1), Z(x_2)\big) = \gamma\big(Z(x_i), Z(x_i + \mathbf{h})\big) = \gamma(h).</math>

For simplicity, we define <math>C(x_i, x_j) = C\big(Z(x_i), Z(x_j)\big)</math> and <math>\gamma(x_i, x_j) = \gamma\big(Z(x_i), Z(x_j)\big)</math>.

This hypothesis allows one to infer those two measures – the [[variogram]] and the [[covariogram]]:

: <math>\gamma(h) = \frac{1}{2|N(h)|} \sum_{(i,j)\in N(h)} \big(Z(x_i) - Z(x_j)\big)^2,</math>

: <math>C(h) = \frac{1}{|N(h)|} \sum_{(i,j)\in N(h)} \big(Z(x_i) - m(h)\big)\big(Z(x_j) - m(h)\big),</math>

where:
: <math>m(h) = \frac{1}{2|N(h)|} \sum_{(i,j)\in N(h)} \big(Z(x_i) + Z(x_j)\big)</math>;
: <math>N(h)</math> denotes the set of pairs of observations <math>i,\;j</math> such that <math>|x_i - x_j| = h</math>, and <math>|N(h)|</math> is the number of pairs in the set.
In this set, <math>(i,\;j)</math> and <math>(j,\;i)</math> denote the same element. Generally an "approximate distance" <math>h</math> is used, implemented using a certain tolerance.
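
The variogram estimate with a distance tolerance can be sketched in NumPy as follows. This is an illustrative sketch only: the function name <code>empirical_variogram</code>, the parameter <code>tol</code> (the half-width of the "approximate distance" bin) and the sample data are assumptions, not part of any particular library.

```python
import numpy as np

def empirical_variogram(coords, values, lags, tol):
    """Estimate gamma(h) for each lag h, using pairs with | |x_i - x_j| - h | <= tol."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    if coords.ndim == 1:
        coords = coords[:, None]          # treat 1-D input as points on a line
    # Each unordered pair is taken once, so (i, j) and (j, i) are the same element.
    i, j = np.triu_indices(len(values), k=1)
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sq_diff = (values[i] - values[j]) ** 2
    gamma = np.full(len(lags), np.nan)    # NaN where no pair falls in the bin
    for k, h in enumerate(lags):
        in_bin = np.abs(dist - h) <= tol
        if in_bin.any():
            gamma[k] = sq_diff[in_bin].sum() / (2 * in_bin.sum())
    return gamma

# Made-up samples on a regular 1-D transect.
x = np.array([0.0, 1.0, 2.0, 3.0])
z = np.array([1.0, 2.0, 4.0, 3.0])
gamma = empirical_variogram(x, z, lags=[1.0, 2.0], tol=0.1)
# gamma → [1.0, 2.5]
```

For lag 1 the three pairs give squared differences 1, 4 and 1, so the estimate is 6/(2·3) = 1; for lag 2 the two pairs give 9 and 1, so the estimate is 10/(2·2) = 2.5.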

===Linear estimation===
Spatial inference, or estimation, of a quantity <math>Z \colon \mathbb{R}^n \to \mathbb{R}</math>, at an unobserved location <math>x_0</math>, is calculated from a linear combination of the observed values <math>z_i = Z(x_i)</math> and weights <math>w_i(x_0),\; i = 1, \ldots, N</math>:

: <math>
 \hat{Z}(x_0) =
 \begin{bmatrix}
   w_1 & w_2 & \cdots & w_N
 \end{bmatrix}
 \begin{bmatrix}
  z_1 \\
  z_2 \\
  \vdots \\
  z_N
 \end{bmatrix} =
 \sum_{i=1}^N w_i(x_0) Z(x_i).
</math>
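
As a small numerical illustration of this linear combination, the following sketch evaluates the estimate for given weights. The observed values and the weights are made up for the example; in particular, the weights are hypothetical and are not derived from any covariance model here.

```python
import numpy as np

z = np.array([10.0, 12.0, 9.0])   # observed values z_1, ..., z_N (made-up data)
w = np.array([0.5, 0.3, 0.2])     # hypothetical weights w_i(x_0), chosen to sum to 1

# The estimate is the row vector of weights times the column vector of observations.
z_hat = w @ z
```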

The weights <math>w_i</math> are intended to fulfil two extremely important roles in a spatial inference process:
* they should reflect the structural "proximity" of samples to the estimation location <math>x_0</math>;
* at the same time, they should have a declustering effect, in order to avoid bias caused by possible sample ''clusters''.

When calculating the weights <math>w_i</math>, there are two objectives in the geostatistical formalism: ''unbiasedness'' and ''minimal variance of estimation''.

If the cloud of real values <math>Z(x_0)</math> is plotted against the estimated values <math>\hat{Z}(x_0)</math>, the criterion for global unbiasedness, ''intrinsic stationarity'' or [[stationary process|wide sense stationarity]] of the field, implies that the mean of the estimates must be equal to the mean of the real values.

The second criterion says that the mean of the squared deviations <math>\big(\hat{Z}(x) - Z(x)\big)^2</math> must be minimal, which means that the more dispersed the cloud of estimated values ''versus'' real values is, the less precise the estimator is.
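
Under the additional assumption of a constant but unknown mean <math>m</math> (the first-moment stationarity described earlier), the two criteria can be written out as follows. This is a sketch in the notation of this section; the symbols <math>m</math>, <math>E</math> and <math>\operatorname{Var}</math> are introduced here for illustration. The unbiasedness condition becomes a constraint on the weights:

: <math>E\big[\hat{Z}(x_0) - Z(x_0)\big] = \sum_{i=1}^N w_i(x_0)\, m - m = m\Big(\sum_{i=1}^N w_i(x_0) - 1\Big) = 0 \quad\Longrightarrow\quad \sum_{i=1}^N w_i(x_0) = 1,</math>

and the quantity to be minimized is the variance of the estimation error, which expands in terms of the covariance <math>C(x_i, x_j)</math> as

: <math>\operatorname{Var}\big(\hat{Z}(x_0) - Z(x_0)\big) = \sum_{i=1}^N \sum_{j=1}^N w_i w_j\, C(x_i, x_j) - 2 \sum_{i=1}^N w_i\, C(x_i, x_0) + C(x_0, x_0).</math>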