===Gaussian process prediction, or Kriging===
{{further|Kriging}}
[[File:Gaussian Process Regression.png|thumbnail|right|Gaussian process regression (prediction) with a squared exponential kernel. Left: draws from the prior function distribution. Middle: draws from the posterior. Right: mean prediction with one standard deviation shaded.]]

In a general Gaussian process regression problem (Kriging), it is assumed that for a Gaussian process <math>f</math> observed at coordinates <math>x</math>, the vector of values {{tmath|f(x)}} is a single sample from a multivariate Gaussian distribution whose dimension equals the number of observed coordinates {{tmath|n}}. Therefore, under the assumption of a zero-mean distribution, {{tmath|f(x) \sim N(0, K(\theta,x,x'))}}, where {{tmath|K(\theta,x,x')}} is the covariance matrix between all possible pairs {{tmath|(x,x')}} of observed coordinates for a given set of hyperparameters ''θ''.<ref name="gpml"/> The log marginal likelihood is therefore
<math display="block">\log p(f(x)\mid\theta,x) = -\frac{1}{2} \left(f(x)^\mathsf{T} K(\theta,x,x')^{-1} f(x) + \log \det(K(\theta,x,x')) + n \log 2\pi \right)</math>
and maximizing this marginal likelihood with respect to {{mvar|θ}} provides the complete specification of the Gaussian process {{math|''f''}}. The first term penalizes a model's failure to fit the observed values, and the second term is a penalty that grows with the model's complexity.
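
As an illustration, the log marginal likelihood above can be evaluated numerically through a Cholesky factorization of the covariance matrix. The following is a minimal sketch, not a reference implementation: the squared exponential kernel, the names <code>sq_exp_kernel</code> and <code>log_marginal_likelihood</code>, the small diagonal <code>jitter</code> added for numerical stability, and the coarse grid search over a single length scale are all assumptions made for this example.

```python
import numpy as np

def sq_exp_kernel(xa, xb, length_scale, signal_var):
    """Squared exponential covariance between 1-D coordinate arrays (assumed kernel)."""
    diff = xa[:, None] - xb[None, :]
    return signal_var * np.exp(-0.5 * (diff / length_scale) ** 2)

def log_marginal_likelihood(x, f, length_scale, signal_var, jitter=1e-6):
    """Zero-mean GP log marginal likelihood, evaluated via a Cholesky factor."""
    n = x.shape[0]
    # jitter is a hypothetical stabilizer, not part of the formula in the text
    K = sq_exp_kernel(x, x, length_scale, signal_var) + jitter * np.eye(n)
    L = np.linalg.cholesky(K)                             # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, f))   # K^{-1} f
    data_fit = f @ alpha                                  # f^T K^{-1} f
    log_det = 2.0 * np.sum(np.log(np.diag(L)))            # log det K
    return -0.5 * (data_fit + log_det + n * np.log(2.0 * np.pi))

# Toy data: eight noise-free observations of a sine curve.
x = np.linspace(0.0, 5.0, 8)
f = np.sin(x)

# Crude stand-in for maximizing over theta: score a few length scales.
grid = [0.2, 0.5, 1.0, 2.0]
scores = {ell: log_marginal_likelihood(x, f, ell, 1.0) for ell in grid}
best = max(scores, key=scores.get)
print(scores, best)
```

In practice the maximization over {{mvar|θ}} would normally use a gradient-based optimizer rather than a grid; the grid is used here only to keep the sketch short.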
Having specified {{mvar|θ}}, making predictions about unobserved values {{tmath|f(x^*)}} at coordinates {{math|''x''*}} is then only a matter of drawing samples from the predictive distribution <math>p(y^*\mid x^*,f(x),x) = N(y^*\mid A,B)</math>, where the posterior mean estimate {{mvar|A}} is defined as
<math display="block">A = K(\theta,x^*,x) K(\theta,x,x')^{-1} f(x)</math>
and the posterior variance estimate {{mvar|B}} is defined as
<math display="block">B = K(\theta,x^*,x^*) - K(\theta,x^*,x) K(\theta,x,x')^{-1} K(\theta,x^*,x)^\mathsf{T} </math>
where {{tmath|K(\theta,x^*,x)}} is the covariance between the new coordinate of estimation {{math|''x''*}} and all observed coordinates {{mvar|x}} for a given hyperparameter vector {{mvar|θ}}, {{tmath|K(\theta,x,x')}} and {{tmath|f(x)}} are defined as before, and {{tmath|K(\theta,x^*,x^*)}} is the variance at point {{math|''x''*}} as dictated by {{mvar|θ}}. In practice, the posterior mean estimate of {{tmath|f(x^*)}} (the "point estimate") is a linear combination of the observations {{tmath|f(x)}}; similarly, the variance of {{tmath|f(x^*)}} is independent of the observed values {{tmath|f(x)}} and depends only on the coordinates and {{mvar|θ}}.

A known bottleneck in Gaussian process prediction is that the computational complexity of inference and likelihood evaluation is cubic in the number of points |''x''|, which can become infeasible for larger data sets.<ref name="brml"/><ref name="highDimBayesianGeostat">{{Cite journal |last1 = Banerjee| first1 = Sudipto | title= High-dimensional Bayesian Geostatistics |journal= Bayesian Analysis | year = 2017 | volume = 12 | issue = 2 | pages=583–614| doi= 10.1214/17-BA1056R | url=https://doi.org/10.1214/17-BA1056R | pmid = 29391920 | pmc = 5790125 }}</ref> Work on sparse Gaussian processes, usually based on the idea of building a ''representative set'' for the given process ''f'', tries to circumvent this issue.
<ref name="smolaSparse">{{cite journal |last1= Smola| first1= A.J.| last2=Schoellkopf | first2= B. |year= 2000 |title= Sparse greedy matrix approximation for machine learning |journal= Proceedings of the Seventeenth International Conference on Machine Learning| pages=911–918| citeseerx= 10.1.1.43.3153}}</ref><ref name="CsatoSparse">{{cite journal |last1= Csato| first1=L.| last2=Opper | first2= M. |year= 2002 |title= Sparse on-line Gaussian processes |journal= Neural Computation |number=3| volume= 14 | pages=641–668 | doi=10.1162/089976602317250933| pmid=11860686| citeseerx=10.1.1.335.9713| s2cid=11375333}}</ref><ref name="banerjeePredictiveProcess">{{Cite journal |last1 = Banerjee| first1 = Sudipto | last2=Gelfand | first2 = Alan E.| last3 = Finley | first3 = Andrew O. | last4 = Sang | first4 = Huiyan | title= Gaussian Predictive Process Models for large spatial datasets |journal= Journal of the Royal Statistical Society, Series B (Statistical Methodology) | year = 2008 | volume = 70 | issue = 4 | pages=825–848| doi=10.1111/j.1467-9868.2008.00663.x | url=https://doi.org/10.1111/j.1467-9868.2008.00663.x | pmid = 19750209 | pmc = 2741335}}</ref> The [[kriging]] method can be used in the latent level of a [[nonlinear mixed-effects model]] for spatial functional prediction; this technique is called latent kriging.<ref>{{Cite journal |last1=Lee|first1=Se Yoon |first2=Bani|last2=Mallick| title = Bayesian Hierarchical Modeling: Application Towards Production Results in the Eagle Ford Shale of South Texas|journal=Sankhya B|year=2021|volume=84 |pages=1–43 |doi=10.1007/s13571-020-00245-8|doi-access=free}}</ref> Other classes of scalable Gaussian processes for analyzing massive datasets have emerged from the [[Vecchia approximation]] and Nearest Neighbor Gaussian Processes (NNGP).<ref name="DattaEtAl2016">{{cite journal|last1=Datta|first1=Abhirup|last2=Banerjee|first2=Sudipto|last3=Finley|first3=Andrew|last4=Gelfand|first4=Alan|title=Hierarchical Nearest-Neighbor Gaussian Process Models for Large Spatial Data|journal=Journal of the American Statistical Association|year=2016|volume=111|issue=514|pages=800–812|doi=10.1080/01621459.2015.1044091|pmid=29720777 |pmc=5927603 }}</ref><ref name="highDimBayesianGeostat" />

Often, the covariance has the form <math display="inline">K(\theta, x,x') = \frac{1}{\sigma^2} \tilde{K}(\theta,x,x')</math>, where <math>\sigma^2</math> is a scaling parameter. Examples are the Matérn class covariance functions. Whether this scaling parameter <math>\sigma^2</math> is known or unknown (i.e. must be marginalized), the posterior probability <math>p(\theta \mid D)</math>, i.e. the probability of the hyperparameters <math>\theta</math> given a set of data pairs <math>D</math> of observations of <math>x</math> and <math>f(x)</math>, admits an analytical expression.<ref>{{Cite journal| last1=Ranftl|first1=Sascha|last2=Melito|first2=Gian Marco|last3=Badeli|first3=Vahid|last4=Reinbacher-Köstinger|first4=Alice| last5=Ellermann|first5=Katrin|last6=von der Linden|first6=Wolfgang|date=2019-12-31|title=Bayesian Uncertainty Quantification with Multi-Fidelity Data and Gaussian Processes for Impedance Cardiography of Aortic Dissection|journal=Entropy| volume=22|issue=1| pages=58|doi=10.3390/e22010058|issn=1099-4300|pmc=7516489|pmid=33285833|bibcode=2019Entrp..22...58R |doi-access=free}}</ref>
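
The predictive equations for the posterior mean {{mvar|A}} and posterior variance {{mvar|B}} given earlier in this section can be sketched numerically as follows. This is only an illustration under stated assumptions: the squared exponential kernel with fixed hyperparameters, the helper names <code>sq_exp_kernel</code> and <code>gp_predict</code>, and the diagonal <code>jitter</code> are hypothetical choices, not part of the cited sources. The example also checks two properties stated in the text: the mean is linear in the observations, and the variance does not depend on the observed values.

```python
import numpy as np

def sq_exp_kernel(xa, xb, length_scale=1.0, signal_var=1.0):
    """Squared exponential covariance between 1-D coordinate arrays (assumed kernel)."""
    diff = xa[:, None] - xb[None, :]
    return signal_var * np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_predict(x, f, x_star, jitter=1e-6):
    """Posterior mean A and covariance B at x_star under a zero-mean prior."""
    # jitter is a hypothetical stabilizer added to the diagonal of K
    K = sq_exp_kernel(x, x) + jitter * np.eye(x.shape[0])
    K_star = sq_exp_kernel(x_star, x)        # K(theta, x*, x)
    K_ss = sq_exp_kernel(x_star, x_star)     # K(theta, x*, x*)
    L = np.linalg.cholesky(K)                # the cubic-cost step in n
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, f))   # K^{-1} f(x)
    V = np.linalg.solve(L, K_star.T)         # L^{-1} K(theta, x*, x)^T
    A = K_star @ alpha                       # posterior mean
    B = K_ss - V.T @ V                       # posterior covariance
    return A, B

x = np.array([0.0, 1.0, 2.5, 4.0])           # observed coordinates
x_star = np.array([0.5, 1.0, 3.0])           # prediction coordinates

A1, B1 = gp_predict(x, np.sin(x), x_star)
A2, B2 = gp_predict(x, np.cos(x), x_star)

# Same coordinates, different observed values: B is unchanged, A is not.
print(np.allclose(B1, B2), np.allclose(A1, A2))
# Doubling the observations doubles the mean (linearity in f(x)).
print(np.allclose(gp_predict(x, 2.0 * np.sin(x), x_star)[0], 2.0 * A1))
```

The Cholesky factorization is where the cubic cost in the number of observed points arises; the sparse and nearest-neighbour approaches mentioned above aim to avoid factorizing the full matrix.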