{{Short description|Experiment used to study computer simulation}}
A '''computer experiment''' or '''simulation experiment''' is an experiment used to study a computer simulation, also referred to as an [[in silico]] system. This area includes [[computational physics]], [[computational chemistry]], [[computational biology]] and other similar disciplines.

==Background==
[[Computer simulation]]s are constructed to emulate a physical system. Because they are meant to replicate some aspect of a system in detail, they often do not yield an analytic solution. Therefore, methods such as [[discrete event simulation]] or [[finite element]] solvers are used. A [[computer model]] is used to make inferences about the system it replicates. For example, [[climate models]] are often used because experimentation on an Earth-sized object is impossible.

==Objectives==
Computer experiments have been employed with many purposes in mind. Some of these include:
* [[Uncertainty quantification]]: Characterize the uncertainty present in a computer simulation arising from unknowns during the computer simulation's construction.
* [[Inverse problem]]s: Discover the underlying properties of the system from physical data.
* Bias correction: Use physical data to correct for bias in the simulation.
* [[Data assimilation]]: Combine multiple simulations and physical data sources into a complete predictive model.
* [[Systems design]]: Find inputs that result in optimal system performance measures.

==Computer simulation modeling==
Modeling of computer experiments typically uses a Bayesian framework. [[Bayesian statistics]] is an interpretation of the field of [[statistics]] in which all evidence about the true state of the world is explicitly expressed in the form of [[probabilities]]. In the realm of computer experiments, the Bayesian interpretation implies that we must form a [[prior distribution]] representing our prior belief about the structure of the computer model. The use of this philosophy for computer experiments started in the 1980s and is nicely summarized by Sacks et al. (1989) [https://web.archive.org/web/20170918022130/https://projecteuclid.org/DPubS?service=UI&version=1.0&verb=Display&handle=euclid.ss%2F1177012413]. While the Bayesian approach is widely used, [[frequentist]] approaches have also been discussed recently [http://www2.isye.gatech.edu/~jeffwu/publications/calibration-may1.pdf].

The basic idea of this framework is to model the computer simulation as an unknown function of a set of inputs. The computer simulation is implemented as a piece of computer code that can be evaluated to produce a collection of outputs. Examples of inputs to these simulations are coefficients in the underlying model, [[initial conditions]] and [[Forcing function (differential equations)|forcing functions]]. It is natural to view the simulation as a deterministic function that maps these ''inputs'' into a collection of ''outputs''. On the basis of seeing our simulator this way, it is common to refer to the collection of inputs as <math>x</math>, the computer simulation itself as <math>f</math>, and the resulting output as <math>f(x)</math>. Both <math>x</math> and <math>f(x)</math> are vector quantities, and they can be very large collections of values, often indexed by space, by time, or by both space and time.

Although <math>f(\cdot)</math> is known in principle, in practice this is not the case: many simulators comprise tens of thousands of lines of high-level computer code, which is not accessible to intuition.
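As an illustration of this viewpoint, the following minimal sketch (not drawn from the works cited above; the oscillator model, coefficient values and time grid are arbitrary stand-ins for a far larger code) wraps a small numerical solver as a deterministic function <math>f</math> mapping an input vector <math>x</math> to a time-indexed output vector <math>f(x)</math>:

<syntaxhighlight lang="python">
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative toy only: a damped harmonic oscillator stands in for an
# expensive simulator.  The input vector x collects a damping coefficient,
# a stiffness and an initial displacement; the output f(x) is the
# displacement trajectory sampled on a fixed time grid.
T_EVAL = np.linspace(0.0, 10.0, 50)

def f(x):
    """Deterministic simulator: maps an input vector to an output vector."""
    damping, stiffness, y0 = x

    def rhs(t, state):
        pos, vel = state
        return [vel, -damping * vel - stiffness * pos]

    sol = solve_ivp(rhs, (0.0, 10.0), [y0, 0.0], t_eval=T_EVAL)
    return sol.y[0]  # output indexed by time

output = f(np.array([0.3, 2.0, 1.0]))  # one run of the "computer experiment"
print(output.shape)                    # (50,)
</syntaxhighlight>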
For some simulations, such as climate models, evaluation of the output for a single set of inputs can require millions of computer hours [http://amstat.tandfonline.com/doi/abs/10.1198/TECH.2009.0015#.UbixC_nFWHQ].

===Gaussian process prior===
The typical model for computer code output is a Gaussian process. For notational simplicity, assume <math>f(x)</math> is a scalar. Within the Bayesian framework, we express our prior belief that the function <math>f</math> follows a [[Gaussian process]],
<math>f \sim \operatorname{GP}(m(\cdot),C(\cdot,\cdot)),</math>
where <math>m</math> is the mean function and <math>C</math> is the covariance function. Popular mean functions are low order polynomials, and a popular [[covariance function]] is the [[Matérn covariance function|Matérn covariance]], which includes both the exponential covariance (<math>\nu = 1/2</math>) and the Gaussian covariance (as <math>\nu \rightarrow \infty</math>).

==Design of computer experiments==
The design of computer experiments differs considerably from the [[design of experiments]] for parametric models. Since a Gaussian process prior has an infinite-dimensional representation, the concepts of A- and D-optimality criteria (see [[Optimal design]]), which focus on reducing the error in the parameters, cannot be used. Replications would also be wasteful in cases where the computer simulation has no error. Criteria used to determine a good experimental design include integrated mean squared prediction error [https://web.archive.org/web/20170918022130/https://projecteuclid.org/DPubS?service=UI&version=1.0&verb=Display&handle=euclid.ss%2F1177012413] and distance-based criteria [http://www.sciencedirect.com/science/article/pii/037837589090122B]. Popular strategies for design include [[latin hypercube sampling]] and [[low discrepancy sequences]].

===Problems with massive sample sizes===
Unlike physical experiments, it is common for computer experiments to have thousands of different input combinations. Because standard inference requires [[Invertible matrix|matrix inversion]] of a square matrix whose size equals the number of samples (<math>n</math>), the cost scales as <math>\mathcal{O}(n^3)</math>. Matrix inversion of large, dense matrices can also introduce numerical inaccuracies. This problem can be addressed with greedy decision tree techniques, which allow effective computation for high dimensionality and large sample sizes [https://patents.google.com/patent/WO2013055257A1/en patent WO2013055257A1], or avoided by using approximation methods, e.g. [https://wayback.archive-it.org/all/20120130182750/http://www.stat.wisc.edu/~zhiguang/Multistep_AOS.pdf].
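As a concrete, minimal sketch of the workflow described in the two sections above — evaluating a simulator on a space-filling design and fitting a Gaussian process emulator to the resulting runs — the following example uses SciPy's Latin hypercube sampler and scikit-learn's Gaussian process regressor with a Matérn covariance. The toy simulator, input bounds and kernel settings are illustrative assumptions rather than recommendations; scikit-learn's emulator uses a constant rather than a polynomial prior mean, and for a design of this size the <math>\mathcal{O}(n^3)</math> cost of exact inference is negligible.

<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import qmc
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, Matern

# Toy scalar-output simulator standing in for an expensive code.
def simulator(x):
    return np.sin(3.0 * x[0]) + 0.5 * x[1] ** 2

# Space-filling design: a Latin hypercube over a 2-dimensional input space
# (the bounds are arbitrary for this illustration).
sampler = qmc.LatinHypercube(d=2, seed=0)
X = qmc.scale(sampler.random(n=30), l_bounds=[0.0, -1.0], u_bounds=[2.0, 1.0])
y = np.array([simulator(x) for x in X])

# Gaussian-process emulator with a Matern covariance; nu = 2.5 is a common
# smoothness choice, and nu -> infinity recovers the Gaussian covariance.
kernel = ConstantKernel(1.0) * Matern(length_scale=[1.0, 1.0], nu=2.5)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(X, y)

# Predict the simulator output, with uncertainty, at an untried input.
mean, std = gp.predict(np.array([[1.0, 0.2]]), return_std=True)
print(mean, std)
</syntaxhighlight>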
==See also==
*[[Simulation]]
*[[Uncertainty quantification]]
*[[Bayesian statistics]]
*[[Gaussian process emulator]]
*[[Design of experiments]]
*[[Molecular dynamics]]
*[[Monte Carlo method]]
*[[Surrogate model]]
*[[Grey box completion and validation]]
*[[Artificial financial market]]

==Further reading==
* {{cite book | last = Santner | first = Thomas | title = The Design and Analysis of Computer Experiments | publisher = Springer | location = Berlin | year = 2003 | isbn = 0-387-95420-1 }}
* {{cite journal | last1 = Fehr | first1 = Jörg | last2 = Heiland | first2 = Jan | last3 = Himpe | first3 = Christian | last4 = Saak | first4 = Jens | title = Best practices for replicability, reproducibility and reusability of computer-based experiments exemplified by model reduction software | journal = AIMS Mathematics | volume = 1 | issue = 3 | pages = 261–281 | date = 2016 | doi = 10.3934/Math.2016.3.261 | arxiv = 1607.01191 | s2cid = 14715031 }}

[[Category:Computational science]]
[[Category:Design of experiments]]
[[Category:Simulation]]