=== Statistical inference ===
Consider the following stochastic programming problem
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math> \min\limits_{x\in X}\{ g(x) = f(x)+E[Q(x,\xi)] \} </math></div>
Here <math>X</math> is a nonempty closed subset of <math>\mathbb{R}^n</math>, <math>\xi</math> is a random vector whose probability distribution <math>P</math> is supported on a set <math>\Xi \subset \mathbb{R}^d</math>, and <math>Q: X \times \Xi \rightarrow \mathbb{R}</math>. In the framework of two-stage stochastic programming, <math>Q(x,\xi)</math> is given by the optimal value of the corresponding second-stage problem.

Assume that <math>g(x)</math> is well defined and ''finite valued'' for all <math>x\in X</math>. This implies that for every <math>x\in X</math> the value <math>Q(x,\xi)</math> is finite almost surely.

Suppose that we have a sample <math>\xi^1,\dots,\xi^N</math> of <math>N</math> realizations of the random vector <math>\xi</math>. This random sample can be viewed as historical data of <math>N</math> observations of <math>\xi</math>, or it can be generated by Monte Carlo sampling techniques. Then we can formulate a corresponding ''sample average approximation''
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math> \min\limits_{x\in X}\{ \hat{g}_N(x) = f(x)+\frac{1}{N} \sum_{j=1}^N Q(x,\xi^j) \} </math></div>
By the [[law of large numbers]] we have that, under some regularity conditions, <math>\frac{1}{N} \sum_{j=1}^N Q(x,\xi^j)</math> converges pointwise with probability 1 to <math>E[Q(x,\xi)]</math> as <math>N \rightarrow \infty</math>. Moreover, under mild additional conditions the convergence is uniform. We also have <math>E[\hat{g}_N(x)]=g(x)</math>, i.e., <math>\hat{g}_N(x)</math> is an ''unbiased'' estimator of <math>g(x)</math>. Therefore, it is natural to expect that the optimal value and optimal solutions of the SAA problem converge to their counterparts of the true problem as <math>N \rightarrow \infty</math>.
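As a concrete illustration, the following is a minimal sketch that builds and solves the SAA of a simple newsvendor-style two-stage problem, for which <math>Q(x,\xi)</math> is available in closed form. The cost parameters <code>c</code>, <code>b</code>, <code>h</code>, the demand distribution, and the grid-search solver are hypothetical choices made only for this example and are not part of the general theory.

<syntaxhighlight lang="python">
import numpy as np

# Hypothetical newsvendor data: order x units at unit cost c; after demand
# xi is revealed, pay back-order cost b per unit short and holding cost h
# per unit left over, so Q(x, xi) = b*max(xi - x, 0) + h*max(x - xi, 0).
c, b, h = 1.0, 4.0, 0.5
rng = np.random.default_rng(0)

def Q(x, xi):
    """Second-stage cost for first-stage decision x and realization(s) xi."""
    return b * np.maximum(xi - x, 0.0) + h * np.maximum(x - xi, 0.0)

def saa_objective(x, sample):
    """SAA objective g_N(x) = f(x) + (1/N) * sum_j Q(x, xi_j), with f(x) = c*x."""
    return c * x + Q(x, sample).mean()

# Monte Carlo sample of N demand realizations (hypothetical distribution).
N = 10_000
sample = rng.exponential(scale=50.0, size=N)

# Solve the SAA problem by searching a grid over the feasible set X = [0, 300].
grid = np.linspace(0.0, 300.0, 3001)
values = np.array([saa_objective(x, sample) for x in grid])
x_hat = grid[values.argmin()]
print(f"SAA optimal value {values.min():.3f} at x = {x_hat:.2f}")
</syntaxhighlight>

For this particular example the true minimizer is available in closed form (the <math>(b-c)/(b+h)</math>-quantile of the demand distribution, here <math>50\ln 3 \approx 54.93</math>), so the quality of the SAA solution can be checked directly.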
====Consistency of SAA estimators====
Suppose the feasible set <math>X</math> of the SAA problem is fixed, i.e., it is independent of the sample. Let <math>\vartheta^*</math> and <math>S^*</math> be the optimal value and the set of optimal solutions, respectively, of the true problem and let <math>\hat{\vartheta}_N</math> and <math>\hat{S}_N</math> be the optimal value and the set of optimal solutions, respectively, of the SAA problem.
# Let <math>g: X \rightarrow \mathbb{R}</math> be a function and <math>\hat{g}_N: X \rightarrow \mathbb{R}</math> a sequence of (deterministic) real-valued functions. Then the following two properties are equivalent:
#* for any <math>\overline{x}\in X</math> and any sequence <math>\{x_N\}\subset X</math> converging to <math>\overline{x}</math> it follows that <math>\hat{g}_N(x_N)</math> converges to <math>g(\overline{x})</math>
#* the function <math>g(\cdot)</math> is continuous on <math>X</math> and <math>\hat{g}_N(\cdot)</math> converges to <math>g(\cdot)</math> uniformly on any compact subset of <math>X</math>
# If the objective of the SAA problem <math>\hat{g}_N(x)</math> converges to the true problem's objective <math>g(x)</math> with probability 1, as <math>N \rightarrow \infty</math>, uniformly on the feasible set <math>X</math>, then <math>\hat{\vartheta}_N</math> converges to <math>\vartheta^*</math> with probability 1 as <math>N \rightarrow \infty</math>.
# Suppose that there exists a compact set <math>C \subset \mathbb{R}^n</math> such that
#* the set <math>S^*</math> of optimal solutions of the true problem is nonempty and is contained in <math>C</math>
#* the function <math>g(x)</math> is finite valued and continuous on <math>C</math>
#* the sequence of functions <math>\hat{g}_N(x)</math> converges to <math>g(x)</math> with probability 1, as <math>N \rightarrow \infty</math>, uniformly in <math>x\in C</math>
#* for <math>N</math> large enough the set <math>\hat{S}_N</math> is nonempty and <math>\hat{S}_N \subset C</math> with probability 1
:: then <math>\hat{\vartheta}_N \rightarrow \vartheta^*</math> and <math>\mathbb{D}(S^*,\hat{S}_N)\rightarrow 0 </math> with probability 1 as <math>N\rightarrow \infty </math>. Note that <math>\mathbb{D}(A,B) </math> denotes the ''deviation of set <math>A</math> from set <math>B</math>'', defined as
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math> \mathbb{D}(A,B) := \sup_{x\in A} \{ \inf_{x' \in B} \|x-x'\| \} </math></div>
In some situations the feasible set <math>X</math> of the SAA problem is itself estimated; the corresponding SAA problem then takes the form
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math> \min_{x\in X_N} \hat{g}_N(x) </math></div>
where <math>X_N</math> is a subset of <math>\mathbb{R}^n</math> that depends on the sample and is therefore random. Nevertheless, consistency results for SAA estimators can still be derived under some additional assumptions:
# Suppose that there exists a compact set <math>C \subset \mathbb{R}^n</math> such that
#* the set <math>S^*</math> of optimal solutions of the true problem is nonempty and is contained in <math>C</math>
#* the function <math>g(x)</math> is finite valued and continuous on <math>C</math>
#* the sequence of functions <math>\hat{g}_N(x)</math> converges to <math>g(x)</math> with probability 1, as <math>N \rightarrow \infty</math>, uniformly in <math>x\in C</math>
#* for <math>N</math> large enough the set <math>\hat{S}_N</math> is nonempty and <math>\hat{S}_N \subset C</math> with probability 1
#* if <math> x_N \in X_N</math> and <math> x_N </math> converges with probability 1 to a point <math> x</math>, then <math> x \in X</math>
#* for some point <math> x \in S^*</math> there exists a sequence <math> x_N \in X_N</math> such that <math> x_N \rightarrow x</math> with probability 1.
:: then <math>\hat{\vartheta}_N \rightarrow \vartheta^*</math> and <math>\mathbb{D}(S^*,\hat{S}_N)\rightarrow 0 </math> with probability 1 as <math>N\rightarrow \infty </math>.
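Consistency can also be illustrated numerically. The following sketch reuses the hypothetical newsvendor definitions (<code>rng</code>, <code>grid</code>, <code>saa_objective</code>) from the example above and shows <math>\hat{\vartheta}_N</math> stabilizing as the sample size grows; the particular sample sizes are arbitrary choices.

<syntaxhighlight lang="python">
# Empirical illustration of consistency: the SAA optimal value and optimal
# solution settle down as N grows (each N uses a fresh Monte Carlo sample).
for N in (10, 100, 1_000, 10_000, 100_000):
    sample = rng.exponential(scale=50.0, size=N)
    values = np.array([saa_objective(x, sample) for x in grid])
    print(f"N = {N:>6}:  vartheta_N = {values.min():.3f},  "
          f"x_N = {grid[values.argmin()]:.2f}")
</syntaxhighlight>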
==== Asymptotics of the SAA optimal value ====
Suppose the sample <math>\xi^1,\dots,\xi^N</math> is i.i.d. and fix a point <math>x \in X</math>. Then the sample average estimator <math>\hat{g}_N(x)</math> of <math>g(x)</math> is unbiased and has variance <math>\frac{1}{N}\sigma^2(x)</math>, where <math>\sigma^2(x):=Var[Q(x,\xi)]</math> is assumed to be finite. Moreover, by the [[central limit theorem]] we have that
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math> \sqrt{N} [\hat{g}_N(x)- g(x)] \xrightarrow{\mathcal{D}} Y_x </math></div>
where <math>\xrightarrow{\mathcal{D}}</math> denotes convergence in ''distribution'' and <math>Y_x</math> has a normal distribution with mean <math>0</math> and variance <math>\sigma^2(x)</math>, written as <math>\mathcal{N}(0,\sigma^2(x))</math>.

In other words, <math>\hat{g}_N(x)</math> has an ''asymptotically normal'' distribution, i.e., for large <math>N</math>, <math>\hat{g}_N(x)</math> has approximately normal distribution with mean <math>g(x)</math> and variance <math>\frac{1}{N}\sigma^2(x)</math>. This leads to the following (approximate) <math>100(1-\alpha)</math>% confidence interval for <math>g(x)</math>:
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"> <math> \left[ \hat{g}_N(x)-z_{\alpha/2} \frac{\hat{\sigma}(x)}{\sqrt{N}}, \hat{g}_N(x)+z_{\alpha/2} \frac{\hat{\sigma}(x)}{\sqrt{N}}\right] </math></div>
where <math>z_{\alpha/2}:=\Phi^{-1}(1-\alpha/2)</math> (here <math>\Phi(\cdot)</math> denotes the cdf of the standard normal distribution) and
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math> \hat{\sigma}^2(x) := \frac{1}{N-1}\sum_{j=1}^{N} \left[ Q(x,\xi^j)-\frac{1}{N} \sum_{i=1}^N Q(x,\xi^i) \right]^2 </math></div>
is the sample variance estimate of <math>\sigma^2(x)</math>. That is, the error of estimation of <math>g(x)</math> is (stochastically) of order <math>O(1/\sqrt{N})</math>.
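A minimal sketch of this confidence interval computation, again in terms of the hypothetical newsvendor setup (reusing <code>c</code>, <code>Q</code>, and <code>rng</code> from the first example); since <math>f(x)=cx</math> is deterministic, all of the variance comes from the second-stage cost <math>Q(x,\xi)</math>:

<syntaxhighlight lang="python">
from scipy.stats import norm

# Approximate 100(1 - alpha)% confidence interval for g(x) at a fixed x.
def saa_confidence_interval(x, sample, alpha=0.05):
    costs = Q(x, sample)                    # Q(x, xi_j) for each scenario
    g_hat = c * x + costs.mean()            # point estimate g_N(x)
    sigma_hat = costs.std(ddof=1)           # sample std dev of Q(x, xi)
    z = norm.ppf(1.0 - alpha / 2.0)         # standard normal quantile z_{alpha/2}
    half_width = z * sigma_hat / np.sqrt(len(sample))
    return g_hat - half_width, g_hat + half_width

sample = rng.exponential(scale=50.0, size=10_000)
lo, hi = saa_confidence_interval(55.0, sample)
print(f"95% CI for g(55): [{lo:.3f}, {hi:.3f}]")
</syntaxhighlight>

Halving the width of such an interval requires roughly four times as many scenarios, reflecting the <math>O(1/\sqrt{N})</math> rate above.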