==Scenario-based approach==

=== Discretization ===

To solve the two-stage stochastic problem numerically, one often needs to assume that the random vector <math>\xi</math> has a finite number of possible realizations, called ''scenarios'', say <math>\xi_1,\dots,\xi_K</math>, with respective probability masses <math>p_1,\dots,p_K</math>. Then the expectation in the first-stage problem's objective function can be written as the summation:
<math display="block"> E[Q(x,\xi)]=\sum\limits_{k=1}^{K} p_kQ(x,\xi_k) </math>
and, moreover, the two-stage problem can be formulated as one large linear programming problem (this is called the deterministic equivalent of the original problem, see section {{Section link||Deterministic equivalent of a stochastic problem}}).

When <math>\xi</math> has an infinite (or very large) number of possible realizations, the standard approach is to represent this distribution by scenarios. This approach raises three questions, namely:
# How to construct scenarios, see {{Section link||Scenario construction}};
# How to solve the deterministic equivalent. Optimizers such as [[CPLEX]] and [[GNU Linear Programming Kit|GLPK]] can solve large linear/nonlinear problems. The NEOS Server,<ref name="neos">{{Cite web|url=http://www.neos-server.org/neos/|title = NEOS Server for Optimization}}</ref> hosted at the [[University of Wisconsin, Madison]], allows free access to many modern solvers. The structure of a deterministic equivalent is particularly amenable to decomposition methods,<ref>{{cite book|first2=Alexander|last2=Shapiro|last1=Ruszczyński|first1=Andrzej|title=Stochastic Programming|publisher=[[Elsevier]]|year=2003|isbn=978-0444508546|series=Handbooks in Operations Research and Management Science|volume=10|location=Philadelphia|pages=700|author1-link=Andrzej Piotr Ruszczyński}}</ref> such as [[Benders' decomposition]] or scenario decomposition;
# How to measure the quality of the obtained solution with respect to the "true" optimum.
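Once discretized, the expectation above is just a probability-weighted sum over scenarios. A minimal sketch (the recourse function <code>Q</code> below is a hypothetical toy example, not from the article):

```python
# E[Q(x, xi)] = sum_k p_k * Q(x, xi_k) once xi is discretized into scenarios.

def Q(x, xi):
    """Toy second-stage cost: pay xi per unit of unmet demand (demand = 10)."""
    return xi * max(10 - x, 0)

scenarios = [2.0, 5.0, 8.0]   # realizations xi_1, ..., xi_K
probs = [0.5, 0.3, 0.2]       # probability masses p_1, ..., p_K

def expected_recourse(x):
    return sum(p * Q(x, xi) for p, xi in zip(probs, scenarios))

print(expected_recourse(4.0))  # 6 * (0.5*2 + 0.3*5 + 0.2*8), i.e. about 24.6
```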
These questions are not independent. For example, the number of scenarios constructed will affect both the tractability of the deterministic equivalent and the quality of the obtained solutions.

=== Stochastic linear programming ===

A stochastic [[linear program]] is a specific instance of the classical two-stage stochastic program. A stochastic LP is built from a collection of multi-period linear programs (LPs), each having the same structure but somewhat different data. The <math>k^{th}</math> two-period LP, representing the <math>k^{th}</math> scenario, may be regarded as having the following form:
<math> \begin{array}{lccccccc} \text{Minimize} & f^T x & + & g^T y & + & h_k^Tz_k & & \\ \text{subject to} & Tx & + & Uy & & & = & r \\ & & & V_k y & + & W_kz_k & = & s_k \\ & x & , & y & , & z_k & \geq & 0 \end{array} </math>
The vectors <math>x</math> and <math>y</math> contain the first-period variables, whose values must be chosen immediately. The vector <math>z_k</math> contains all of the variables for subsequent periods. The constraints <math>Tx + Uy = r</math> involve only first-period variables and are the same in every scenario. The other constraints involve variables of later periods and differ in some respects from scenario to scenario, reflecting uncertainty about the future.

Note that solving the <math>k^{th}</math> two-period LP is equivalent to assuming the <math>k^{th}</math> scenario in the second period with no uncertainty. In order to incorporate uncertainties in the second stage, one should assign probabilities to different scenarios and solve the corresponding deterministic equivalent.

==== Deterministic equivalent of a stochastic problem ====

With a finite number of scenarios, two-stage stochastic linear programs can be modelled as large linear programming problems. This formulation is often called the deterministic equivalent linear program, or abbreviated to deterministic equivalent.
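As a sketch of how a deterministic equivalent can be assembled and handed to an off-the-shelf LP solver, consider a newsvendor-style instance with invented data: choose production <math>x</math> now, then buy any shortage <math>z_k</math> in scenario <math>k</math>. All numbers here are illustrative assumptions, not from the article.

```python
import numpy as np
from scipy.optimize import linprog

# Toy deterministic equivalent: produce x now at unit cost 1; in scenario k,
# buy shortage z_k at unit cost 2 so that x + z_k >= d_k (demand d_k is
# revealed in the second stage). Data invented for illustration.
probs = np.array([0.3, 0.4, 0.3])     # scenario probabilities p_k
demand = np.array([5.0, 10.0, 15.0])  # scenario demands d_k
K = len(probs)

# Variables stacked as [x, z_1, ..., z_K]; objective x + sum_k p_k * 2 * z_k.
c = np.concatenate(([1.0], 2.0 * probs))
# Scenario constraints x + z_k >= d_k, written as -x - z_k <= -d_k.
A_ub = np.hstack([-np.ones((K, 1)), -np.eye(K)])
b_ub = -demand

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * (K + 1))
print(res.x[0])  # optimal first-stage production, hedging across scenarios
```

Note the block structure: the first-stage variable appears in every scenario row, while each <math>z_k</math> appears only in its own row, which is exactly the staircase pattern exploited by decomposition methods.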
(Strictly speaking, a deterministic equivalent is any mathematical program that can be used to compute the optimal first-stage decision, so these will exist for continuous probability distributions as well, when one can represent the second-stage cost in some closed form.) For example, to form the deterministic equivalent to the above stochastic linear program, we assign a probability <math>p_k</math> to each scenario <math>k=1,\dots,K</math>. Then we can minimize the expected value of the objective, subject to the constraints from all scenarios:
<math> \begin{array}{lccccccccccccc} \text{Minimize} & f^\top x & + & g^\top y & + & p_1h_1^\top z_1 & + & p_2h_2^\top z_2 & + & \cdots & + & p_Kh_K^\top z_K & & \\ \text{subject to} & Tx & + & Uy & & & & & & & & & = & r \\ & & & V_1 y & + & W_1z_1 & & & & & & & = & s_1 \\ & & & V_2 y & & & + & W_2z_2 & & & & & = & s_2 \\ & & & \vdots & & & & & & \ddots & & & & \vdots \\ & & & V_Ky & & & & & & & + & W_Kz_K & = & s_K \\ & x & , & y & , & z_1 & , & z_2 & , & \ldots & , & z_K & \geq & 0 \\ \end{array} </math>
We have a different vector <math>z_k</math> of later-period variables for each scenario <math>k</math>. The first-period variables <math>x</math> and <math>y</math> are the same in every scenario, however, because we must make a decision for the first period before we know which scenario will be realized. As a result, the constraints involving just <math>x</math> and <math>y</math> need only be specified once, while the remaining constraints must be given separately for each scenario.

=== Scenario construction ===

In practice it might be possible to construct scenarios by eliciting experts' opinions on the future. The number of constructed scenarios should be relatively modest so that the obtained deterministic equivalent can be solved with reasonable computational effort. It is often claimed that a solution that is optimal using only a few scenarios provides more adaptable plans than one that assumes a single scenario only.
In some cases such a claim could be verified by a simulation. In theory, some measures can guarantee that an obtained solution solves the original problem with reasonable accuracy. Typically in applications only the ''first stage'' optimal solution <math>x^*</math> has a practical value, since almost always a "true" realization of the random data will be different from the set of constructed (generated) scenarios.

Suppose <math>\xi</math> contains <math>d</math> independent random components, each of which has three possible realizations (for example, future realizations of each random parameter are classified as low, medium and high); then the total number of scenarios is <math>K=3^d</math>. Such ''exponential growth'' of the number of scenarios makes model development using expert opinion very difficult even for a moderate size of <math>d</math>. The situation becomes even worse if some random components of <math>\xi</math> have continuous distributions.

====Monte Carlo sampling and Sample Average Approximation (SAA) Method====

A common approach to reduce the scenario set to a manageable size is by using Monte Carlo simulation. Suppose the total number of scenarios is very large or even infinite. Suppose further that we can generate a sample <math>\xi^1,\xi^2,\dots,\xi^N</math> of <math>N</math> realizations of the random vector <math>\xi</math>. Usually the sample is assumed to be [[independent and identically distributed]] (i.i.d. sample). Given a sample, the expectation function <math>q(x)=E[Q(x,\xi)]</math> is approximated by the sample average
<math> \hat{q}_N(x) = \frac{1}{N} \sum_{j=1}^N Q(x,\xi^j) </math>
and consequently the first-stage problem is given by
<math> \begin{array}{rlrrr} \hat{g}_N(x)=&\min\limits_{x\in \mathbb{R}^n} & c^T x + \frac{1}{N} \sum_{j=1}^N Q(x,\xi^j) & \\ &\text{subject to} & Ax &=& b \\ & & x &\geq& 0 \end{array} </math>
This formulation is known as the ''Sample Average Approximation'' method.
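A minimal sketch of the SAA estimator itself, reusing a hypothetical recourse function <code>Q</code> and assuming <math>\xi</math> can be sampled (uniformly on [2, 8] here, purely for illustration):

```python
import random

def Q(x, xi):
    """Hypothetical recourse cost: pay xi per unit of unmet demand (demand = 10)."""
    return xi * max(10 - x, 0)

def saa_estimate(x, N, seed=0):
    """Sample average (1/N) * sum_j Q(x, xi_j) over N i.i.d. draws of xi."""
    rng = random.Random(seed)
    return sum(Q(x, rng.uniform(2.0, 8.0)) for _ in range(N)) / N

# With xi ~ Uniform(2, 8), E[Q(4, xi)] = E[xi] * 6 = 30; by the law of
# large numbers the estimate approaches this value as N grows.
print(saa_estimate(4.0, 100_000))  # close to 30
```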
The SAA problem is a function of the considered sample and in that sense is random. For a given sample <math>\xi^1,\xi^2,\dots,\xi^N</math> the SAA problem is of the same form as a two-stage stochastic linear programming problem with the scenarios <math>\xi^j</math>, <math>j=1,\dots,N</math>, each taken with the same probability <math>p_j=\frac{1}{N}</math>.

=== Statistical inference ===

Consider the following stochastic programming problem
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math> \min\limits_{x\in X}\{ g(x) = f(x)+E[Q(x,\xi)] \} </math></div>
Here <math>X</math> is a nonempty closed subset of <math>\mathbb{R}^n</math>, <math>\xi</math> is a random vector whose probability distribution <math>P</math> is supported on a set <math>\Xi \subset \mathbb{R}^d</math>, and <math>Q: X \times \Xi \rightarrow \mathbb{R}</math>. In the framework of two-stage stochastic programming, <math>Q(x,\xi)</math> is given by the optimal value of the corresponding second-stage problem. Assume that <math>g(x)</math> is well defined and ''finite valued'' for all <math>x\in X</math>. This implies that for every <math>x\in X</math> the value <math>Q(x,\xi)</math> is finite almost surely.

Suppose that we have a sample <math>\xi^1,\dots,\xi^N</math> of <math>N</math> realizations of the random vector <math>\xi</math>. This random sample can be viewed as historical data of <math>N</math> observations of <math>\xi</math>, or it can be generated by Monte Carlo sampling techniques.
Then we can formulate a corresponding ''sample average approximation''
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math> \min\limits_{x\in X}\{ \hat{g}_N(x) = f(x)+\frac{1}{N} \sum_{j=1}^N Q(x,\xi^j) \} </math></div>
By the [[law of large numbers]] we have that, under some regularity conditions, <math>\frac{1}{N} \sum_{j=1}^N Q(x,\xi^j)</math> converges pointwise with probability 1 to <math>E[Q(x,\xi)]</math> as <math>N \rightarrow \infty</math>. Moreover, under mild additional conditions the convergence is uniform. We also have <math>E[\hat{g}_N(x)]=g(x)</math>, i.e., <math>\hat{g}_N(x)</math> is an ''unbiased'' estimator of <math>g(x)</math>. Therefore, it is natural to expect that the optimal value and optimal solutions of the SAA problem converge to their counterparts of the true problem as <math>N \rightarrow \infty</math>.

====Consistency of SAA estimators====

Suppose the feasible set <math>X</math> of the SAA problem is fixed, i.e., it is independent of the sample. Let <math>\vartheta^*</math> and <math>S^*</math> be the optimal value and the set of optimal solutions, respectively, of the true problem and let <math>\hat{\vartheta}_N</math> and <math>\hat{S}_N</math> be the optimal value and the set of optimal solutions, respectively, of the SAA problem.
# Let <math>g: X \rightarrow \mathbb{R}</math> and <math>\hat{g}_N: X \rightarrow \mathbb{R}</math> be a sequence of (deterministic) real valued functions. The following two properties are equivalent:
#* for any <math>\overline{x}\in X</math> and any sequence <math>\{x_N\}\subset X</math> converging to <math>\overline{x}</math> it follows that <math>\hat{g}_N(x_N)</math> converges to <math>g(\overline{x})</math>
#* the function <math>g(\cdot)</math> is continuous on <math>X</math> and <math>\hat{g}_N(\cdot)</math> converges to <math>g(\cdot)</math> uniformly on any compact subset of <math>X</math>
# If the objective of the SAA problem <math>\hat{g}_N(x)</math> converges to the true problem's objective <math>g(x)</math> with probability 1, as <math>N \rightarrow \infty</math>, uniformly on the feasible set <math>X</math>, then <math>\hat{\vartheta}_N</math> converges to <math>\vartheta^*</math> with probability 1 as <math>N \rightarrow \infty</math>.
# Suppose that there exists a compact set <math>C \subset \mathbb{R}^n</math> such that
#* the set <math>S^*</math> of optimal solutions of the true problem is nonempty and is contained in <math>C</math>
#* the function <math>g(x)</math> is finite valued and continuous on <math>C</math>
#* the sequence of functions <math>\hat{g}_N(x)</math> converges to <math>g(x)</math> with probability 1, as <math>N \rightarrow \infty</math>, uniformly in <math>x\in C</math>
#* for <math>N</math> large enough the set <math>\hat{S}_N</math> is nonempty and <math>\hat{S}_N \subset C</math> with probability 1
:: then <math>\hat{\vartheta}_N \rightarrow \vartheta^*</math> and <math>\mathbb{D}(S^*,\hat{S}_N)\rightarrow 0 </math> with probability 1 as <math>N\rightarrow \infty </math>.
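The quantity <math>\mathbb{D}(S^*,\hat{S}_N)</math> appearing above is the deviation of one set from another (defined formally below); for finite point sets it can be computed directly. A small numpy sketch:

```python
import numpy as np

def deviation(A, B):
    """D(A, B) = sup_{x in A} inf_{x' in B} ||x - x'|| for finite point sets."""
    A, B = np.atleast_2d(A), np.atleast_2d(B)
    dists = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return dists.min(axis=1).max()

A = np.array([[0.0, 0.0], [3.0, 0.0]])
B = np.array([[0.0, 0.0], [1.0, 0.0]])
print(deviation(A, B))  # 2.0: the point (3, 0) is distance 2 from its nearest point of B
print(deviation(B, A))  # 1.0: note the deviation is not symmetric
```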
Note that <math>\mathbb{D}(A,B) </math> denotes the ''deviation of set <math>A</math> from set <math>B</math>'', defined as
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math> \mathbb{D}(A,B) := \sup_{x\in A} \{ \inf_{x' \in B} \|x-x'\| \} </math></div>
In some situations the feasible set <math>X</math> of the SAA problem is estimated; then the corresponding SAA problem takes the form
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math> \min_{x\in X_N} \hat{g}_N(x) </math></div>
where <math>X_N</math> is a subset of <math>\mathbb{R}^n</math> depending on the sample and therefore is random. Nevertheless, consistency results for SAA estimators can still be derived under some additional assumptions:
# Suppose that there exists a compact set <math>C \subset \mathbb{R}^n</math> such that
#* the set <math>S^*</math> of optimal solutions of the true problem is nonempty and is contained in <math>C</math>
#* the function <math>g(x)</math> is finite valued and continuous on <math>C</math>
#* the sequence of functions <math>\hat{g}_N(x)</math> converges to <math>g(x)</math> with probability 1, as <math>N \rightarrow \infty</math>, uniformly in <math>x\in C</math>
#* for <math>N</math> large enough the set <math>\hat{S}_N</math> is nonempty and <math>\hat{S}_N \subset C</math> with probability 1
#* if <math> x_N \in X_N</math> and <math> x_N </math> converges with probability 1 to a point <math> x</math>, then <math> x \in X</math>
#* for some point <math> x \in S^*</math> there exists a sequence <math> x_N \in X_N</math> such that <math> x_N \rightarrow x</math> with probability 1.
:: then <math>\hat{\vartheta}_N \rightarrow \vartheta^*</math> and <math>\mathbb{D}(S^*,\hat{S}_N)\rightarrow 0 </math> with probability 1 as <math>N\rightarrow \infty </math>.

==== Asymptotics of the SAA optimal value ====

Suppose the sample <math>\xi^1,\dots,\xi^N</math> is i.i.d. and fix a point <math>x \in X</math>.
Then the sample average estimator <math>\hat{g}_N(x)</math>, of <math>g(x)</math>, is unbiased and has variance <math>\frac{1}{N}\sigma^2(x)</math>, where <math>\sigma^2(x):=Var[Q(x,\xi)]</math> is supposed to be finite. Moreover, by the [[central limit theorem]] we have that
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math> \sqrt{N} [\hat{g}_N(x)- g(x)] \xrightarrow{\mathcal{D}} Y_x </math></div>
where <math>\xrightarrow{\mathcal{D}}</math> denotes convergence in ''distribution'' and <math>Y_x</math> has a normal distribution with mean <math>0</math> and variance <math>\sigma^2(x)</math>, written as <math>\mathcal{N}(0,\sigma^2(x))</math>.

In other words, <math>\hat{g}_N(x)</math> has an ''asymptotically normal'' distribution, i.e., for large <math>N</math>, <math>\hat{g}_N(x)</math> has approximately normal distribution with mean <math>g(x)</math> and variance <math>\frac{1}{N}\sigma^2(x)</math>. This leads to the following (approximate) <math>100(1-\alpha)</math>% confidence interval for <math>g(x)</math>:
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"> <math> \left[ \hat{g}_N(x)-z_{\alpha/2} \frac{\hat{\sigma}(x)}{\sqrt{N}}, \hat{g}_N(x)+z_{\alpha/2} \frac{\hat{\sigma}(x)}{\sqrt{N}}\right] </math></div>
where <math>z_{\alpha/2}:=\Phi^{-1}(1-\alpha/2)</math> (here <math>\Phi(\cdot)</math> denotes the cdf of the standard normal distribution) and
<div class="center" style="width: auto; margin-left: auto; margin-right: auto;"><math> \hat{\sigma}^2(x) := \frac{1}{N-1}\sum_{j=1}^{N} \left[ Q(x,\xi^j)-\frac{1}{N} \sum_{j=1}^N Q(x,\xi^j) \right]^2 </math></div>
is the sample variance estimate of <math>\sigma^2(x)</math>. That is, the error of estimation of <math>g(x)</math> is (stochastically) of order <math>O(N^{-1/2})</math>.
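This interval can be computed directly from the same sample used by the SAA estimator. A sketch, again with a hypothetical recourse function <code>Q</code> and <math>\xi \sim \mathrm{Uniform}(2,8)</math>, so the true expected recourse cost at <math>x = 4</math> is 30:

```python
import math
import random

def Q(x, xi):
    """Hypothetical recourse cost: pay xi per unit of unmet demand (demand = 10)."""
    return xi * max(10 - x, 0)

def saa_confidence_interval(x, N, z=1.96, seed=0):
    """Approximate 95% CI (z = z_{alpha/2} for alpha = 0.05) for E[Q(x, xi)]."""
    rng = random.Random(seed)
    vals = [Q(x, rng.uniform(2.0, 8.0)) for _ in range(N)]
    mean = sum(vals) / N
    var = sum((v - mean) ** 2 for v in vals) / (N - 1)  # sample variance
    half = z * math.sqrt(var / N)                       # z * sigma_hat / sqrt(N)
    return mean - half, mean + half

lo, hi = saa_confidence_interval(4.0, 10_000)
print(lo, hi)  # an interval around the true value 30, width shrinking like 1/sqrt(N)
```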