Editing Stochastic programming (section)

== Applications and examples==
=== Biological applications ===
[[Stochastic dynamic programming]] is frequently used to model [[ethology|animal behaviour]] in such fields as [[behavioural ecology]].<ref>Mangel, M. & Clark, C. W. 1988. ''Dynamic modeling in behavioral ecology.'' Princeton University Press {{ISBN|0-691-08506-4}}</ref><ref>Houston, A. I & McNamara, J. M. 1999. ''Models of adaptive behaviour: an approach based on state''. Cambridge University Press {{ISBN|0-521-65539-0}}</ref> Empirical tests of models of [[Optimal foraging theory|optimal foraging]], [[Biological life cycle|life-history]] transitions such as [[Fledge|fledging in birds]] and egg laying in [[parasitoid]] wasps have shown the value of this modelling technique in explaining the evolution of behavioural decision making. These models are typically many-staged, rather than two-staged.

===Economic applications===
[[Stochastic dynamic programming]] is a useful tool in understanding decision making under uncertainty. The accumulation of capital stock under uncertainty is one example; often it is used by resource economists to analyze [[Nicholas Georgescu-Roegen#Man.27s economic struggle and the social evolution of mankind .28bioeconomics.29|bioeconomic problems]]<ref>Howitt, R., Msangi, S., Reynaud, A and K. Knapp. 2002. [http://www.agecon.ucdavis.edu/aredepart/facultydocs/Howitt/Polyapprox3a.pdf "Using Polynomial Approximations to Solve Stochastic Dynamic Programming Problems: or A "Betty Crocker " Approach to SDP."]  University of California, Davis, Department of Agricultural and Resource Economics Working Paper.</ref> where the uncertainty enters in such as weather, etc.

=== Example: multistage portfolio optimization ===
{{Main|Intertemporal portfolio choice}}
{{See also|Merton's portfolio problem}}

The following is an example from finance of multi-stage stochastic programming.
Suppose that at time <math>t=0</math> we have initial capital <math>W_0</math> to invest in <math>n</math> assets. Suppose further that we are allowed to rebalance our portfolio at times <math>t=1,\dots,T-1</math> but without injecting additional cash into it. At each period <math>t</math> we make a decision about redistributing the current wealth <math>W_t</math> among the <math>n</math> assets. Let <math>x_0=(x_{10},\dots,x_{n0})</math> be the initial amounts invested in the n assets. We require that each <math>x_{i0}</math> is nonnegative and that the balance equation <math>\sum_{i=1}^{n}x_{i0}=W_0</math> should hold.

Consider the total returns <math>\xi_t=(\xi_{1t},\dots,\xi_{nt})</math> for each period <math>t=1,\dots,T</math>. This forms a vector-valued random process <math>\xi_1,\dots,\xi_T</math>. At time period <math>t=1</math>, we can rebalance the portfolio by specifying the amounts <math>x_1=(x_{11},\dots,x_{n1})</math> invested in the respective assets. At that time the returns in the first period have been realized so it is reasonable to use this information in the rebalancing decision. Thus, the second-stage decisions, at time <math>t=1</math>, are actually functions of realization of the random vector <math>\xi_1</math>, i.e., <math>x_1=x_1(\xi_1)</math>. Similarly, at time <math>t</math> the decision <math>x_t=(x_{1t},\dots,x_{nt})</math> is a function <math>x_t=x_t(\xi_{[t]})</math> of the available information given by <math>\xi_{[t]}=(\xi_{1},\dots,\xi_{t})</math> the history of the random process up to time <math>t</math>. A sequence of functions <math>x_t=x_t(\xi_{[t]})</math>, <math>t=0,\dots,T-1</math>, with <math>x_0</math> being constant, defines an ''implementable policy'' of the decision process. It is said that such a policy is ''feasible'' if it satisfies the model constraints with probability 1, i.e., the nonnegativity constraints <math>x_{it}(\xi_{[t]})\geq 0</math>, <math>i=1,\dots,n</math>, <math>t=0,\dots,T-1</math>, and the balance of wealth constraints,

:<math>
\sum_{i=1}^{n}x_{it}(\xi_{[t]}) = W_t,
</math>

where in period <math>t=1,\dots,T</math> the wealth <math>W_t</math> is given by

:<math>
W_t = \sum_{i=1}^{n}\xi_{it} x_{i,t-1}(\xi_{[t-1]}),
</math>

which depends on the realization of the random process and the decisions up to time <math>t</math>.

Suppose the objective is to maximize the expected utility of this wealth at the last period, that is, to consider the problem

:<math>
\max E[U(W_T)].
</math>

This is a multistage stochastic programming problem, where stages are numbered from <math>t=0</math> to <math>t=T-1</math>. Optimization is performed over all implementable and feasible policies. To complete the problem description one also needs to define the probability distribution of the random process <math>\xi_1,\dots,\xi_T</math>. This can be done in various ways. For example, one can construct a particular scenario tree defining time evolution of the process. If at every stage the random return of each asset is allowed to have two continuations, independent of other assets, then the total number of scenarios is <math>2^{nT}.</math>

In order to write [[dynamic programming]] equations, consider the above multistage problem backward in time. At the last stage <math>t=T-1</math>, a realization <math>\xi_{[T-1]}=(\xi_{1},\dots,\xi_{T-1})</math> of the random process is known and <math>x_{T-2}</math> has been chosen. Therefore, one needs to solve the following problem

:<math>
\begin{array}{lrclr}
\max\limits_{x_{T-1}}   & E[U(W_T)|\xi_{[T-1]}]    &   \\
\text{subject to} & W_T   &=&    \sum_{i=1}^{n}\xi_{iT}x_{i,T-1} \\
                    &\sum_{i=1}^{n}x_{i,T-1}&=&W_{T-1}\\
		    & x_{T-1}     &\geq& 0
\end{array}
</math>

where <math>E[U(W_T)|\xi_{[T-1]}]</math> denotes the conditional expectation of <math>U(W_T)</math> given <math>\xi_{[T-1]}</math>. The optimal value of the above problem depends on <math>W_{T-1}</math> and <math>\xi_{[T-1]}</math> and is denoted <math>Q_{T-1}(W_{T-1},\xi_{[T-1]})</math>.

Similarly, at stages <math>t=T-2,\dots,1</math>, one should solve the problem

:<math>
\begin{array}{lrclr}
\max\limits_{x_{t}}   & E[Q_{t+1}(W_{t+1},\xi_{[t+1]})|\xi_{[t]}]    &   \\
\text{subject to} & W_{t+1}   &=&    \sum_{i=1}^{n}\xi_{i,t+1}x_{i,t} \\
                    &\sum_{i=1}^{n}x_{i,t}&=&W_{t}\\
		    & x_{t}     &\geq& 0
\end{array}
</math>

whose optimal value is denoted by <math>Q_{t}(W_{t},\xi_{[t]})</math>. Finally, at stage <math>t=0</math>, one solves the problem

:<math>
\begin{array}{lrclr}
\max\limits_{x_{0}}   & E[Q_{1}(W_{1},\xi_{[1]})]    &   \\
\text{subject to} & W_{1}   &=&    \sum_{i=1}^{n}\xi_{i,1}x_{i0} \\
                    &\sum_{i=1}^{n}x_{i0}&=&W_{0}\\
		    & x_{0}     &\geq& 0
\end{array}
</math>

==== Stagewise independent random process ====

For a general distribution of the process <math>\xi_t</math>, it may be hard to solve these dynamic programming equations. The situation simplifies dramatically if the process <math>\xi_t</math> is stagewise independent, i.e., <math>\xi_t</math> is (stochastically) independent of <math>\xi_1,\dots,\xi_{t-1}</math> for <math>t=2,\dots,T</math>. In this case, the corresponding conditional expectations become unconditional expectations, and the function <math>Q_t(W_t)</math>, <math>t=1,\dots,T-1</math> does not depend on <math>\xi_{[t]}</math>. That is, <math>Q_{T-1}(W_{T-1})</math> is the optimal value of the problem

:<math>
\begin{array}{lrclr}
\max\limits_{x_{T-1}}   & E[U(W_T)]    &   \\
\text{subject to} & W_T   &=&    \sum_{i=1}^{n}\xi_{iT}x_{i,T-1} \\
                    &\sum_{i=1}^{n}x_{i,T-1}&=&W_{T-1}\\
		    & x_{T-1}     &\geq& 0
\end{array}
</math>

and <math>Q_t(W_t)</math> is the optimal value of

:<math>
\begin{array}{lrclr}
\max\limits_{x_{t}}   & E[Q_{t+1}(W_{t+1})]    &   \\
\text{subject to} & W_{t+1}   &=&    \sum_{i=1}^{n}\xi_{i,t+1}x_{i,t} \\
                    &\sum_{i=1}^{n}x_{i,t}&=&W_{t}\\
		    & x_{t}     &\geq& 0
\end{array}
</math>

for <math>t=T-2,\dots,1</math>.