=== Bellman's principle of optimality ===

The dynamic programming method breaks this decision problem into smaller subproblems. Bellman's ''principle of optimality'' describes how to do this:

<blockquote>Principle of Optimality: An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision. (See Bellman, 1957, Chap. III.3.)<ref name=BellmanDP /><ref name=dreyfus /><ref name=BellmanTheory>{{cite journal |first=R. |last=Bellman |pmc=1063639 |title=On the Theory of Dynamic Programming |journal=Proc Natl Acad Sci U S A |date=August 1952 |volume=38 |issue=8 |pages=716–9 |pmid=16589166 |doi=10.1073/pnas.38.8.716 |bibcode=1952PNAS...38..716B |doi-access=free}}</ref></blockquote>

In computer science, a problem that can be broken apart like this is said to have [[optimal substructure]]. In dynamic [[game theory]], this principle is analogous to the concept of [[subgame perfect equilibrium]], although what constitutes an optimal policy in that setting is conditioned on the decision-maker's opponents choosing similarly optimal policies from their own points of view.

As the ''principle of optimality'' suggests, we consider the first decision separately, setting aside all future decisions (we will start afresh from time 1 with the new state <math>x_1</math>). Collecting the future decisions in brackets on the right, the above infinite-horizon decision problem is equivalent to:

:<math> \max_{a_0} \left\{ F(x_0,a_0) + \beta \left[ \max_{\left\{ a_t \right\}_{t=1}^{\infty}} \sum_{t=1}^{\infty} \beta^{t-1} F(x_t,a_t) : a_t \in \Gamma(x_t), \; x_{t+1} = T(x_t,a_t), \; \forall t \geq 1 \right] \right\}</math>

subject to the constraints

:<math> a_0 \in \Gamma(x_0), \; x_1 = T(x_0,a_0). </math>

Here we choose <math>a_0</math>, knowing that this choice determines the time 1 state to be <math>x_1 = T(x_0,a_0)</math>. That new state in turn determines the decision problem from time 1 onward, which is the expression inside the square brackets: the inner maximization has exactly the same form as the original problem, restarted from <math>x_1</math> and discounted once by <math>\beta</math>. This is why knowing the optimal value of the continuation problem as a function of the state is enough to choose the first decision optimally.
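This decomposition is what makes value iteration work in practice: once the continuation value is known as a function of the state, the first decision reduces to a one-step maximization. The following is a minimal sketch on an invented three-state problem; the names <code>beta</code>, <code>F</code>, <code>T</code>, and <code>Gamma</code> mirror the article's <math>\beta</math>, <math>F</math>, <math>T</math>, and <math>\Gamma</math>, while the particular payoffs, transitions, and iteration count are hypothetical choices for illustration, not taken from the article.

<syntaxhighlight lang="python">
# Hypothetical toy problem: 3 states, 2 actions, deterministic dynamics.
beta = 0.9
states = [0, 1, 2]

def Gamma(x):   # feasible action set Γ(x); here the same for every state
    return [0, 1]

def T(x, a):    # transition x_{t+1} = T(x_t, a_t)
    return (x + a) % 3

def F(x, a):    # per-period payoff F(x_t, a_t): reward for state 2, small action cost
    return float(x == 2) - 0.1 * a

# Value iteration: repeatedly apply the Bellman operator
#   (BV)(x) = max_{a in Γ(x)} [ F(x, a) + beta * V(T(x, a)) ]
# until V approximates the optimal value of the infinite-horizon problem.
V = {x: 0.0 for x in states}
for _ in range(500):
    V = {x: max(F(x, a) + beta * V[T(x, a)] for a in Gamma(x)) for x in states}

# Principle of optimality in action: the optimal first decision a_0 is found by
# a one-step maximization, with the whole future summarized by V at the next state.
x0 = 0
a0 = max(Gamma(x0), key=lambda a: F(x0, a) + beta * V[T(x0, a)])
print(V, a0)
</syntaxhighlight>

The loop never re-solves the future from scratch when choosing <code>a0</code>; it consults only the value <code>V</code> of the state that the first decision produces, which is precisely the bracketed subproblem in the equation above.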