=== Bellman's principle of optimality ===

The dynamic programming method breaks this decision problem into smaller subproblems. Bellman's ''principle of optimality'' describes how to do this:

<blockquote>Principle of Optimality: An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision. (See Bellman, 1957, Chap. III.3.)<ref name=BellmanDP /><ref name=dreyfus /><ref name=BellmanTheory>{{cite journal |first=R. |last=Bellman |pmc=1063639 |title=On the Theory of Dynamic Programming |journal=Proc Natl Acad Sci U S A |date=August 1952 |volume=38 |issue=8 |pages=716–9 |pmid=16589166 |doi=10.1073/pnas.38.8.716 |bibcode=1952PNAS...38..716B |doi-access=free}}</ref></blockquote>

In computer science, a problem that can be broken apart like this is said to have [[optimal substructure]]. In dynamic [[game theory]], this principle is analogous to the concept of [[subgame perfect equilibrium]], although what constitutes an optimal policy in that setting is conditioned on the decision-maker's opponents choosing similarly optimal policies from their own points of view.

As the ''principle of optimality'' suggests, we consider the first decision separately, setting aside all future decisions (we will start afresh from time 1 with the new state <math>x_1</math>). Collecting the future decisions in brackets on the right, the above infinite-horizon decision problem is equivalent to:

:<math> \max_{a_0} \left\{ F(x_0,a_0) + \beta \left[ \max_{\left\{ a_t \right\}_{t=1}^{\infty}} \sum_{t=1}^{\infty} \beta^{t-1} F(x_t,a_t) : a_t \in \Gamma(x_t), \; x_{t+1} = T(x_t,a_t), \; \forall t \geq 1 \right] \right\}</math>

subject to the constraints

:<math> a_0 \in \Gamma(x_0), \; x_1 = T(x_0,a_0). </math>

Here we choose <math>a_0</math>, knowing that this choice determines the time 1 state to be <math>x_1 = T(x_0,a_0)</math>. That new state in turn determines the decision problem from time 1 onward, which is the expression inside the square brackets: the inner maximization has exactly the same form as the original problem, restarted from <math>x_1</math> and discounted once by <math>\beta</math>. This is why knowing the optimal value of the continuation problem as a function of the state is enough to choose the first decision optimally.
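This decomposition is what makes value iteration work in practice: once the continuation value is known as a function of the state, the first decision reduces to a one-step maximization. The following is a minimal sketch on an invented three-state problem; the names <code>beta</code>, <code>F</code>, <code>T</code>, and <code>Gamma</code> mirror the article's <math>\beta</math>, <math>F</math>, <math>T</math>, and <math>\Gamma</math>, while the particular payoffs, transitions, and iteration count are hypothetical choices for illustration, not taken from the article.

<syntaxhighlight lang="python">
# Hypothetical toy problem: 3 states, 2 actions, deterministic dynamics.
beta = 0.9
states = [0, 1, 2]

def Gamma(x):   # feasible action set Γ(x); here the same for every state
    return [0, 1]

def T(x, a):    # transition x_{t+1} = T(x_t, a_t)
    return (x + a) % 3

def F(x, a):    # per-period payoff F(x_t, a_t): reward for state 2, small action cost
    return float(x == 2) - 0.1 * a

# Value iteration: repeatedly apply the Bellman operator
#   (BV)(x) = max_{a in Γ(x)} [ F(x, a) + beta * V(T(x, a)) ]
# until V approximates the optimal value of the infinite-horizon problem.
V = {x: 0.0 for x in states}
for _ in range(500):
    V = {x: max(F(x, a) + beta * V[T(x, a)] for a in Gamma(x)) for x in states}

# Principle of optimality in action: the optimal first decision a_0 is found by
# a one-step maximization, with the whole future summarized by V at the next state.
x0 = 0
a0 = max(Gamma(x0), key=lambda a: F(x0, a) + beta * V[T(x0, a)])
print(V, a0)
</syntaxhighlight>

The loop never re-solves the future from scratch when choosing <code>a0</code>; it consults only the value <code>V</code> of the state that the first decision produces, which is precisely the bracketed subproblem in the equation above.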