Editing Markov decision process (section)

==Example==

[[File:Cartpole.gif|thumb|Pole Balancing example (rendering of the environment from the [[Open AI gym benchmark]])]]

An example of MDP is the Pole-Balancing model, which comes from classic control theory.

In this example, we have

* <math>S</math> is the set of ordered tuples <math>(\theta,\dot \theta, x, \dot x )\subset \mathbb R^4</math> given by pole angle, angular velocity, position of the cart and its speed.
* <math>A</math> is <math>\{-1,1\}</math>, corresponding to applying a force on the left (right) on the cart.
* <math>P_a(s, s')</math> is the transition of the system, which in this case is going to be deterministic and driven by the laws of mechanics.
*<math>R_a(s, s')</math> is <math>1</math> if the pole is up after the transition, zero otherwise. Therefore, this function only depend on <math>s'</math> in this specific case.