==Alternative notations==
The terminology and notation for MDPs are not entirely settled. There are two main streams: one focuses on maximization problems from contexts like economics, using the terms action, reward, value, and calling the discount factor {{mvar|β}} or {{mvar|γ}}, while the other focuses on minimization problems from engineering and navigation{{Citation needed|date=December 2019}}, using the terms control, cost, cost-to-go, and calling the discount factor {{mvar|α}}. In addition, the notation for the transition probability varies.

{| class="wikitable"
! in this article !! alternative !! comment
|-
| action {{mvar|a}} || control {{mvar|u}} ||
|-
| reward {{mvar|R}} || cost {{mvar|g}} || {{mvar|g}} is the negative of {{mvar|R}}
|-
| value {{mvar|V}} || cost-to-go {{mvar|J}} || {{mvar|J}} is the negative of {{mvar|V}}
|-
| policy {{mvar|π}} || policy {{mvar|μ}} ||
|-
| discount factor {{mvar|γ}} || discount factor {{mvar|α}} ||
|-
| transition probability <math>P_a(s,s')</math> || transition probability <math>p_{ss'}(a)</math> ||
|}

Furthermore, the transition probability is sometimes written <math>\Pr(s,a,s')</math>, <math>\Pr(s'\mid s,a)</math> or, rarely, <math>p_{s's}(a).</math>
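As an illustrative sketch of how the two conventions correspond (writing <math>R(s,a)</math> and <math>g(s,u)</math> for the expected immediate reward and cost, a choice of argument order made here only for illustration), the Bellman optimality equation takes the two equivalent forms

<math display="block">V(s) = \max_a \Bigl[ R(s,a) + \gamma \sum_{s'} P_a(s,s')\, V(s') \Bigr]
\qquad\text{and}\qquad
J(s) = \min_u \Bigl[ g(s,u) + \alpha \sum_{s'} p_{ss'}(u)\, J(s') \Bigr],</math>

and substituting <math>J = -V</math>, <math>g = -R</math>, <math>u = a</math>, and <math>\alpha = \gamma</math> converts either equation into the other, since negating the objective turns a maximization into a minimization.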