=== Learning automata ===
{{main|Learning automata}}
Another application of MDPs in [[machine learning]] theory is called learning automata. If the environment is stochastic, this is also a type of reinforcement learning. '''Learning automata''' were first surveyed in detail by [[Kumpati S. Narendra|Narendra]] and Thathachar (1974), who described them explicitly as [[finite-state automata]].<ref>{{Cite journal|last1=Narendra|first1=K. S.|author-link=Kumpati S. Narendra|last2=Thathachar|first2=M. A. L.|year=1974 |title=Learning Automata – A Survey|journal=IEEE Transactions on Systems, Man, and Cybernetics|volume=SMC-4|issue=4|pages=323–334|doi=10.1109/TSMC.1974.5408453|issn=0018-9472|citeseerx=10.1.1.295.2280}}</ref> Like reinforcement learning, a learning automata algorithm has the advantage of solving the problem when the probabilities or rewards are unknown. The difference between learning automata and Q-learning is that the former omits the memory of Q-values and instead updates the action probabilities directly to find the learning result. Learning automata constitute a learning scheme with a rigorous proof of convergence.<ref name="NarendraEtAl1989">{{Cite book|url=https://archive.org/details/learningautomata00nare|url-access=registration|title=Learning automata: An introduction|last1=Narendra|first1=Kumpati S.|author-link=Kumpati S. Narendra|last2=Thathachar|first2=Mandayam A. L.|year=1989|publisher=Prentice Hall|isbn=9780134855585|language=en}}</ref>

In learning automata theory, a '''stochastic automaton''' consists of:
* a set ''x'' of possible inputs,
* a set Φ = { Φ<sub>1</sub>, ..., Φ<sub>''s''</sub> } of possible internal states,
* a set α = { α<sub>1</sub>, ..., α<sub>''r''</sub> } of possible outputs, or actions, with ''r'' ≤ ''s'',
* an initial state probability vector ''p''(0) = ≪ ''p''<sub>1</sub>(0), ..., ''p<sub>s</sub>''(0) ≫,
* a [[computable function]] ''A'' which after each time step ''t'' generates ''p''(''t'' + 1) from ''p''(''t''), the current input, and the current state, and
* a function ''G'': Φ → α which generates the output at each time step.

The states of such an automaton correspond to the states of a "discrete-state discrete-parameter [[Markov process]]".{{sfn|Narendra|Thathachar|1974|loc=p.325 left}} At each time step ''t'' = 0, 1, 2, 3, ..., the automaton reads an input from its environment, updates ''p''(''t'') to ''p''(''t'' + 1) by ''A'', randomly chooses a successor state according to the probabilities ''p''(''t'' + 1), and outputs the corresponding action. The automaton's environment, in turn, reads the action and sends the next input to the automaton.<ref name="NarendraEtAl1989" />
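The following is a minimal sketch of this update loop for a variable-structure automaton, using the classical linear reward-inaction (''L''<sub>''R''–''I''</sub>) scheme as the update function ''A''. The function and parameter names (<code>l_ri_automaton</code>, <code>reward_probs</code>, <code>learning_rate</code>) are illustrative, not taken from the sources; the sketch assumes a binary environment response and identifies each state with its own action (''r'' = ''s'', ''G'' the identity), so only the action probability vector ''p''(''t'') is tracked, with no Q-values stored.

<syntaxhighlight lang="python">
import random

def l_ri_automaton(reward_probs, learning_rate=0.05, steps=20000):
    """Sketch of a linear reward-inaction (L_R-I) learning automaton.

    reward_probs: per-action success probabilities, unknown to the
    automaton itself; it only observes the binary environment response.
    """
    r = len(reward_probs)          # number of actions alpha_1 ... alpha_r
    p = [1.0 / r] * r              # p(0): uniform action probability vector

    for _ in range(steps):
        # Randomly choose an action according to the current p(t).
        a = random.choices(range(r), weights=p)[0]
        # The stochastic environment returns a binary input (True = reward).
        rewarded = random.random() < reward_probs[a]
        if rewarded:
            # Update A on reward: shift probability mass toward action a, i.e.
            # p_a(t+1) = p_a(t) + lr * (1 - p_a(t)), others scaled by (1 - lr).
            p = [(1 - learning_rate) * pi for pi in p]
            p[a] += learning_rate
        # Inaction: on a non-reward, p(t+1) = p(t) is left unchanged.

    return p

# Two-action environment with success rates unknown to the automaton;
# p converges toward selecting the better action (here, the second).
print(l_ri_automaton([0.4, 0.8]))
</syntaxhighlight>

Because the probability vector is updated only on rewards and left unchanged otherwise, the ''L''<sub>''R''–''I''</sub> scheme is one of the schemes for which the literature gives convergence guarantees of the kind mentioned above.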