
=== Multi-agent learning ===
Q-learning has been proposed for the multi-agent setting (see Section 4.1.2 in <ref>{{cite journal |last1=Shoham |first1=Yoav |last2=Powers |first2=Rob |last3=Grenager |first3=Trond |title=If multi-agent learning is the answer, what is the question? |journal=Artificial Intelligence |date=1 May 2007 |volume=171 |issue=7 |pages=365–377 |doi=10.1016/j.artint.2006.02.006 |url=https://dl.acm.org/doi/10.1016/j.artint.2006.02.006 |access-date=4 April 2023 |issn=0004-3702}}</ref>). One approach is for each agent to treat the environment as passive, that is, to apply ordinary single-agent Q-learning and regard the other agents as part of the environment.<ref>{{cite journal |last1=Sen |first1=Sandip |last2=Sekaran |first2=Mahendra |last3=Hale |first3=John |title=Learning to coordinate without sharing information |journal=Proceedings of the Twelfth AAAI National Conference on Artificial Intelligence |date=1 August 1994 |pages=426–431 |url=https://dl.acm.org/doi/10.5555/2891730.2891796 |access-date=4 April 2023 |publisher=AAAI Press}}</ref> Littman proposed the minimax Q-learning algorithm for two-player zero-sum Markov games.<ref>{{cite journal |last1=Littman |first1=Michael L. |title=Markov games as a framework for multi-agent reinforcement learning |journal=Proceedings of the Eleventh International Conference on Machine Learning |date=10 July 1994 |pages=157–163 |url=https://dl.acm.org/doi/10.5555/3091574.3091594 |access-date=4 April 2023 |publisher=Morgan Kaufmann Publishers Inc.|isbn=9781558603356 }}</ref>
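
The passive-environment approach can be illustrated with a minimal sketch, which is not taken from the cited works. It assumes a stateless, repeated two-player game with a shared reward given by a hypothetical <code>payoff</code> matrix; the function name <code>independent_q_learning</code> and all parameter values are illustrative choices. Each agent keeps a Q-table over its own actions only and updates it from the reward it receives, without ever observing the other agent's action. Because the game has no state, the usual discounted next-state term is omitted.

```python
import random


def independent_q_learning(payoff, episodes=5000, alpha=0.1, epsilon=0.1, seed=0):
    """Two independent Q-learners in a repeated matrix game (illustrative sketch).

    payoff[a0][a1] is the reward both agents receive when agent 0 plays a0
    and agent 1 plays a1 (a shared-reward assumption made for this example).
    Returns q, where q[i][a] is agent i's value estimate for its own action a.
    """
    rng = random.Random(seed)
    n_actions = len(payoff)
    # One Q-table per agent, indexed only by that agent's own action.
    q = [[0.0] * n_actions for _ in range(2)]

    for _ in range(episodes):
        actions = []
        for agent in range(2):
            if rng.random() < epsilon:
                # Explore: pick a uniformly random action.
                actions.append(rng.randrange(n_actions))
            else:
                # Exploit: pick a greedy action, breaking ties at random.
                best = max(q[agent])
                greedy = [a for a in range(n_actions) if q[agent][a] == best]
                actions.append(rng.choice(greedy))

        reward = payoff[actions[0]][actions[1]]

        # Each agent updates as if it were alone: the other agent's action
        # is never observed and is absorbed into the "environment".
        for agent in range(2):
            a = actions[agent]
            q[agent][a] += alpha * (reward - q[agent][a])

    return q


if __name__ == "__main__":
    # Coordination game: reward 1 if both agents pick the same action, else 0.
    q_tables = independent_q_learning([[1.0, 0.0], [0.0, 1.0]])
    for i, table in enumerate(q_tables):
        print(f"agent {i}: {table}")
```

From each agent's point of view the reward for a given action changes as the other agent learns, so the environment is non-stationary and the convergence guarantees of single-agent Q-learning do not carry over directly.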