=== Safe reinforcement learning ===
Safe reinforcement learning (SRL) can be defined as the process of learning policies that maximize the expectation of the return in problems in which it is important to ensure reasonable system performance and/or respect safety constraints during the learning and/or deployment processes.<ref>{{cite journal |last1=García |first1=Javier |last2=Fernández |first2=Fernando |title=A comprehensive survey on safe reinforcement learning |url=https://jmlr.org/papers/volume16/garcia15a/garcia15a.pdf |journal=The Journal of Machine Learning Research |date=1 January 2015 |volume=16 |issue=1 |pages=1437–1480 }}</ref> An alternative approach is risk-averse reinforcement learning, where instead of the ''expected'' return, a ''risk measure'' of the return is optimized, such as the [[Expected shortfall|conditional value at risk]] (CVaR).<ref>{{Cite journal |last1=Dabney |first1=Will |last2=Ostrovski |first2=Georg |last3=Silver |first3=David |last4=Munos |first4=Remi |date=2018-07-03 |title=Implicit Quantile Networks for Distributional Reinforcement Learning |url=https://proceedings.mlr.press/v80/dabney18a.html |journal=Proceedings of the 35th International Conference on Machine Learning |language=en |publisher=PMLR |pages=1096–1105|arxiv=1806.06923 }}</ref> In addition to mitigating risk, the CVaR objective increases robustness to model uncertainties.<ref>{{Cite journal |last1=Chow |first1=Yinlam |last2=Tamar |first2=Aviv |last3=Mannor |first3=Shie |last4=Pavone |first4=Marco |date=2015 |title=Risk-Sensitive and Robust Decision-Making: a CVaR Optimization Approach |url=https://proceedings.neurips.cc/paper/2015/hash/64223ccf70bbb65a3a4aceac37e21016-Abstract.html |journal=Advances in Neural Information Processing Systems |publisher=Curran Associates, Inc. |volume=28|arxiv=1506.02188 }}</ref><ref>{{Cite web |title=Train Hard, Fight Easy: Robust Meta Reinforcement Learning |url=https://scholar.google.com/citations?view_op=view_citation&hl=en&user=LnwyFkkAAAAJ&citation_for_view=LnwyFkkAAAAJ:eQOLeE2rZwMC |access-date=2024-06-21 |website=scholar.google.com}}</ref> However, CVaR optimization in risk-averse RL requires special care to prevent gradient bias<ref>{{Cite journal |last1=Tamar |first1=Aviv |last2=Glassner |first2=Yonatan |last3=Mannor |first3=Shie |date=2015-02-21 |title=Optimizing the CVaR via Sampling |url=https://ojs.aaai.org/index.php/AAAI/article/view/9561 |journal=Proceedings of the AAAI Conference on Artificial Intelligence |language=en |volume=29 |issue=1 |doi=10.1609/aaai.v29i1.9561 |issn=2374-3468|arxiv=1404.3862 }}</ref> and blindness to success.<ref>{{Cite journal |last1=Greenberg |first1=Ido |last2=Chow |first2=Yinlam |last3=Ghavamzadeh |first3=Mohammad |last4=Mannor |first4=Shie |date=2022-12-06 |title=Efficient Risk-Averse Reinforcement Learning |url=https://proceedings.neurips.cc/paper_files/paper/2022/hash/d2511dfb731fa336739782ba825cd98c-Abstract-Conference.html |journal=Advances in Neural Information Processing Systems |language=en |volume=35 |pages=32639–32652|arxiv=2205.05138 }}</ref>
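As a rough illustrative sketch (not taken from any of the cited works), the CVaR objective at risk level α can be estimated empirically as the average of the worst α-fraction of sampled episode returns; the function and parameter names below are hypothetical choices for illustration only.

<syntaxhighlight lang="python">
import numpy as np

def empirical_cvar(returns, alpha=0.1):
    """Empirical conditional value at risk: mean of the worst alpha-fraction of returns.

    ``returns`` is a 1-D array of sampled episode returns and ``alpha`` is the risk
    level (e.g. 0.1 keeps the worst 10% of outcomes). Both names are illustrative.
    """
    returns = np.sort(np.asarray(returns, dtype=float))  # ascending: worst outcomes first
    k = max(1, int(np.ceil(alpha * len(returns))))       # number of tail samples to average
    return returns[:k].mean()                            # mean of the lower alpha-tail

# Example: compare the risk-neutral objective (mean return) with the CVaR objective
# on hypothetical episode returns drawn from a normal distribution.
rng = np.random.default_rng(0)
sampled_returns = rng.normal(loc=1.0, scale=2.0, size=10_000)
print("mean return:", sampled_returns.mean())
print("CVaR (10%):", empirical_cvar(sampled_returns, alpha=0.1))
</syntaxhighlight>

A risk-averse learner would then update its policy using mainly the trajectories in this lower tail rather than the full batch, which is where the gradient-bias and blindness-to-success issues noted above arise.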