
=== Recent developments ===

Since Bostrom's analysis, new approaches to AI value alignment have emerged:

* Inverse reinforcement learning (IRL) and related preference learning – These techniques infer a reward function from observed human behavior, or from human comparisons between behaviors, rather than requiring the objective to be specified by hand.<ref>{{Cite journal |last1=Christiano |first1=Paul |last2=Leike |first2=Jan |last3=Brown |first3=Tom B. |last4=Martic |first4=Miljan |last5=Legg |first5=Shane |last6=Amodei |first6=Dario |date=2017 |title=Deep Reinforcement Learning from Human Preferences |url=https://proceedings.neurips.cc/paper_files/paper/2017/file/d5e2c0adad503c91f91df240d0cd4e49-Paper.pdf |journal=NeurIPS |arxiv=1706.03741}}</ref>
* [[Constitutional AI]] – Proposed by Anthropic, this method trains an AI system to critique and revise its own outputs against a written set of principles (a "constitution"), using AI-generated feedback in place of human labels for harmlessness.<ref>{{Cite web |date=December 15, 2022 |title=Constitutional AI: Harmlessness from AI Feedback |url=https://www.anthropic.com/research/constitutional-ai-harmlessness-from-ai-feedback |website=Anthropic |language=en}}</ref>
* Debate and amplification – These techniques, explored by OpenAI, aim to extend human oversight to tasks that are too complex for people to evaluate directly. In debate, AI systems argue opposing positions before a human judge; in iterated amplification, a training signal for a hard task is built up by combining solutions to simpler subtasks.<ref>{{Cite web |date=October 22, 2018 |title=Learning complex goals with iterated amplification |url=https://openai.com/index/learning-complex-goals-with-iterated-amplification/ |website=OpenAI}}</ref>