== Design considerations ==

The design of superintelligent AI systems raises critical questions about what values and goals these systems should have. Several proposals have been put forward:{{sfn|Bostrom|2014|pp=209–221}}

=== Value alignment proposals ===

* [[Coherent extrapolated volition]] (CEV) – The AI should have the values upon which humans would converge if they were more knowledgeable and rational.
* Moral rightness (MR) – The AI should be programmed to do what is morally right, relying on its superior cognitive abilities to determine ethical actions.
* Moral permissibility (MP) – The AI should stay within the bounds of moral permissibility while otherwise pursuing goals aligned with human values (similar to CEV).

Bostrom elaborates on these concepts:

<blockquote>instead of implementing humanity's coherent extrapolated volition, one could try to build an AI to do what is morally right, relying on the AI's superior cognitive capacities to figure out just which actions fit that description. We can call this proposal "moral rightness" (MR){{nbsp}}... MR would also appear to have some disadvantages. It relies on the notion of "morally right", a notoriously difficult concept, one with which philosophers have grappled since antiquity without yet attaining consensus as to its analysis. Picking an erroneous explication of "moral rightness" could result in outcomes that would be morally very wrong{{nbsp}}... One might try to preserve the basic idea of the MR model while reducing its demandingness by focusing on ''moral permissibility'': the idea being that we could let the AI pursue humanity's CEV so long as it did not act in morally impermissible ways.{{sfn|Bostrom|2014|pp=209–221}}</blockquote>

=== Recent developments ===

Since Bostrom's analysis, new approaches to AI value alignment have emerged:

* Inverse Reinforcement Learning (IRL) – This technique aims to infer human preferences from observed behavior, potentially offering a more robust approach to value alignment (an illustrative sketch follows this list).<ref>{{Cite journal |last1=Christiano |first1=Paul |last2=Leike |first2=Jan |last3=Brown |first3=Tom B. |last4=Martic |first4=Miljan |last5=Legg |first5=Shane |last6=Amodei |first6=Dario |date=2017 |title=Deep Reinforcement Learning from Human Preferences |url=https://proceedings.neurips.cc/paper_files/paper/2017/file/d5e2c0adad503c91f91df240d0cd4e49-Paper.pdf |journal=NeurIPS |arxiv=1706.03741}}</ref>
* [[Constitutional AI]] – Proposed by Anthropic, this involves training AI systems with explicit ethical principles and constraints.<ref>{{Cite web |date=December 15, 2022 |title=Constitutional AI: Harmlessness from AI Feedback |url=https://www.anthropic.com/research/constitutional-ai-harmlessness-from-ai-feedback |website=Anthropic |language=en}}</ref>
* Debate and amplification – These techniques, explored by OpenAI, use AI-assisted debate and iterative processes to better understand and align with human values.<ref>{{Cite web |date=October 22, 2018 |title=Learning complex goals with iterated amplification |url=https://openai.com/index/learning-complex-goals-with-iterated-amplification/ |website=OpenAI}}</ref>
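The preference-based approach of Christiano et al. can be illustrated with a minimal sketch: a reward model is fitted to pairwise human judgments over trajectory segments via the Bradley–Terry model. Everything below (the linear reward model, the function names, and the toy data) is an illustrative assumption, not code from the cited paper:

<syntaxhighlight lang="python">
# Minimal sketch of preference-based reward learning, in the spirit of
# Christiano et al. (2017). A linear reward model is fit to pairwise human
# preferences over trajectory segments using the Bradley-Terry model.
# All names and data are illustrative, not taken from the cited paper.
import numpy as np

rng = np.random.default_rng(0)

def segment_return(w, segment):
    """Predicted reward of a trajectory segment: sum of per-state rewards."""
    return sum(w @ state for state in segment)

def preference_prob(w, seg_a, seg_b):
    """Bradley-Terry probability that the human prefers seg_a over seg_b."""
    return 1.0 / (1.0 + np.exp(segment_return(w, seg_b) - segment_return(w, seg_a)))

def train(pairs, labels, dim, lr=0.05, steps=200):
    """Fit reward weights w by gradient descent on the logistic loss.

    pairs  -- list of (seg_a, seg_b) trajectory-segment pairs
    labels -- 1.0 if the human preferred seg_a, else 0.0
    """
    w = np.zeros(dim)
    for _ in range(steps):
        for (seg_a, seg_b), y in zip(pairs, labels):
            p = preference_prob(w, seg_a, seg_b)
            # Gradient of the negative log-likelihood w.r.t. w is
            # (p - y) * (sum of features of seg_a - sum of features of seg_b).
            w -= lr * (p - y) * (sum(seg_a) - sum(seg_b))
    return w

# Toy data: 2-D state features; the "human" secretly rewards only x[0].
true_w = np.array([1.0, 0.0])
segments = [[rng.normal(size=2) for _ in range(5)] for _ in range(40)]
pairs = [(segments[i], segments[i + 1]) for i in range(0, 40, 2)]
labels = [float(segment_return(true_w, a) > segment_return(true_w, b))
          for a, b in pairs]

w = train(pairs, labels, dim=2)
print("learned reward weights:", w)  # the first weight should dominate
</syntaxhighlight>

The learned reward model can then supply the reward signal for ordinary reinforcement learning, which is the core of the cited method.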
=== Transformer LLMs and ASI ===

The rapid advancement of transformer-based LLMs has led to speculation about their potential path to ASI. Some researchers argue that scaled-up versions of these models could exhibit ASI-like capabilities:<ref>{{Cite web |date=2021 |title=On the Opportunities and Risks of Foundation Models |url=https://crfm.stanford.edu/report.html |website=Stanford University |arxiv=2108.07258 |last1=Bommasani |first1=Rishi |last2=Hudson |first2=Drew A. |last3=Adeli |first3=Ehsan |last4=Altman |first4=Russ |last5=Arora |first5=Simran |author6=Sydney von Arx |last7=Bernstein |first7=Michael S. |last8=Bohg |first8=Jeannette |last9=Bosselut |first9=Antoine |last10=Brunskill |first10=Emma |last11=Brynjolfsson |first11=Erik |last12=Buch |first12=Shyamal |last13=Card |first13=Dallas |last14=Castellon |first14=Rodrigo |last15=Chatterji |first15=Niladri |last16=Chen |first16=Annie |last17=Creel |first17=Kathleen |author18=Jared Quincy Davis |last19=Demszky |first19=Dora |last20=Donahue |first20=Chris |last21=Doumbouya |first21=Moussa |last22=Durmus |first22=Esin |last23=Ermon |first23=Stefano |last24=Etchemendy |first24=John |last25=Ethayarajh |first25=Kawin |last26=Fei-Fei |first26=Li |last27=Finn |first27=Chelsea |last28=Gale |first28=Trevor |last29=Gillespie |first29=Lauren |last30=Goel |first30=Karan |display-authors=1 }}</ref>

* Emergent abilities – As LLMs increase in size and complexity, they demonstrate unexpected capabilities not present in smaller models.<ref name=":0">{{Cite journal |date=2022-06-26 |title=Emergent Abilities of Large Language Models |journal=Transactions on Machine Learning Research |language=en |arxiv=2206.07682 |issn=2835-8856 |last1=Wei |first1=Jason |last2=Tay |first2=Yi |last3=Bommasani |first3=Rishi |last4=Raffel |first4=Colin |last5=Zoph |first5=Barret |last6=Borgeaud |first6=Sebastian |last7=Yogatama |first7=Dani |last8=Bosma |first8=Maarten |last9=Zhou |first9=Denny |last10=Metzler |first10=Donald |last11=Chi |first11=Ed H. |last12=Hashimoto |first12=Tatsunori |last13=Vinyals |first13=Oriol |last14=Liang |first14=Percy |last15=Dean |first15=Jeff |last16=Fedus |first16=William }}</ref>
* In-context learning – LLMs show the ability to adapt to new tasks without fine-tuning, potentially mimicking general intelligence (see the sketch after this list).<ref>{{Cite journal |date=2020 |title=Language Models are Few-Shot Learners |journal=NeurIPS |arxiv=2005.14165 |last1=Brown |first1=Tom B. |last2=Mann |first2=Benjamin |last3=Ryder |first3=Nick |last4=Subbiah |first4=Melanie |last5=Kaplan |first5=Jared |last6=Dhariwal |first6=Prafulla |last7=Neelakantan |first7=Arvind |last8=Shyam |first8=Pranav |last9=Sastry |first9=Girish |last10=Askell |first10=Amanda |last11=Agarwal |first11=Sandhini |last12=Herbert-Voss |first12=Ariel |last13=Krueger |first13=Gretchen |last14=Henighan |first14=Tom |last15=Child |first15=Rewon |last16=Ramesh |first16=Aditya |last17=Ziegler |first17=Daniel M. |last18=Wu |first18=Jeffrey |last19=Winter |first19=Clemens |last20=Hesse |first20=Christopher |last21=Chen |first21=Mark |last22=Sigler |first22=Eric |last23=Litwin |first23=Mateusz |last24=Gray |first24=Scott |last25=Chess |first25=Benjamin |last26=Clark |first26=Jack |last27=Berner |first27=Christopher |last28=McCandlish |first28=Sam |last29=Radford |first29=Alec |last30=Sutskever |first30=Ilya |display-authors=1 }}</ref>
* Multi-modal integration – Recent models can process and generate various types of data, including text, images, and audio.<ref>{{Cite journal |date=2022 |title=Flamingo: a Visual Language Model for Few-Shot Learning |journal=NeurIPS |arxiv=2204.14198 |last1=Alayrac |first1=Jean-Baptiste |last2=Donahue |first2=Jeff |last3=Luc |first3=Pauline |last4=Miech |first4=Antoine |last5=Barr |first5=Iain |last6=Hasson |first6=Yana |last7=Lenc |first7=Karel |last8=Mensch |first8=Arthur |last9=Millican |first9=Katie |last10=Reynolds |first10=Malcolm |last11=Ring |first11=Roman |last12=Rutherford |first12=Eliza |last13=Cabi |first13=Serkan |last14=Han |first14=Tengda |last15=Gong |first15=Zhitao |last16=Samangooei |first16=Sina |last17=Monteiro |first17=Marianne |last18=Menick |first18=Jacob |last19=Borgeaud |first19=Sebastian |last20=Brock |first20=Andrew |last21=Nematzadeh |first21=Aida |last22=Sharifzadeh |first22=Sahand |last23=Binkowski |first23=Mikolaj |last24=Barreira |first24=Ricardo |last25=Vinyals |first25=Oriol |last26=Zisserman |first26=Andrew |last27=Simonyan |first27=Karen }}</ref>

However, critics argue that current LLMs lack true understanding and are merely sophisticated pattern matchers, raising questions about their suitability as a path to ASI.<ref>{{Cite web |last=Marcus |first=Gary |date=August 11, 2022 |title=Deep Learning Alone Isn't Getting Us To Human-Like AI |url=https://www.noemamag.com/deep-learning-alone-isnt-getting-us-to-human-like-ai/ |website=Noema}}</ref>
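In-context learning can be illustrated with a few-shot prompt in the format used by the GPT-3 paper: the task is specified entirely by examples in the prompt, with no parameter updates. In the sketch below, <code>complete</code> is a placeholder standing in for any text-completion model, not a real API:

<syntaxhighlight lang="python">
# Illustrative few-shot prompt for in-context learning (cf. Brown et al., 2020).
# `complete` is a stand-in for a text-completion model endpoint; it is a
# placeholder, not a call into any real library.
def complete(prompt: str) -> str:
    raise NotImplementedError("stand-in for a text-completion model")

FEW_SHOT_PROMPT = """\
Translate English to French.

sea otter => loutre de mer
peppermint => menthe poivrée
cheese => """

# The model is never fine-tuned on translation; the examples alone define
# the task, and a capable model is expected to continue with "fromage".
# print(complete(FEW_SHOT_PROMPT))
</syntaxhighlight>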
=== Other perspectives on artificial superintelligence ===

Additional viewpoints on the development and implications of superintelligence include:

* [[Recursive self-improvement]] – [[I. J. Good]] proposed the concept of an "intelligence explosion", where an AI system could rapidly improve its own intelligence, potentially leading to superintelligence.<ref>{{Cite news |date=March 30, 2017 |title=The AI apocalypse: will the human race soon be terminated? |url=https://www.irishtimes.com/business/innovation/the-ai-apocalypse-will-the-human-race-soon-be-terminated-1.3019220 |newspaper=The Irish Times |language=en}}</ref>
* Orthogonality thesis – Bostrom argues that an AI's level of intelligence is orthogonal to its final goals, meaning a superintelligent AI could have any set of motivations.<ref>{{Cite journal |last=Bostrom |first=Nick |date=2012 |title=The Superintelligent Will: Motivation and Instrumental Rationality in Advanced Artificial Agents |url=https://nickbostrom.com/superintelligentwill.pdf |journal=Minds and Machines |volume=22 |issue=2 |pages=71–85 |doi=10.1007/s11023-012-9281-3 }}</ref>
* [[Instrumental convergence]] – Certain instrumental goals (e.g., self-preservation, resource acquisition) might be pursued by a wide range of AI systems, regardless of their final goals (a toy illustration follows this list).<ref>{{Cite journal |last=Omohundro |first=Stephen M. |date=January 2008 |title=The basic AI drives |url=https://selfawaresystems.com/wp-content/uploads/2008/01/ai_drives_final.pdf |journal=Frontiers in Artificial Intelligence and Applications}}</ref>
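Instrumental convergence can be made concrete with a toy model: two myopic agents holding unrelated final goals both choose resource acquisition first, because it raises the modeled chance of success for ''any'' goal. The payoff numbers below are invented purely for illustration:

<syntaxhighlight lang="python">
# Toy illustration of instrumental convergence (cf. Omohundro, 2008):
# agents with different final goals make the same instrumental choice.
# The success-probability model is deliberately simplistic and invented.

ACTIONS = ["acquire_resources", "pursue_goal_directly"]

def success_prob(action: str, goal: str, resources: int) -> float:
    """Modeled chance the goal is eventually achieved after taking `action`."""
    if action == "acquire_resources":
        # Extra resources help regardless of which final goal is held.
        return min(1.0, 0.2 + 0.3 * (resources + 1))
    return min(1.0, 0.2 + 0.3 * resources)

def choose_action(goal: str, resources: int) -> str:
    """Pick the action with the highest modeled success probability."""
    return max(ACTIONS, key=lambda a: success_prob(a, goal, resources))

for goal in ("prove theorems", "manufacture paperclips"):
    print(goal, "->", choose_action(goal, resources=0))
# Both agents print "acquire_resources" despite having unrelated goals.
</syntaxhighlight>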
=== Challenges and ongoing research ===

The pursuit of value-aligned AI faces several challenges:

* Philosophical uncertainty in defining concepts like "moral rightness"
* Technical complexity in translating ethical principles into precise algorithms
* Potential for unintended consequences even with well-intentioned approaches

Current research directions include multi-stakeholder approaches to incorporate diverse perspectives, developing methods for scalable oversight of AI systems, and improving techniques for robust value learning.<ref>{{Cite journal |last=Gabriel |first=Iason |date=2020-09-01 |title=Artificial Intelligence, Values, and Alignment |journal=Minds and Machines |language=en |volume=30 |issue=3 |pages=411–437 |doi=10.1007/s11023-020-09539-2 |issn=1572-8641 |doi-access=free |arxiv=2001.09768 }}</ref>{{sfn|Russell|2019}} As AI research progresses rapidly, addressing these design challenges remains crucial for creating ASI systems that are both powerful and aligned with human interests.
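One concrete form of translating written principles into an algorithm, explored in the Constitutional AI work cited above, is a critique-and-revise loop. The sketch below is a rough schematic of that idea; the <code>model</code> function and the principle texts are placeholders, not the published method's actual prompts or constitution:

<syntaxhighlight lang="python">
# Schematic critique-and-revise loop in the style of Constitutional AI.
# `model` is a placeholder for any instruction-following language model;
# the principles are illustrative, not Anthropic's actual constitution.
def model(prompt: str) -> str:
    raise NotImplementedError("stand-in for an instruction-following LLM")

PRINCIPLES = [
    "Choose the response that is least likely to cause harm.",
    "Choose the response that is most honest about uncertainty.",
]

def constitutional_revision(user_request: str) -> str:
    """Ask the model to critique and rewrite its own answer per principle."""
    response = model(user_request)
    for principle in PRINCIPLES:
        critique = model(
            f"Request: {user_request}\nResponse: {response}\n"
            f"Critique this response against the principle: {principle}"
        )
        response = model(
            f"Request: {user_request}\nResponse: {response}\n"
            f"Critique: {critique}\nRewrite the response to address the critique."
        )
    # Revised answers can then serve as training data for further alignment.
    return response
</syntaxhighlight>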