AI ALIGNMENT FORUM: Tags

Corrigibility

• Applied to The Shutdown Problem: Incomplete Preferences as a Solution by Elliott Thornley 2mo ago
• Applied to Requirements for a Basin of Attraction to Alignment by Roger Dearnaley 2mo ago
• Applied to Nash Bargaining between Subagents doesn't solve the Shutdown Problem by A.H. 3mo ago
• Applied to Requirements for a STEM-capable AGI Value Learner (my Case for Less Doom) by Roger Dearnaley 3mo ago
• Applied to A Pedagogical Guide to Corrigibility by A.H. 3mo ago
• Applied to Predictive model agents are sort of corrigible by jacobjacob 3mo ago
• Applied to Steering Llama-2 with contrastive activation additions by Alex Turner 4mo ago
• Applied to A Question about Corrigibility (2015) by A.H. 5mo ago

Seth Herd v1.2.0, Nov 27th 2023 (+472)

The term corrigibility is also used in a broader sense, to mean something like a helpful agent. Paul Christiano has defined a corrigible agent as one that will help me:

• Figure out whether I built the right AI and correct any mistakes I made
• Remain informed about the AI’s behavior and avoid unpleasant surprises
• Make better decisions and clarify my preferences
• Acquire resources and remain in effective control of them
• Ensure that my AI systems continue to do all of these nice things
• …and so on

• Applied to Game Theory without Argmax [Part 2] by Cleo Nardo 5mo ago
• Applied to Game Theory without Argmax [Part 1] by Cleo Nardo 5mo ago
• Applied to GPT-2 XL's capacity for coherence and ontology clustering by Miguel de Guzman 6mo ago
• Applied to The Shutdown Problem: An AI Engineering Puzzle for Decision Theorists by Elliott Thornley 6mo ago
• Applied to What's Hard About The Shutdown Problem by RobertM 6mo ago
• Applied to Relevance of 'Harmful Intelligence' Data in Training Datasets (WebText vs. Pile) by Miguel de Guzman 6mo ago
• Applied to How useful is Corrigibility? by martinkunev 7mo ago
• Applied to Instrumental Convergence Bounty by Logan Zoellner 8mo ago