AI Alignment Forum: Tags

Corrigibility

Contributors: Ben Pace (2)

A corrigible agent is one that doesn't interfere with what we would intuitively see as attempts to 'correct' the agent or to 'correct' our mistakes in building it, and that permits these 'corrections' even though instrumentally convergent reasoning would apparently argue against doing so.

Posts tagged Corrigibility
Most Relevant
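Corrigibility (Paul Christiano, 3y): karma 20, 2 comments, tag relevance 3
A Gym Gridworld Environment for the Treacherous Turn (Michaël Trazzi, 4y): karma 16, 0 comments, tag relevance 1
AI Alignment 2018-19 Review (Rohin Shah, 2y): karma 47, 6 comments, tag relevance 1
A broad basin of attraction around human values? (Wei Dai, 1mo): karma 40, 9 comments, tag relevance 1
Reward Is Not Enough (Steve Byrnes, 1y): karma 33, 9 comments, tag relevance 1
Non-Obstruction: A Simple Concept Motivating Corrigibility (Alex Turner, 1y): karma 29, 18 comments, tag relevance 1
Corrigibility Can Be VNM-Incoherent (Alex Turner, 6mo): karma 51, 15 comments, tag relevance 2
Consequentialism & corrigibility (Steve Byrnes, 5mo): karma 28, 8 comments, tag relevance 2
Three mental images from thinking about AGI debate & corrigibility (Steve Byrnes, 2y): karma 23, 32 comments, tag relevance 1
Solving the whole AGI control problem, version 0.0001 (Steve Byrnes, 1y): karma 27, 2 comments, tag relevance 1
Towards a mechanistic understanding of corrigibility (Evan Hubinger, 3y): karma 20, 25 comments, tag relevance 1
Solve Corrigibility Week (Logan Riggs Smith, 6mo): karma 16, 14 comments, tag relevance 2
Corrigibility as outside view (Alex Turner, 2y): karma 19, 10 comments, tag relevance 1
Can corrigibility be learned safely? (Wei Dai, 4y): karma 17, 15 comments, tag relevance 1
Do what we mean vs. do what we say (Rohin Shah, 4y): karma 13, 2 comments, tag relevance 1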
Showing 15 of 37 tagged posts.