AI Alignment Forum

Corrigibility

Contributors: Ben Pace

A corrigible agent is one that does not interfere with what we would intuitively regard as attempts to 'correct' the agent, or to 'correct' our mistakes in building it, and that permits these 'corrections' even though instrumentally convergent reasoning would apparently favor resisting them.

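Posts tagged Corrigibility (sorted by most relevant):

- "Corrigibility" by Paul Christiano, 4y: 22 karma, 3 comments, relevance 3
- "Let's See You Write That Corrigibility Tag" by Eliezer Yudkowsky, 9mo: 29 karma, 22 comments, relevance 1
- "A Gym Gridworld Environment for the Treacherous Turn" by Michaël Trazzi, 5y: 16 karma, 0 comments, relevance 1
- "Corrigibility Via Thought-Process Deference" by Thane Ruthenis, 4mo: 9 karma, 5 comments, relevance 2
- "AI Alignment 2018-19 Review" by Rohin Shah, 3y: 47 karma, 6 comments, relevance 1
- "Interpretability/Tool-ness/Alignment/Corrigibility are not Composable" by johnswentworth, 7mo: 52 karma, 4 comments, relevance 1
- "Reward Is Not Enough" by Steve Byrnes, 2y: 39 karma, 9 comments, relevance 1
- "A broad basin of attraction around human values?" by Wei Dai, 1y: 40 karma, 10 comments, relevance 1
- "Non-Obstruction: A Simple Concept Motivating Corrigibility" by Alex Turner, 2y: 31 karma, 18 comments, relevance 1
- "Corrigibility Can Be VNM-Incoherent" by Alex Turner, 1y: 33 karma, 15 comments, relevance 2
- "Consequentialism & corrigibility" by Steve Byrnes, 1y: 28 karma, 8 comments, relevance 2
- "Solving the whole AGI control problem, version 0.0001" by Steve Byrnes, 2y: 27 karma, 2 comments, relevance 1
- "You can still fetch the coffee today if you're dead tomorrow" by davidad (David A. Dalrymple), 3mo: 33 karma, 8 comments, relevance 1
- "Three mental images from thinking about AGI debate & corrigibility" by Steve Byrnes, 3y: 23 karma, 32 comments, relevance 1
- "Towards a mechanistic understanding of corrigibility" by Evan Hubinger, 4y: 20 karma, 25 comments, relevance 1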
Showing 15 of 52 tagged posts.