Corrigibility is also used in a broader sense, something like a helpful agent. Paul Christiano has defined corrigibility as an agent that will help me:
- Figure out whether I built the right AI and correct any mistakes I made
- Remain informed about the AI’s behavior and avoid unpleasant surprises
- Make better decisions and clarify my preferences
- Acquire resources and remain in effective control of them
- Ensure that my AI systems continue to do all of these nice things
- …and so on
Corrigibility is also used in a broader sense, something like a helpful agent. Paul Christiano has defined corrigibility as an agent that will help me: