Seth Herd | v1.2.0 | Nov 27th 2023 | (+472)
Yaakov T | v1.1.0 | Mar 27th 2023 | (+761/-10)
Ben Pace | v1.0.0 | Jul 17th 2020 | (+268)
A corrigible agent is one that doesn't interfere with what we would intuitively see as attempts to 'correct' the agent, or 'correct' our mistakes in building it; and permits these 'corrections' despite the apparent instrumentally convergent reasoning saying otherwise.
Corrigibility is also used in a broader sense, roughly meaning a helpful agent. Paul Christiano has defined a corrigible agent as one that will help me: