AI ALIGNMENT FORUM

Counterfactual Planning

Feb 02, 2021 by Koen Holtman

Counterfactual planning is a design approach for creating a range of safety mechanisms that can be applied to AGI systems. This sequence introduces the graphical notation used in counterfactual planning, and it defines several safety mechanisms.

Counterfactual Planning in AGI Systems
Graphical World Models, Counterfactuals, and Machine Learning Agents
Creating AGI Safety Interlocks
Disentangling Corrigibility: 2015-2021
Safely controlling the AGI agent reward function