AI ALIGNMENT FORUM

Towards Causal Foundations of Safe AGI

Jun 09, 2023 by Tom Everitt

This sequence will give our take on how causality underpins many critical aspects of safe AGI, including agency, incentives, misspecification, generalisation, fairness, and corrigibility. We summarise past work and point to open questions. 

By the Causal Incentives Working Group

Introduction to Towards Causal Foundations of Safe AGI
Tom Everitt, Lewis Hammond, Francis Rhys Ward, Ryan Carey, James Fox, Matt MacDermott, Sebastian Benthall
Causality: A Brief Introduction
Tom Everitt, Lewis Hammond, Jonathan Richens, Francis Rhys Ward, Ryan Carey, Sebastian Benthall, James Fox
Agency from a causal perspective
Tom Everitt, Matt MacDermott, James Fox, Francis Rhys Ward, Jonathan Richens
Incentives from a causal perspective
Tom Everitt, James Fox, Ryan Carey, Matt MacDermott, Sebastian Benthall, Jonathan Richens
Reward Hacking from a Causal Perspective
Tom Everitt, Francis Rhys Ward, Sebastian Benthall, James Fox, Matt MacDermott, Ryan Carey