AI ALIGNMENT FORUM

Towards Causal Foundations of Safe AGI

Jun 09, 2023 by tom4everitt

This sequence gives our take on how causality underpins many critical aspects of safe AGI, including agency, incentives, misspecification, generalisation, fairness, and corrigibility. We summarise past work and point to open questions.

By the Causal Incentives Working Group

Introduction to Towards Causal Foundations of Safe AGI
tom4everitt, Lewis Hammond, Francis Rhys Ward, RyanCarey, James Fox, mattmacdermott, sbenthall
31 karma, posted 2y ago, 0 comments
Causality: A Brief Introduction
tom4everitt, Lewis Hammond, Jonathan Richens, Francis Rhys Ward, RyanCarey, sbenthall, James Fox
17 karma, posted 2y ago, 5 comments
Agency from a causal perspective
tom4everitt, mattmacdermott, James Fox, Francis Rhys Ward, Jonathan Richens
13 karma, posted 2y ago, 0 comments
Incentives from a causal perspective
tom4everitt, James Fox, RyanCarey, mattmacdermott, sbenthall, Jonathan Richens
12 karma, posted 2y ago, 0 comments
Reward Hacking from a Causal Perspective
tom4everitt, Francis Rhys Ward, sbenthall, James Fox, mattmacdermott, RyanCarey
15 karma, posted 2y ago, 2 comments