Post 4 of Towards Causal Foundations of Safe AGI, preceded by Post 1: Introduction, Post 2: Causality, and Post 3: Agency.
By Tom Everitt, James Fox, Ryan Carey, Matt MacDermott, Sebastian Benthall, and Jon Richens, representing the Causal Incentives Working Group. Thanks also to Toby Shevlane and Aliya Ahmad.
“Show me the incentive, and I’ll show you the outcome” – Charlie Munger
Predicting behaviour is an important question when designing and deploying agentic AI systems. Incentives capture some key forces that shape agent behaviour,[1] which don’t require us to fully understand the internal workings of a system.
This post shows how a causal model of an agent and its environment can reveal what the agent wants to know and what it wants to control, as well as how it will respond to commands and influence its...
Post 3 of Towards Causal Foundations of Safe AGI, preceded by Post 1: Introduction and Post 2: Causality.
By Matt MacDermott, James Fox, Rhys Ward, Jonathan Richens, and Tom Everitt representing the Causal Incentives Working Group. Thanks also to Ryan Carey, Toby Shevlane, and Aliya Ahmad.
The purpose of this post is twofold: to lay the foundation for subsequent posts by exploring what agency means from a causal perspective, and to sketch a research program for a deeper understanding of agency.
Agency is a complex concept that has been studied from multiple perspectives, including social science, philosophy, and AI research. Broadly it refers to a system able to act autonomously. For the purposes of this blog post, we interpret agency as goal-directedness, i.e. acting as if trying to direct the world in some particular...
Post 2 of Towards Causal Foundations of Safe AGI, see also Post 1 Introduction.
By Lewis Hammond, Tom Everitt, Jon Richens, Francis Rhys Ward, Ryan Carey, Sebastian Benthall, and James Fox, representing the Causal Incentives Working Group. Thanks also to Alexis Bellot, Toby Shevlane, and Aliya Ahmad.
Causal models are the foundations of our work. In this post, we provide a succinct but accessible explanation of causal models that can handle interventions, counterfactuals, and agents, which will be the building blocks of future posts in the sequence. Basic familiarity with (conditional) probabilities will be assumed.
What does it mean for the rain to cause the grass to become green? Causality is a philosophically intriguing topic that underlies many other concepts of human importance. In particular, many concepts relevant to safe AGI, like...