Post 5 of Towards Causal Foundations of Safe AGI, preceded by Post 1: Introduction, Post 2: Causality, Post 3: Agency, and Post 4: Incentives. By Francis Rhys Ward, Tom Everitt, Sebastian Benthall, James Fox, Matt MacDermott, Milad Kazemi, Ryan Carey representing the Causal Incentives Working Group. Thanks also to Toby...
Post 4 of Towards Causal Foundations of Safe AGI, preceded by Post 1: Introduction, Post 2: Causality, and Post 3: Agency. By Tom Everitt, James Fox, Ryan Carey, Matt MacDermott, Sebastian Benthall, and Jon Richens, representing the Causal Incentives Working Group. Thanks also to Toby Shevlane and Aliya Ahmad. “Show...
Post 2 of Towards Causal Foundations of Safe AGI, see also Post 1: Introduction. By Lewis Hammond, Tom Everitt, Jon Richens, Francis Rhys Ward, Ryan Carey, Sebastian Benthall, and James Fox, representing the Causal Incentives Working Group. Thanks also to Alexis Bellot, Toby Shevlane, and Aliya Ahmad. Causal models are...
By Tom Everitt, Lewis Hammond, Rhys Ward, Ryan Carey, James Fox, Sebastian Benthall, Matt MacDermott and Shreshth Malik representing the Causal Incentives Working Group. Thanks also to Toby Shevlane, MH Tessler, Aliya Ahmad, Zac Kenton, Maria Loks-Thompson, and Alexis Bellot. Over the next few years, society, organisations, and individuals will...
Abstract: Which variables does an agent have an incentive to control with its decision, and which variables does it have an incentive to respond to? We formalise these incentives, and demonstrate unique graphical criteria for detecting them in any single decision causal influence diagram. To this end, we introduce structural...
In 2015, Jessica introduced quantilization as a countermeasure for Goodhart's Law and specification-gaming. Since these are such central problems in AI safety, I consider quantilization to be one of the best innovations in AI safety so far, but it has received little attention from the AI safety field. I think...