[NOTE: This paper was previously titled 'The Shutdown Problem: Three Theorems'.]
This paper is an updated version of the first half of my AI Alignment Awards contest entry. My theorems build on results by Soares, Fallenstein, Yudkowsky, and Armstrong in various ways, and they can guide our search for solutions to the shutdown problem.
One aim of the paper is to get academic philosophers and decision theorists interested in the shutdown problem and related topics in AI alignment. They’re my assumed audience. I’m posting here because I think the theorems will also be interesting to people already familiar with the shutdown problem.
For discussion and feedback, I thank Adam Bales, Ryan Carey, Bill D’Alessandro, Tomi…