Curated Sequences

Embedded Agency
AGI safety from first principles
Iterated Amplification
Value Learning
Risks from Learned Optimization
Cartesian Frames

Community Sequences

Thoughts on Corrigibility
Late 2021 MIRI Conversations
Epistemic Cookbook for Alignment
AI Safety Subprojects
Modeling Transformative AI Risk (MTAIR)
Practical Guide to Anthropics
The Causes of Power-seeking and Instrumental Convergence
Finite Factored Sets
Anthropic Decision Theory
Reviews for the Alignment Forum
Predictions & Self-awareness
Pointing at Normativity