The Library

Curated Sequences

AGI safety from first principles
Embedded Agency
2022 MIRI Alignment Discussion
2021 MIRI Conversations
Conditioning Predictive Models
The Engineer’s Interpretability Sequence
Iterated Amplification
Value Learning
Risks from Learned Optimization
Cartesian Frames

Community Sequences

AI Control
Formalising Catastrophic Goodhart
The Ethicophysics
Game Theory without Argmax
The Value Change Problem (sequence)
Monthly Algorithmic Problems in Mech Interp
An Opinionated Guide to Computability and Complexity
Developmental Interpretability
Catastrophic Risks From AI
Distilling Singular Learning Theory
Towards Causal Foundations of Safe AGI
CAIS Philosophy Fellowship Midpoint Deliverables