The Library

Curated Sequences

AGI safety from first principles
Embedded Agency
2022 MIRI Alignment Discussion
2021 MIRI Conversations
Infra-Bayesianism
Conditioning Predictive Models
Cyborgism
The Engineer’s Interpretability Sequence
Iterated Amplification
Value Learning
Risks from Learned Optimization
Cartesian Frames

Community Sequences

Formalising Catastrophic Goodhart
The Ethicophysics
Game Theory without Argmax
The Value Change Problem (sequence)
Monthly Algorithmic Problems in Mech Interp
An Opinionated Guide to Computability and Complexity
Developmental Interpretability
Catastrophic Risks From AI
Distilling Singular Learning Theory
Towards Causal Foundations of Safe AGI
CAIS Philosophy Fellowship Midpoint Deliverables
Interpreting a Maze-Solving Network