The Library

Curated Sequences

AGI safety from first principles
Embedded Agency
2022 MIRI Alignment Discussion
2021 MIRI Conversations
Infra-Bayesianism
Iterated Amplification
Value Learning
Risks from Learned Optimization
Cartesian Frames

Community Sequences

Towards Causal Foundations of Safe AGI
CAIS Philosophy Fellowship Midpoint Deliverables
Cyborgism
Interpreting a Maze-Solving Network
From Atoms To Agents
Interpreting Othello-GPT
Leveling Up: advice & resources for junior alignment researchers
The Engineer’s Interpretability Sequence
Conditioning Predictive Models
Simulator seminar sequence
Alignment Stream of Thought
Some comments on the CAIS paradigm