AI ALIGNMENT FORUM

The Library

Curated Sequences

AGI safety from first principles

Embedded Agency

by Abram Demski

2022 MIRI Alignment Discussion

by Rob Bensinger

2021 MIRI Conversations

by Rob Bensinger

Infra-Bayesianism

Conditioning Predictive Models

by Evan Hubinger

Cyborgism

The Engineer’s Interpretability Sequence

by Stephen Casper

Iterated Amplification

by Paul Christiano

Value Learning

Risks from Learned Optimization

by Evan Hubinger

Cartesian Frames

by Scott Garrabrant

Community Sequences

AI Control

by Fabien Roger

Formalising Catastrophic Goodhart

by Vojtech Kovarik

The Ethicophysics

Game Theory without Argmax

The Value Change Problem (sequence)

Monthly Algorithmic Problems in Mech Interp

by CallumMcDougall

An Opinionated Guide to Computability and Complexity

Developmental Interpretability

by Jesse Hoogland

Catastrophic Risks From AI

Distilling Singular Learning Theory

by Liam Carroll

Towards Causal Foundations of Safe AGI

CAIS Philosophy Fellowship Midpoint Deliverables