
AI ALIGNMENT FORUM

dx26 — AI Alignment Forum

Dylan Xu

Measuring Coherence of Policies in Toy Environments

This post was produced as part of the Astra Fellowship under the Winter 2024 Cohort, mentored by Richard Ngo. Thanks to Martín Soto, Jeremy Gillen, Daniel Kokotajlo, and Lukas Berglund for feedback.

Summary

Discussions around the likelihood and threat models of AI existential risk (x-risk) often hinge on some informal...

Mar 18, 2024