AI ALIGNMENT FORUM
A Moderate Update to your Artificial Priors
| Score | Title | Author(s) | Posted | Comments |
|---|---|---|---|---|
| 95 | ARC's first technical report: Eliciting Latent Knowledge | Paul Christiano, Mark Xu, Ajeya Cotra | 3y | 72 |
| 70 | Fun with +12 OOMs of Compute | Daniel Kokotajlo | 4y | 45 |
| 101 | What 2026 looks like | Daniel Kokotajlo | 3y | 29 |
| 83 | Ngo and Yudkowsky on alignment difficulty | Eliezer Yudkowsky, Richard Ngo | 3y | 53 |
| 75 | Another (outer) alignment failure story | Paul Christiano | 3y | 25 |
| 92 | What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs) | Andrew Critch | 3y | 49 |
| 74 | The Plan | johnswentworth | 3y | 19 |
| 54 | Finite Factored Sets | Scott Garrabrant | 3y | 70 |
| 49 | Selection Theorems: A Program For Understanding Agents | johnswentworth | 3y | 24 |
| 72 | My research methodology | Paul Christiano | 3y | 35 |
| 61 | larger language models may disappoint you [or, an eternally unfinished draft] | nostalgebraist | 3y | 7 |
| 57 | Comments on Carlsmith's “Is power-seeking AI an existential risk?” | Nate Soares | 3y | 11 |
| 64 | EfficientZero: How It Works | 1a3orn | 3y | 2 |
| 28 | Specializing in Problems We Don't Understand | johnswentworth | 3y | 0 |