Books of LessWrong: A Moderate Update to your Artificial Priors
Karma · Title · Author(s) · Age · Comments
95 · ARC's first technical report: Eliciting Latent Knowledge · paulfchristiano, Mark Xu, Ajeya Cotra · 4y · 72
71 · Fun with +12 OOMs of Compute · Daniel Kokotajlo · 5y · 45
120 · What 2026 looks like · Daniel Kokotajlo · 4y · 33
87 · Ngo and Yudkowsky on alignment difficulty · Eliezer Yudkowsky, Richard_Ngo · 4y · 53
74 · Another (outer) alignment failure story · paulfchristiano · 4y · 25
93 · What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs) · Andrew_Critch · 5y · 49
75 · The Plan · johnswentworth · 4y · 19
54 · Finite Factored Sets · Scott Garrabrant · 4y · 70
50 · Selection Theorems: A Program For Understanding Agents · johnswentworth · 4y · 24
72 · My research methodology · paulfchristiano · 5y · 35
61 · larger language models may disappoint you [or, an eternally unfinished draft] · nostalgebraist · 4y · 7
56 · Comments on Carlsmith's “Is power-seeking AI an existential risk?” · So8res · 4y · 11
64 · EfficientZero: How It Works · 1a3orn · 4y · 2
31 · Specializing in Problems We Don't Understand · johnswentworth · 5y · 0