AI ALIGNMENT FORUM
Archive Recommendations
146
AGI Ruin: A List of Lethalities
Eliezer Yudkowsky
3y
144
218
Where I agree and disagree with Eliezer
Paul Christiano
3y
59
142
SolidGoldMagikarp (plus, prompt generation)
Jessica Rumbelow, mwatkins
2y
17
63
The Waluigi Effect (mega-post)
Cleo Nardo
2y
26
130
Simulators
janus
2y
90
154
Let’s think about slowing down AI
KatjaGrace
2y
3
110
What 2026 looks like
Daniel Kokotajlo
3y
29
191
Alignment Faking in Large Language Models
Ryan Greenblatt, Evan Hubinger, Carson Denison, Benjamin Wright, Fabien Roger, Monte MacDiarmid, Sam Marks, Johannes Treutlein, Sam Bowman, Buck Shlegeris
17h
20
121
Steering GPT-2-XL by adding an activation vector
Alex Turner, Monte MacDiarmid, David Udell, lisathiergart, Ulisse Mini
2y
63
97
chinchilla's wild implications
nostalgebraist
2y
13