AI ALIGNMENT FORUM
Archive Recommendations
147
AGI Ruin: A List of Lethalities
Eliezer Yudkowsky
3y
144
217
Where I agree and disagree with Eliezer
paulfchristiano
3y
59
138
SolidGoldMagikarp (plus, prompt generation)
Jessica Rumbelow, mwatkins
3y
17
141
Simulators
janus
3y
90
188
AI 2027: What Superintelligence Looks Like
Daniel Kokotajlo, Thomas Larsen, elifland, Scott Alexander, Jonas V, romeo
5mo
2
58
The Waluigi Effect (mega-post)
Cleo Nardo
3y
26
120
What 2026 looks like
Daniel Kokotajlo
4y
33
152
Let’s think about slowing down AI
KatjaGrace
3y
3
192
Alignment Faking in Large Language Models
ryan_greenblatt, evhub, Carson Denison, Benjamin Wright, Fabien Roger, Monte M, Sam Marks, Johannes Treutlein, Sam Bowman, Buck
8mo
24
121
Steering GPT-2-XL by adding an activation vector
TurnTrout, Monte M, David Udell, lisathiergart, Ulisse Mini
2y
63