AI ALIGNMENT FORUM

Archive Recommendations

147 AGI Ruin: A List of Lethalities
Eliezer Yudkowsky
3y
144
217 Where I agree and disagree with Eliezer
paulfchristiano
3y
59
138 SolidGoldMagikarp (plus, prompt generation)
Jessica Rumbelow, mwatkins
3y
17
141 Simulators
janus
3y
90
188 AI 2027: What Superintelligence Looks Like
Daniel Kokotajlo, Thomas Larsen, elifland, Scott Alexander, Jonas V, romeo
5mo
2
58 The Waluigi Effect (mega-post)
Cleo Nardo
3y
26
120 What 2026 looks like
Daniel Kokotajlo
4y
33
152 Let’s think about slowing down AI
KatjaGrace
3y
3
192 Alignment Faking in Large Language Models
ryan_greenblatt, evhub, Carson Denison, Benjamin Wright, Fabien Roger, Monte M, Sam Marks, Johannes Treutlein, Sam Bowman, Buck
8mo
24
121 Steering GPT-2-XL by adding an activation vector
TurnTrout, Monte M, David Udell, lisathiergart, Ulisse Mini
2y
63