AI ALIGNMENT FORUM

Archive Recommendations

AGI Ruin: A List of Lethalities (147 karma)
Eliezer Yudkowsky · 3y ago · 144 comments
Where I agree and disagree with Eliezer (217 karma)
Paul Christiano · 3y ago · 59 comments
SolidGoldMagikarp (plus, prompt generation) (138 karma)
Jessica Rumbelow, mwatkins · 2y ago · 17 comments
AI 2027: What Superintelligence Looks Like (189 karma)
Daniel Kokotajlo, Thomas Larsen, elifland, Scott Alexander, Jonas V, Romeo Dean · 1mo ago · 2 comments
Simulators (137 karma)
janus · 3y ago · 90 comments
The Waluigi Effect (mega-post) (58 karma)
Cleo Nardo · 2y ago · 26 comments
What 2026 looks like (117 karma)
Daniel Kokotajlo · 4y ago · 31 comments
Let’s think about slowing down AI (152 karma)
KatjaGrace · 2y ago · 3 comments
Alignment Faking in Large Language Models (192 karma)
Ryan Greenblatt, Evan Hubinger, Carson Denison, Benjamin Wright, Fabien Roger, Monte MacDiarmid, Sam Marks, Johannes Treutlein, Sam Bowman, Buck Shlegeris · 4mo ago · 24 comments
Steering GPT-2-XL by adding an activation vector (121 karma)
Alex Turner, Monte MacDiarmid, David Udell, lisathiergart, Ulisse Mini · 2y ago · 63 comments