AI Alignment Forum
AI Alignment Posts
15
My most common research advice: do quick sanity checks
LawrenceC
1d
2
10
Predicting When RL Training Breaks Chain-of-Thought Monitorability
David Lindner
2d
0
40
(Some) Natural Emergent Misalignment from Reward Hacking in Non-Production RL
7vik, Sid Black, Joseph Bloom
4d
0
21
ControlAI 2025 Impact Report: our progress toward an international ban on ASI
Andrea_Miotti, Alex Amadori
7d
0
20
Test your best methods on our hard CoT interp tasks
daria, Riya Tyagi, Josh Engels, Neel Nanda
8d
0
25
A Toy Environment For Exploring Reasoning About Reward
jenny, Bronson Schoen
9d
1
35
Metagaming matters for training, evaluation, and oversight
jenny, Bronson Schoen
16d
0
22
“Act-based approval-directed agents”, for IDA skeptics
Steven Byrnes
16d
6
6
New RFP on Interpretability from Schmidt Sciences
Peter Hase
17d
0
5
Power Steering: Behavior Steering via Layer-to-Layer Jacobian Singular Vectors
Omar Ayyub
21d
0