x
Agentic Interpretability: A Strategy Against Gradual Disempowerment — AI Alignment Forum