The Defender’s Advantage of Interpretability — AI Alignment Forum