On Meta-Level Adversarial Evaluations of (White-Box) Alignment Auditing — AI Alignment Forum