Misalignment classifiers: Why they’re hard to evaluate adversarially, and why we’re studying them anyway — AI Alignment Forum