It's hard to make scheming evals look realistic for LLMs — AI Alignment Forum