Try training token-level probes — AI Alignment Forum