x
The safe-to-dangerous shift is a fundamental problem for eval realism; but also for measuring awareness — AI Alignment Forum