AI ALIGNMENT FORUM

Vincent Cheng

https://vvvincent.me/

Claude, GPT, and Gemini All Struggle to Evade Monitors

This work was done while at METR.

Introduction

GDM recently released a paper (Emmons et al.) showing that, contrary to previous results, the chain-of-thought (CoT) of language models is more faithful when the model's CoT is necessary for it to complete a task. They examine three settings where an actor...

Aug 6, 2025•61