Carson Denison

I work on deceptive alignment and reward hacking at Anthropic.

Comments

Thank you for catching this. 

These links pointed to section titles in our draft Google Doc for this post. I have replaced them with references to the corresponding sections in this post.