Top posts
GradientDissenter
This is a link-post for METR's CoT May Be Highly Informative Despite “Unfaithfulness”. I recommend viewing the post on METR's website, since it contains interactive widgets. Recent work [1, 2, 3, 4, 5] demonstrates that LLMs’ chains of thought (CoTs)[1] aren’t always “faithful”: they don’t contain an accurate representation of...
METR (where I work, though I'm cross-posting in a personal capacity) evaluated GPT-5 before it was externally deployed. We performed a much more comprehensive safety analysis than we ever have before; it feels like pre-deployment evals are getting more mature. This is the first time METR has produced something we've...