AI ALIGNMENT FORUM

GradientDissenter — AI Alignment Forum

Top posts

1323

Ω

86

11

47

4y

CoT May Be Highly Informative Despite “Unfaithfulness” [METR]

This is a link-post for METR's CoT May Be Highly Informative Despite “Unfaithfulness”. I recommend viewing the post on METR's website, since it contains interactive widgets. Recent work [1, 2, 3, 4, 5] demonstrates that LLMs’ chains of thought (CoTs)[1] aren’t always “faithful”: they don’t contain an accurate representation of...

Aug 11, 2025 • 64

METR's Evaluation of GPT-5

METR (where I work, though I'm cross-posting in a personal capacity) evaluated GPT-5 before it was externally deployed. We performed a much more comprehensive safety analysis than we ever have before; it feels like pre-deployment evals are getting more mature. This is the first time METR has produced something we've...

Aug 7, 2025 • 148