AI ALIGNMENT FORUM

Pablo Bernabeu-Pérez — AI Alignment Forum

70

2y

Unfaithful Reasoning Can Fool Chain-of-Thought Monitoring

by Benjamin Arnav, Pablo Bernabeu-Pérez, Tim Kostolansky, HanneWhitt, Nathan Helm-Burger, and Mary Phuong

This research was completed for LASR Labs 2025 by Benjamin Arnav, Pablo Bernabeu-Pérez, Nathan Helm-Burger, Tim Kostolansky and Hannes Whittingham. The team was supervised by Mary Phuong. Find out more about the program and express interest in upcoming iterations here. Read the full paper: "CoT Red-Handed: Stress Testing Chain-of-Thought Monitoring."...

Jun 2, 2025 • 78