aryopg — AI Alignment Forum

Aryo Pradipta Gema

Inverse Scaling in Test-Time Compute

by Joe Benton, Ethan Perez, and aryopg

We construct evaluation tasks where extending the reasoning length of Large Reasoning Models (LRMs) deteriorates performance, exhibiting an inverse scaling relationship between test-time compute and accuracy. We identify five distinct failure modes when models reason for longer:

* Claude models become increasingly distracted by irrelevant information
* OpenAI o-series models...

Jul 22, 2025