We tried to figure out how a model's beliefs change during a chain-of-thought (CoT) when it solves a logical problem. Measuring this could reveal which parts of the CoT actually causally influence the final answer and which are just fake reasoning manufactured to sound plausible. (Note that preventing such fake reasoning is only one side of CoT faithfulness; the other is preventing genuine reasoning that stays hidden from the CoT.)
We estimate these beliefs by truncating the CoT early and asking the model for an answer. Naively, one might expect the probability of a correct answer to increase smoothly over the whole CoT. However, it turns out that even for a straightforward, short chain of thought, P[correct_answer] fluctuates a lot with the number of CoT tokens...
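To make the measurement concrete, here is a minimal sketch of the truncate-and-ask procedure. It assumes a hypothetical `sample_answer(prompt)` helper that wraps whatever model API you use and returns one sampled final-answer string; the `<think>` tags are likewise an assumption about the prompt template, not the exact setup we used.

```python
from typing import Callable, List, Tuple

def estimate_belief(sample_answer: Callable[[str], str], question: str,
                    cot_tokens: List[str], prefix_len: int,
                    correct_answer: str, n_samples: int = 20) -> float:
    """Estimate P[correct_answer] after `prefix_len` tokens of CoT.

    We cut the CoT short, close the thinking block, and append an
    answer cue so the model must answer immediately instead of
    continuing to reason.
    """
    prefix = "".join(cot_tokens[:prefix_len])
    prompt = f"{question}\n<think>{prefix}</think>\nFinal answer:"
    hits = sum(sample_answer(prompt).strip() == correct_answer
               for _ in range(n_samples))
    return hits / n_samples

def belief_trajectory(sample_answer: Callable[[str], str], question: str,
                      cot_tokens: List[str], correct_answer: str,
                      step: int = 10) -> List[Tuple[int, float]]:
    """P[correct_answer] at every `step` tokens along the CoT."""
    return [(k, estimate_belief(sample_answer, question, cot_tokens,
                                k, correct_answer))
            for k in range(0, len(cot_tokens) + 1, step)]
```

Plotting the resulting `(prefix_len, probability)` pairs gives the belief curve discussed above; the fluctuations show up as non-monotonic jumps in that curve.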