I am currently a MATS 8.0 scholar studying mechanistic interpretability with Neel Nanda. I’m also a postdoc in psychology/neuroscience. Perhaps my most notable paper analyzed the last 20 years of psychology research, searching for trends in which papers do and do not replicate. I have some takes on statistics. tl;dr...
This piece is based on work conducted during MATS 8.0 and is part of a broader aim of interpreting chain-of-thought in reasoning models. tl;dr

* Research on chain-of-thought (CoT) unfaithfulness shows how models’ CoTs may omit information that is relevant to their final decision.
* Here, we sketch hypotheses for...
This post is adapted from our recent arXiv paper. Paul Bogdan and Uzay Macar are co-first authors on this work. TL;DR

* Interpretability of chains-of-thought (CoTs) produced by LLMs is challenging:
  * Standard mechanistic interpretability studies a single token's generation, but CoTs are sequences of reasoning steps that use thousands...