Top posts
Dan Valentine
We've just completed a bunch of empirical work on LLM debate, and we're excited to share the results. If the title of this post is at all interesting to you, we recommend heading straight to the paper. There are a lot of interesting results that are hard to summarize, and...
Overview
* Solving the problem of mesa-optimization would probably be easier if we understood how models do search internally.
* We are training GPT-type models on the toy task of solving mazes and studying them in both a mechanistic interpretability and behavioral context.
* This post lays out our model...