AdamYedidia — AI Alignment Forum

New Tool: the Residual Stream Viewer

This is a link-post for the residual stream viewer, which can be found here. It's an online tool meant to make interpretability research easier by letting you look at directions within the residual stream. It's still in a fairly early, unpolished state, so there may...

Oct 1, 2023 · 32

The positional embedding matrix and previous-token heads: how do they actually work?

tl;dr: This post starts with a mystery about positional embeddings in GPT2-small, and from there explains how they relate to previous-token heads, i.e. attention heads whose role is to attend to the previous token. I tried to make the post relatively accessible even if you're not already very familiar with...

Aug 10, 2023 · 28

SmartyHeaderCode: anomalous tokens for GPT3.5 and GPT-4

TL;DR: There are anomalous tokens for GPT3.5 and GPT4 which are difficult or impossible for the model to repeat; try playing around with SmartyHeaderCode, APolynomial, or davidjl. There are also plenty which can be repeated but are difficult for the model to spell out, like edTextBox or legalArgumentException. A couple...

Apr 15, 2023 · 71

Adam Yedidia
