Stefan Heimersheim

Stefan Heimersheim. SERIMATS scholar researching Mechanistic Interpretability of Transformers. Trying to figure out AI Alignment. Final-year PhD student in Astronomy, University of Cambridge. Website: StefanHex.com

Wiki Contributions

Comments

Thank for for the extensive comment! Your summary is really helpful to see how this came across, here's my take on a couple of these points:

2.b: The network would be sneaking information about the size of the residual stream past LayerNorm. So the network wants to implement an sort of "grow by a factor X every layer" and wants to prevent LayerNorm from resetting its progress.

  1. There's the difference between (i) How does the model make the residual stream grow exponentially -- the answer is probably theory 1, that something in the weights grow exponentially. And there is (ii) our best guess on Why the model would ever want this, which is the information deletion thing.

How and why disconnected

Yep we give some evidence for How, but for Why we have only a guess.

still don't feel like I know why though

earn generic "amplification" functions

Yes, all we have is some intuition here. It seems plausible that the model needs to communicate stuff between some layers, but doesn't want this to take up space in the residual stream. So this exponential growth is a neat way to make old information decay away (relatively). And it seems plausible to implement a few amplification circuits for information that has to be preserved for much later in the network.

We would love to see more ideas & hypotheses on why the model might be doing this, as well as attempts to test this! We mainly wrote-up this post because both Alex and I independently noticed this and weren't aware of this previously, so we wanted to make a reference post.