Transformer Dynamics: a neuro-inspired approach to MechInterp
How do AI models work? In many ways, we know the answer to this question, because we engineered those models in the first place. But in other, fundamental, ways, we have no idea. Systems with many parts that interact with each other nonlinearly are hard to understand. By “understand” we...
Feb 22, 2025