Decomposing the QK circuit with Bilinear Sparse Dictionary Learning
This work was produced as part of Lee Sharkey's stream in the ML Alignment & Theory Scholars Program - Winter 2023-24 Cohort.

Intro and Motivation

Sparse dictionary learning (SDL) has attracted a lot of attention recently as a method for interpreting transformer activations. They demonstrate that model activations can often...
Jul 2, 2024