This is the latest work in our Parameter Decomposition agenda. We introduce a new parameter decomposition method, adVersarial Parameter Decomposition (VPD),[1] and use it to decompose the parameters of a small[2] language model. VPD greatly improves on our previous techniques, Stochastic Parameter Decomposition (SPD) and Attribution-based Parameter Decomposition (APD). We think...
TL;DR: We experimentally test the mathematical framework for circuits in superposition by hand-coding the weights of an MLP to implement many conditional[1] rotations in superposition on two-dimensional input features. The code can be found here. This work was supported by Coefficient Giving and Goodfire AI. 1 Introduction. Figure 1: The...
Summary & Motivation: This post continues and clarifies "Circuits in Superposition: Compressing many small neural networks into one." That post sketched a general mathematical framework for compressing different circuits into a network in superposition. On closer inspection, some of it turned out to be...
We are pleased to announce that the 10th edition of AI Safety Camp is now entering the team member application phase! AI Safety Camp is a 3-month online research program, running from January to April 2025, in which participants form teams to work on pre-selected projects. We have a wide...
Do you have AI Safety research ideas that you would like to work on with others? Is there a project you want to do, and do you want help finding a team? AI Safety Camp could be the solution for you! Summary: AI Safety Camp Virtual is a 3-month online...
The 9th AI Safety Camp (AISC9) just ended, and as usual, it was a success! Follow this link to find project summaries, links to their outputs, recordings of the end-of-camp presentations, and contact info for all our teams in case you want to engage further. AISC9 both had...
I don't expect this post to contain anything novel. But from talking to others, it seems that some of what I have to say here is not widely known, so it seemed worth writing. In this post, I define superposition as: a representation with more features than neurons,...