TL;DR: We experimentally test the mathematical framework for circuits in superposition by hand-coding the weights of an MLP to implement many conditional[1] rotations in superposition on two-dimensional input features. The code can be found here. This work was supported by Coefficient Giving and Goodfire AI. 1 Introduction Figure 1: The...
Summary & Motivation This post is a continuation and clarification of Circuits in Superposition: Compressing many small neural networks into one. That post presented a sketch of a general mathematical framework for compressing different circuits into a network in superposition. On closer inspection, some of it turned out to be...
We are pleased to announce that the 10th version of the AI Safety Camp is now entering the team member application phase! AI Safety Camp is a 3-month-long online research program from January to April 2025, where participants form teams to work on pre-selected projects. We have a wide...
Do you have AI Safety research ideas that you would like to work on with others? Is there a project you want to do and you want help finding a team? AI Safety Camp could be the solution for you! Summary AI Safety Camp Virtual is a 3-month-long online...
The 9th AI Safety Camp (AISC9) just ended, and as usual, it was a success! Follow this link to find project summaries, links to their outputs, recordings of the end-of-camp presentations, and contact info for all our teams in case you want to engage more. AISC9 both had...
I don't expect this post to contain anything novel. But from talking to others it seems like some of what I have to say in this post is not widely known, so it seemed worth writing. In this post I'm defining superposition as: A representation with more features than neurons,...
AI Safety Camp connects you with a research lead to collaborate on a project – to see where your work could help ensure future AI is safe. Apply before December 1, to collaborate online from January to April 2024. We value diverse backgrounds. Many roles but definitely not all require...