AI ALIGNMENT FORUM

Bart Bussmann

Comments

Matryoshka Sparse Autoencoders
Bart Bussmann · 9mo · 202

Great work! I have been working on something very similar and will publish my results here sometime next week, but I can already give a sneak peek:

> The SAEs here were only trained for 100M tokens (1/3 of the TinyStories dataset). The language model was trained for 3 epochs on the 300M-token TinyStories dataset. It would be good to validate these results with more 'real' language models and train SAEs with much more data.

I can confirm that on Gemma-2-2B, Matryoshka SAEs dramatically improve the absorption score on the first-letter task from Chanin et al., as implemented in SAEBench!

> Is there a nice way to extend the Matryoshka method to top-k SAEs?

Yes! My experiments with Matryoshka SAEs are using BatchTopK.
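Roughly, the combination looks like the sketch below. This is illustrative pseudocode rather than my actual training code; the class name, prefix sizes, initialization, and hyperparameters are placeholders.

```python
import torch
import torch.nn as nn


class MatryoshkaBatchTopKSAE(nn.Module):
    """Sketch of a Matryoshka SAE with a BatchTopK activation (illustrative)."""

    def __init__(self, d_model: int, d_dict: int, k: int, prefix_sizes: list[int]):
        super().__init__()
        self.W_enc = nn.Parameter(torch.randn(d_model, d_dict) * 0.01)
        self.b_enc = nn.Parameter(torch.zeros(d_dict))
        self.W_dec = nn.Parameter(torch.randn(d_dict, d_model) * 0.01)
        self.b_dec = nn.Parameter(torch.zeros(d_model))
        self.k = k                        # average active latents per example
        self.prefix_sizes = prefix_sizes  # nested sub-dictionary sizes, e.g. [512, 2048, d_dict]

    def batch_topk(self, pre_acts: torch.Tensor) -> torch.Tensor:
        # BatchTopK: keep the k * batch_size largest pre-activations across the
        # whole batch, so k is an average per-example budget rather than a hard cap.
        n_keep = self.k * pre_acts.shape[0]
        threshold = pre_acts.flatten().topk(n_keep).values.min()
        return pre_acts * (pre_acts >= threshold)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        pre_acts = torch.relu((x - self.b_dec) @ self.W_enc + self.b_enc)
        acts = self.batch_topk(pre_acts)

        # Matryoshka loss: reconstruct x from each nested prefix of the dictionary
        # and sum the errors, so the earliest latents are pushed towards general features.
        loss = x.new_zeros(())
        for m in self.prefix_sizes:
            recon = acts[:, :m] @ self.W_dec[:m] + self.b_dec
            loss = loss + (recon - x).pow(2).mean()
        return loss
```

The point of the sketch is just that the BatchTopK mask is computed once over the full dictionary, and the Matryoshka loss then reuses nested prefixes of those same activations.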

Are you planning to continue this line of research? If so, I would be interested to collaborate (or otherwise at least coordinate on not doing duplicate work).
 

Posts

Learning Multi-Level Features with Matryoshka SAEs (20 karma, 8mo, 1 comment)
Showing SAE Latents Are Not Atomic Using Meta-SAEs (35 karma, 1y, 6 comments)
Calendar feature geometry in GPT-2 layer 8 residual stream SAEs (30 karma, 1y, 0 comments)
BatchTopK: A Simple Improvement for TopK-SAEs (32 karma, 1y, 0 comments)
Stitching SAEs of different sizes (18 karma, 1y, 2 comments)
Bart Bussmann's Shortform (0 karma, 1y, 0 comments)