AI ALIGNMENT FORUM

jacob_drori

Comments

Improving Dictionary Learning with Gated Sparse Autoencoders
jacob_drori · 1y · −10

Nice work! I'm not sure I fully understand what the "gated-ness" is adding, i.e. what role the Heaviside step function is playing. What would happen if we did away with it? Namely, consider this setup:

Let $f$ and $\hat{x}$ be the encoder and decoder functions, as in your paper, and let $x$ be the model activation that is fed into the SAE.

The usual SAE reconstruction is $\hat{x}(f(x))$, which suffers from the shrinkage problem.

Now, introduce a new learned parameter $t \in \mathbb{R}^{n_{\text{features}}}$, and define an "expanded" reconstruction $y_{\text{expanded}} = \hat{x}(t \odot f(x))$, where $\odot$ denotes elementwise multiplication.

Finally, take the loss to be:

$\mathcal{L} = \|\hat{x}_{\text{copy}}(f(x)) - x\|_2^2 + \|y_{\text{expanded}} - x\|_2^2 + \lambda \|f(x)\|_1\,,$

where $\hat{x}_{\text{copy}}$ ensures the decoder gets no gradients from the first term. As I understand it, this is exactly the loss appearing in your paper. The only difference in the setup is the lack of the Heaviside step function.
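To make the setup concrete, here is a minimal PyTorch sketch of the loss described above. The module name, the ReLU encoder nonlinearity, and the initialization are my own assumptions for illustration, not details taken from the paper; `detach()` stands in for the $\hat{x}_{\text{copy}}$ stop-gradient.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RescaledSAE(nn.Module):
    """Hypothetical SAE illustrating the proposed loss (all names are mine)."""

    def __init__(self, d_model: int, n_features: int):
        super().__init__()
        self.enc = nn.Linear(d_model, n_features)      # encoder f
        self.dec = nn.Linear(n_features, d_model)      # decoder x_hat
        self.t = nn.Parameter(torch.ones(n_features))  # learned rescaling t

    def loss(self, x: torch.Tensor, lam: float) -> torch.Tensor:
        f = F.relu(self.enc(x))            # f(x); ReLU is an assumption here
        y_expanded = self.dec(self.t * f)  # x_hat(t ⊙ f(x))

        # x_hat_copy: the same decoder, applied with detached weights so the
        # first reconstruction term sends no gradients to the decoder.
        recon_copy = F.linear(f, self.dec.weight.detach(),
                              self.dec.bias.detach())

        return ((recon_copy - x).pow(2).sum(-1)    # ||x_hat_copy(f(x)) - x||²
                + (y_expanded - x).pow(2).sum(-1)  # ||y_expanded - x||²
                + lam * f.abs().sum(-1)            # λ ||f(x)||₁
                ).mean()
```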

Did you try this setup? Or does it fail for an obvious reason I missed?

Some costs of superposition
jacob_drori · 1y · 30

> The typical noise on feature $f_1$ caused by 1 unit of activation from feature $f_2$, for any pair of features $f_1$, $f_2$, is (derived from the Johnson–Lindenstrauss lemma)
>
> $\epsilon = \sqrt{\frac{8\ln(m)}{n}}$ [1]
>
> 1. ... This is a worst case scenario. I have not calculated the typical case, but I expect it to be somewhat less, but still same order of magnitude

Perhaps I'm misunderstanding your claim here, but the "typical" (i.e. RMS) inner product between two independent random unit vectors in $\mathbb{R}^n$ is $n^{-1/2}$ (indeed, $\mathbb{E}[(u \cdot v)^2] = 1/n$ exactly). So I think the $\sqrt{8\ln m}$ factor shouldn't be there, and the rest of your estimates are incorrect.
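For what it's worth, here is a quick numerical check of the $n^{-1/2}$ scaling; this is a sketch of my own, not anything from the post:

```python
import numpy as np

rng = np.random.default_rng(0)
for n in (64, 256, 1024):
    # Sample 10,000 pairs of independent random unit vectors in R^n.
    u = rng.standard_normal((10_000, n))
    v = rng.standard_normal((10_000, n))
    u /= np.linalg.norm(u, axis=1, keepdims=True)
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    # The RMS inner product should match 1/sqrt(n).
    rms = np.sqrt(np.mean(np.sum(u * v, axis=1) ** 2))
    print(f"n={n}: RMS = {rms:.4f}, 1/sqrt(n) = {n ** -0.5:.4f}")
```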

> This means that we can have at most $l < n/(8\ln(m))$ simultaneously active features

This conclusion gets changed to $l < n$.
