AI ALIGNMENT FORUM

Georg Lange

Posts


Comments

Some costs of superposition
Georg Lange, 2y

Calculating l, the maximal number of simultaneously active features, gives a result I find strange. For example, with 100 features and 100 neurons, l has to be less than 100 / (8 * ln(100)) ≈ 2.7. But I would expect that all 100 features can be active simultaneously: with 100 dimensions, the features can be orthogonal and therefore independent. Am I misunderstanding something?
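For anyone who wants to check the arithmetic, here is a minimal sketch of the calculation. The function name is hypothetical, and it only reproduces the expression quoted in this comment. Treating the numerator as the number of neurons and the argument of the logarithm as the number of features is an assumption, since both are 100 in this example and the comment does not distinguish them.

```python
import math


def max_active_features_bound(n_neurons: int, n_features: int) -> float:
    """Upper bound on l as quoted in the comment: n_neurons / (8 * ln(n_features)).

    Assumption: the numerator is the neuron count and the logarithm takes the
    feature count; the comment's example has both equal to 100.
    """
    return n_neurons / (8 * math.log(n_features))


bound = max_active_features_bound(n_neurons=100, n_features=100)
print(f"l must be below {bound:.2f}")
```

Under that reading, the bound stays far below the number of dimensions in this example, which is exactly the tension the question is about.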

SAEs Discover Meaningful Features in the IOI Task (1y)
An Interpretability Illusion for Activation Patching of Arbitrary Subspaces (2y)