Noa Nabeshima
Matryoshka Sparse Autoencoders
Code: github.com/noanabeshima/matryoshka-saes

Abstract

Sparse autoencoders (SAEs)[1][2] break down neural network internals into components called latents. Smaller SAE latents seem to correspond to more abstract concepts while...
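To make the SAE setup concrete, here is a minimal sketch of a vanilla sparse autoencoder's forward pass and training loss. This is an illustration only, not the post's Matryoshka implementation; the toy dimensions, the ReLU encoder, and the L1 sparsity penalty are assumptions standard in the SAE literature.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, n_latents = 16, 64  # toy sizes (assumed, not from the post)

# Parameters of a vanilla SAE: an encoder and a decoder.
W_enc = rng.normal(0, 0.1, (n_latents, d_model))
b_enc = np.zeros(n_latents)
W_dec = rng.normal(0, 0.1, (d_model, n_latents))
b_dec = np.zeros(d_model)

def encode(x):
    # Latent activations: ReLU keeps them nonnegative, and training
    # with an L1 penalty pushes most of them to exactly zero.
    return np.maximum(W_enc @ x + b_enc, 0.0)

def decode(a):
    # Reconstruct the model activation from the latent code.
    return W_dec @ a + b_dec

x = rng.normal(size=d_model)   # a model activation vector
a = encode(x)                  # latent code (sparse after training)
x_hat = decode(a)              # reconstruction

recon_loss = np.sum((x - x_hat) ** 2)  # fidelity term
l1_penalty = np.sum(np.abs(a))         # sparsity term
loss = recon_loss + 1e-3 * l1_penalty
```

Each latent (a row of `W_enc` paired with a column of `W_dec`) is the unit the abstract refers to: a direction in activation space that, ideally, corresponds to an interpretable concept.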
Poll: Which variables are most strategically relevant?
Which variables are most important for predicting and influencing how AI goes? Here are some examples:

* Timelines: “When will crazy AI stuff start to happen?”
* Alignment tax: “How much more difficult will it be to create an aligned AI vs an unaligned AI when it becomes possible to...