AI ALIGNMENT FORUM
Superposition
• Applied to Crafting Polysemantic Transformer Benchmarks with Known Circuits by Evan Anders 19d ago
• Applied to Limitations on the Interpretability of Learned Features from Sparse Dictionary Learning by Tom Angsten 1mo ago
• Applied to Superposition is not "just" neuron polysemanticity by Lawrence Chan 5mo ago
• Applied to Scaling Laws and Superposition by Pavan Katta 5mo ago
• Applied to Sparse autoencoders find composed features in small toy models by Evan Anders 6mo ago
• Applied to Some costs of superposition by Linda Linsefors 6mo ago
• Applied to From Conceptual Spaces to Quantum Concepts: Formalising and Learning Structured Conceptual Models by Roman Leventov 7mo ago
• Applied to AI alignment as a translation problem by Roman Leventov 7mo ago
• Applied to Open Source Sparse Autoencoders for all Residual Stream Layers of GPT2-Small by Joseph Isaac Bloom 7mo ago
• Applied to Toward A Mathematical Framework for Computation in Superposition by Nina Panickssery 8mo ago
• Applied to Sparse MLP Distillation by slavachalnev 9mo ago
• Applied to Towards Monosemanticity: Decomposing Language Models With Dictionary Learning by duck_master 9mo ago
• Applied to Some open-source dictionaries and dictionary learning infrastructure by duck_master 9mo ago
• Applied to Comparing Anthropic's Dictionary Learning to Ours by duck_master 9mo ago
• Applied to Intro to Superposition & Sparse Autoencoders (Colab exercises) by duck_master 9mo ago
• Applied to Superposition and Dropout by duck_master 9mo ago