AI ALIGNMENT FORUM
Transformer Circuits
• Applied to Can quantised autoencoders find and interpret circuits in language models? by Charlie O'Neill 1mo ago
• Applied to Sparse Autoencoders Work on Attention Layer Outputs by robertzk 3mo ago
• Applied to Finding Sparse Linear Connections between Features in LLMs by Logan Riggs Smith 4mo ago
• Applied to AISC project: TinyEvals by Jett Janiak 5mo ago
• Applied to Polysemantic Attention Head in a 4-Layer Transformer by Jett Janiak 5mo ago
• Applied to Graphical tensor notation for interpretability by Jordan Taylor 7mo ago
• Applied to Interpreting OpenAI's Whisper by Neel Nanda 7mo ago
• Applied to Automatically finding feature vectors in the OV circuits of Transformers without using probing by Jacob Dunefsky 7mo ago
• Applied to An adversarial example for Direct Logit Attribution: memory management in gelu-4l by Can 8mo ago
• Applied to Paper Walkthrough: Automated Circuit Discovery with Arthur Conmy by Neel Nanda 8mo ago
• Applied to An Interpretability Illusion for Activation Patching of Arbitrary Subspaces by Alex Makelov 8mo ago
• Applied to Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla by Neel Nanda 9mo ago
• Applied to How to Think About Activation Patching by Neel Nanda 11mo ago
• Applied to An Analogy for Understanding Transformers by CallumMcDougall 1y ago
• Applied to Finding Neurons in a Haystack: Case Studies with Sparse Probing by Wes Gurnee 1y ago
• Applied to Explaining the Transformer Circuits Framework by Example by RobertM 1y ago
• Applied to Addendum: More Efficient FFNs via Attention by Robert_AIZI 1y ago
• Applied to No Really, Attention is ALL You Need - Attention can do feedforward networks by Robert_AIZI 1y ago