AI ALIGNMENT FORUMTags
AF

Transformer Circuits

•

Applied to Can quantised autoencoders find and interpret circuits in language models? by Charlie O'Neill 1mo ago

•

Applied to Sparse Autoencoders Work on Attention Layer Outputs by robertzk 3mo ago

•

Applied to Finding Sparse Linear Connections between Features in LLMs by Logan Riggs Smith 4mo ago

•

Applied to AISC project: TinyEvals by Jett Janiak 5mo ago

•

Applied to Polysemantic Attention Head in a 4-Layer Transformer by Jett Janiak 5mo ago

•

Applied to Graphical tensor notation for interpretability by Jordan Taylor 7mo ago

•

Applied to Interpreting OpenAI's Whisper by Neel Nanda 7mo ago

•

Applied to Automatically finding feature vectors in the OV circuits of Transformers without using probing by Jacob Dunefsky 7mo ago

•

Applied to An adversarial example for Direct Logit Attribution: memory management in gelu-4l by Can 8mo ago

•

Applied to Paper Walkthrough: Automated Circuit Discovery with Arthur Conmy by Neel Nanda 8mo ago

•

Applied to An Interpretability Illusion for Activation Patching of Arbitrary Subspaces by Alex Makelov 8mo ago

•

Applied to Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla by Neel Nanda 9mo ago

•

Applied to How to Think About Activation Patching by Neel Nanda 11mo ago

•

Applied to An Analogy for Understanding Transformers by CallumMcDougall 1y ago

•

Applied to Finding Neurons in a Haystack: Case Studies with Sparse Probing by Wes Gurnee 1y ago

•

Applied to Explaining the Transformer Circuits Framework by Example by RobertM 1y ago

•

Applied to Addendum: More Efficient FFNs via Attention by Robert_AIZI 1y ago

•

Applied to No Really, Attention is ALL You Need - Attention can do feedforward networks by Robert_AIZI 1y ago