AI ALIGNMENT FORUM

Transformer Circuits

This page is a stub.
Posts tagged Transformer Circuits
(19) Finding Neurons in a Haystack: Case Studies with Sparse Probing
wesg, Neel Nanda
2y
1
(53) An Extremely Opinionated Annotated List of My Favourite Mechanistic Interpretability Papers v2
Neel Nanda
1y
10
(39) 200 Concrete Open Problems in Mechanistic Interpretability: Introduction
Neel Nanda
3y
0
(26) Finding Sparse Linear Connections between Features in LLMs
Logan Riggs, Sam Mitchell, Adam Kaufman
2y
2
(23) How to Think About Activation Patching
Neel Nanda
2y
3
(24) Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla
Neel Nanda, Tom Lieberum, Matthew Rahtz, János Kramár, Geoffrey Irving, Rohin Shah, Vlad Mikulik
2y
0
(16) Paper Walkthrough: Automated Circuit Discovery with Arthur Conmy
Neel Nanda
2y
1
(18) 200 COP in MI: Exploring Polysemanticity and Superposition
Neel Nanda
3y
1
(17) 200 COP in MI: Interpreting Algorithmic Problems
Neel Nanda
3y
0
(16) A Walkthrough of Interpretability in the Wild (w/ authors Kevin Wang, Arthur Conmy & Alexandre Variengien)
Neel Nanda
3y
15
(11) A Walkthrough of In-Context Learning and Induction Heads (w/ Charles Frye) Part 1 of 2
Neel Nanda
3y
0
(9) Sleep peacefully: no hidden reasoning detected in LLMs. Well, at least in small ones.
Ilia Shirokov, Ilya Nachevsky
5mo
0
(8) 200 COP in MI: Looking for Circuits in the Wild
Neel Nanda
3y
3
(11) 200 COP in MI: Analysing Training Dynamics
Neel Nanda
3y
0
(7) 200 COP in MI: Techniques, Tooling and Automation
Neel Nanda
3y
0