AI ALIGNMENT FORUM

200 Concrete Open Problems in Mechanistic Interpretability

Dec 28, 2022 by Neel Nanda
Concrete Steps to Get Started in Transformer Mechanistic Interpretability (Neel Nanda, 3y; karma 17, 5 comments)
200 Concrete Open Problems in Mechanistic Interpretability: Introduction (Neel Nanda, 3y; karma 39, 0 comments)
200 COP in MI: The Case for Analysing Toy Language Models (Neel Nanda, 3y; karma 18, 2 comments)
200 COP in MI: Looking for Circuits in the Wild (Neel Nanda, 3y; karma 8, 3 comments)
200 COP in MI: Interpreting Algorithmic Problems (Neel Nanda, 3y; karma 17, 0 comments)
200 COP in MI: Exploring Polysemanticity and Superposition (Neel Nanda, 3y; karma 18, 1 comment)
200 COP in MI: Analysing Training Dynamics (Neel Nanda, 3y; karma 11, 0 comments)
200 COP in MI: Techniques, Tooling and Automation (Neel Nanda, 3y; karma 7, 0 comments)
200 COP in MI: Image Model Interpretability (Neel Nanda, 3y; karma 10, 1 comment)
200 COP in MI: Interpreting Reinforcement Learning (Neel Nanda, 2y; karma 10, 0 comments)
200 COP in MI: Studying Learned Features in Language Models (Neel Nanda, 2y; karma 11, 2 comments)