AI ALIGNMENT FORUM
200 Concrete Open Problems in Mechanistic Interpretability
A sequence by Neel Nanda (all posts ~3 years old at the time of this snapshot; the numbers in parentheses are the karma and comment counts shown on the page):

1. Concrete Steps to Get Started in Transformer Mechanistic Interpretability (17 karma, 5 comments)
2. 200 Concrete Open Problems in Mechanistic Interpretability: Introduction (41 karma, 0 comments)
3. 200 COP in MI: The Case for Analysing Toy Language Models (18 karma, 2 comments)
4. 200 COP in MI: Looking for Circuits in the Wild (8 karma, 3 comments)
5. 200 COP in MI: Interpreting Algorithmic Problems (17 karma, 0 comments)
6. 200 COP in MI: Exploring Polysemanticity and Superposition (18 karma, 1 comment)
7. 200 COP in MI: Analysing Training Dynamics (11 karma, 0 comments)
8. 200 COP in MI: Techniques, Tooling and Automation (7 karma, 0 comments)
9. 200 COP in MI: Image Model Interpretability (10 karma, 1 comment)
10. 200 COP in MI: Interpreting Reinforcement Learning (10 karma, 0 comments)
11. 200 COP in MI: Studying Learned Features in Language Models (11 karma, 2 comments)