Paper Walkthrough: Automated Circuit Discovery with Arthur Conmy

Neel Nanda

29th Aug 2023

Tags: Education, Interpretability (ML & AI), Transformer Circuits, AI

This is a linkpost for https://www.youtube.com/watch?v=dn4GqR0DCx8&list=PL7m7hLIqA0hogxAaYtzlNolYAMr65NY45&index=1

Arthur Conmy's Automated Circuit Discovery is a great paper that makes initial forays into automating parts of mechanistic interpretability (specifically, automatically finding a sparse subgraph for a circuit). In this three-part series of YouTube videos, I interview him about the paper, and we walk through it and discuss the key results and takeaways. We discuss the high-level point of the paper and what researchers should take away from it, the ACDC algorithm and its key nuances, existing baselines and how they adapted them to be relevant to circuit discovery, how well the algorithm works, and how you can even evaluate how well an interpretability method works.