AI ALIGNMENT FORUM
AF

Interpretability (ML & AI)AI
Personal Blog

28

Mechanistic Interpretability Workshop Happening at ICML 2024!

by Neel Nanda, Lawrence Chan, fbarez
3rd May 2024
1 min read
6

28

Interpretability (ML & AI)AI
Personal Blog
New Comment
Moderation Log
Curated and popular this week
0Comments

Announcing the first academic Mechanistic Interpretability workshop, held at ICML 2024! I think this is an exciting development that's a lagging indicator of mech interp gaining legitimacy as an academic field, and a good chance for field building and sharing recent progress! 

We'd love to get papers submitted if any of you have relevant projects! Deadline May 29, max 4 or max 8 pages. We welcome anything that brings us closer to a principled understanding of model internals, even if it's not "traditional” mech interp. Check out our website for example topics! There's $1750 in best paper prizes. We also welcome less standard submissions, like open source software, models or datasets, negative results, distillations, or position pieces. 

And if anyone is attending ICML, you'd be very welcome at the workshop! We have a great speaker line-up: Chris Olah, Jacob Steinhardt, David Bau and Asma Ghandeharioun. And a panel discussion, hands-on tutorial, and social. I’m excited to meet more people into mech interp! And if you know anyone who might be interested in attending/submitting, please pass this on.

Twitter thread, Website

Thanks to my great co-organisers: Fazl Barez, Lawrence Chan, Kayo Yin, Mor Geva, Atticus Geiger and Max Tegmark