Laying the Foundations for Vision and Multimodal Mechanistic Interpretability & Open Problems
Behold the dogit lens. Patch-level logit attribution is an emergent segmentation map. Join our Discord here. This article was written by Sonia Joseph, in collaboration with Neel Nanda, and incubated in Blake Richards’s lab at Mila and in the MATS community. Thank you to the Prisma core contributors, including Praneet...
Mar 13, 202444