LESSWRONG

[ Question ]

Can you MRI a deep learning model?

by Yair Halberstadt

1 min read · 13th Jun 2022 · 2 answers · 1 comment

3

Interpretability (ML & AI), AI

In an MRI scan you see which parts of a brain light up in response to a stimulus. This has proven invaluable in understanding brains.

Is there an equivalent technique for deep learning models, where you can see which parts light up in response to stimuli? And do good UIs exist for exploring this? It seems like such a technique would be invaluable for understanding deep learning models, and possibly for alignment.


2 Answers, sorted by top scoring

Jun 13, 2022

20

Most neural networks don’t have anything comparable to specialised brain areas, at least structurally, so you can’t watch which areas light up in response to a stimulus to work out what each area does. You can do this with individual neurons or channels, though. The best UI I know of for exploring this is the “Dataset Samples” option in the OpenAI Microscope, which shows which inputs activate each unit.
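
To make the per-unit idea concrete, here is a minimal sketch of ranking inputs by how strongly they activate each unit. It is a toy, not how the Microscope is implemented: the single ReLU layer, the weights, the four-item "dataset", and the function names are all made up for illustration.

```python
# Toy illustration: for each hidden unit, find the inputs that activate it most.
# WEIGHTS, BIASES and DATASET are hypothetical values, not taken from a real model.

def relu(x):
    return x if x > 0.0 else 0.0

def hidden_activations(weights, biases, x):
    """Activations of one fully connected ReLU layer for a single input vector."""
    return [relu(sum(w_i * x_i for w_i, x_i in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

def top_activating_inputs(weights, biases, dataset, k=2):
    """For each unit, return the indices of the k inputs with the highest activation."""
    acts = [hidden_activations(weights, biases, x) for x in dataset]
    ranking = []
    for unit in range(len(weights)):
        # Sort input indices by this unit's activation, strongest first.
        order = sorted(range(len(dataset)), key=lambda i: acts[i][unit], reverse=True)
        ranking.append(order[:k])
    return ranking

# Unit 0 responds to the first feature; unit 1 responds to feature 2 minus feature 3.
WEIGHTS = [[1.0, 0.0, 0.0], [0.0, 1.0, -1.0]]
BIASES = [0.0, 0.0]
DATASET = [[3.0, 0.0, 0.0], [1.0, 2.0, 0.0], [0.0, 5.0, 1.0], [0.0, 0.0, 4.0]]

TOP = top_activating_inputs(WEIGHTS, BIASES, DATASET, k=2)
```

With a real network you would record the activations of a chosen layer over a large dataset instead, but the ranking step would look much the same.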

Jun 13, 2022

20

The most similar analysis tool I'm aware of is called an activation atlas (https://distill.pub/2019/activation-atlas/), though I've only seen it applied to visual networks. Would love to see it used on language models!
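
As a rough sketch of one step in that technique: once activation vectors have been laid out in 2D, the atlas averages the activations that fall into each cell of a grid. The toy below shows only that averaging step. It assumes the 2D coordinates (each in [0, 1)) already exist from some dimensionality reduction, and the function name and data are hypothetical.

```python
# Toy sketch of the grid-averaging step of an activation atlas.
# Assumes 2D layout coordinates in [0, 1) were produced elsewhere
# (the dimensionality reduction step is not shown).

def grid_average(coords, activations, grid_size):
    """Average the activation vectors whose 2D coordinates fall in the same grid cell."""
    sums = {}
    counts = {}
    for (x, y), act in zip(coords, activations):
        cell = (int(x * grid_size), int(y * grid_size))
        if cell not in sums:
            sums[cell] = [0.0] * len(act)
            counts[cell] = 0
        sums[cell] = [s + a for s, a in zip(sums[cell], act)]
        counts[cell] += 1
    return {cell: [s / counts[cell] for s in total] for cell, total in sums.items()}

# Hypothetical layout: two activation vectors land in cell (0, 0), one in cell (1, 1).
COORDS = [(0.1, 0.1), (0.2, 0.3), (0.9, 0.8)]
ACTIVATIONS = [[1.0, 0.0], [3.0, 2.0], [0.0, 4.0]]
ATLAS = grid_average(COORDS, ACTIVATIONS, grid_size=2)
```

In the atlas itself, each averaged vector is then turned into an image with feature visualisation, which this sketch leaves out.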

1 comment, sorted by


“This has proven invaluable in understanding brains.”

It has? It's proven quite useful in understanding some types of injury and malfunction. And it may have given hints to developmental and very general structures. But I don't think it's helped very much in understanding cognitive effects or ideas.