GDM Mech Interp Progress Updates — AI Alignment Forum
36
[Summary] Progress Update #1 from the GDM Mech Interp Team
Neel Nanda, Arthur Conmy, lewis smith, Senthooran Rajamanoharan, Tom Lieberum, János Kramár, Vikrant Varma
2y
0
40
[Full Post] Progress Update #1 from the GDM Mech Interp Team
Neel Nanda, Arthur Conmy, lewis smith, Senthooran Rajamanoharan, Tom Lieberum, János Kramár, Vikrant Varma
2y
3
26
The GDM AGI Safety+Alignment Team is Hiring for Applied Interpretability Research
Arthur Conmy, Neel Nanda
10mo
0
58
Negative Results for SAEs On Downstream Tasks and Deprioritising SAE Research (GDM Mech Interp Team Progress Update #2)
lewis smith, Senthooran Rajamanoharan, Arthur Conmy, CallumMcDougall, Tom Lieberum, János Kramár, Rohin Shah, Neel Nanda
8mo
6
60
A Pragmatic Vision for Interpretability
Neel Nanda, Josh Engels, Arthur Conmy, Senthooran Rajamanoharan, bilalchughtai, CallumMcDougall, János Kramár, lewis smith
17d
10
31
How Can Interpretability Researchers Help AGI Go Well?
Neel Nanda, Josh Engels, Senthooran Rajamanoharan, Arthur Conmy, bilalchughtai, CallumMcDougall, János Kramár, lewis smith
17d
1