AI ALIGNMENT FORUM

GDM Mech Interp Progress Updates

Apr 19, 2024 by Neel Nanda
[Summary] Progress Update #1 from the GDM Mech Interp Team
Neel Nanda, Arthur Conmy, lewis smith, Senthooran Rajamanoharan, Tom Lieberum, János Kramár, Vikrant Varma
[Full Post] Progress Update #1 from the GDM Mech Interp Team
Neel Nanda, Arthur Conmy, lewis smith, Senthooran Rajamanoharan, Tom Lieberum, János Kramár, Vikrant Varma
The GDM AGI Safety+Alignment Team is Hiring for Applied Interpretability Research
Arthur Conmy, Neel Nanda
Negative Results for SAEs On Downstream Tasks and Deprioritising SAE Research (GDM Mech Interp Team Progress Update #2)
lewis smith, Senthooran Rajamanoharan, Arthur Conmy, CallumMcDougall, Tom Lieberum, János Kramár, Rohin Shah, Neel Nanda