AI ALIGNMENT FORUM

lewis smith

Posts

lewis smith's Shortform (karma 8, 1y, 0 comments)
Towards data-centric interpretability with sparse autoencoders (karma 17, 1mo, 0 comments)
Negative Results for SAEs On Downstream Tasks and Deprioritising SAE Research (GDM Mech Interp Team Progress Update #2) (karma 58, 5mo, 6 comments)
A Problem to Solve Before Building a Deception Detector (karma 35, 7mo, 1 comment)
The ‘strong’ feature hypothesis could be wrong (karma 101, 1y, 0 comments)
Improving Dictionary Learning with Gated Sparse Autoencoders (karma 39, 1y, 32 comments)
[Full Post] Progress Update #1 from the GDM Mech Interp Team (karma 40, 1y, 3 comments)
[Summary] Progress Update #1 from the GDM Mech Interp Team (karma 36, 1y, 0 comments)