AI ALIGNMENT FORUM

276
lewis smith
Ω150200

Posts

lewis smith's Shortform (8 karma, 1y, 0 comments)
Towards data-centric interpretability with sparse autoencoders (17 karma, 3mo, 0 comments)
Negative Results for SAEs On Downstream Tasks and Deprioritising SAE Research (GDM Mech Interp Team Progress Update #2) (58 karma, 7mo, 6 comments)
A Problem to Solve Before Building a Deception Detector (35 karma, 9mo, 1 comment)
The ‘strong’ feature hypothesis could be wrong (101 karma, 1y, 0 comments)
Improving Dictionary Learning with Gated Sparse Autoencoders (39 karma, 2y, 32 comments)
[Full Post] Progress Update #1 from the GDM Mech Interp Team (40 karma, 2y, 3 comments)
[Summary] Progress Update #1 from the GDM Mech Interp Team (36 karma, 2y, 0 comments)