AI ALIGNMENT FORUM

1521
lewis smith
Ω150200

Posts

8 · lewis smith's Shortform · 1y · 0 comments
17 · Towards data-centric interpretability with sparse autoencoders · 2mo · 0 comments
58 · Negative Results for SAEs On Downstream Tasks and Deprioritising SAE Research (GDM Mech Interp Team Progress Update #2) · 6mo · 6 comments
35 · A Problem to Solve Before Building a Deception Detector · 8mo · 1 comment
101 · The ‘strong’ feature hypothesis could be wrong · 1y · 0 comments
39 · Improving Dictionary Learning with Gated Sparse Autoencoders · 1y · 32 comments
40 · [Full Post] Progress Update #1 from the GDM Mech Interp Team · 1y · 3 comments
36 · [Summary] Progress Update #1 from the GDM Mech Interp Team · 1y · 0 comments