AI ALIGNMENT FORUM

lewis smith
Ω147200

Posts

8 | lewis smith's Shortform | 11mo | 0
58 | Negative Results for SAEs On Downstream Tasks and Deprioritising SAE Research (GDM Mech Interp Team Progress Update #2) | 3mo | 6
35 | A Problem to Solve Before Building a Deception Detector | 5mo | 1
98 | The ‘strong’ feature hypothesis could be wrong | 1y | 0
39 | Improving Dictionary Learning with Gated Sparse Autoencoders | 1y | 32
40 | [Full Post] Progress Update #1 from the GDM Mech Interp Team | 1y | 3
36 | [Summary] Progress Update #1 from the GDM Mech Interp Team | 1y | 0