AI ALIGNMENT FORUM
lewis smith
Posts
lewis smith's Shortform (8 karma, 11mo, 0 comments)
Negative Results for SAEs On Downstream Tasks and Deprioritising SAE Research (GDM Mech Interp Team Progress Update #2) (58 karma, 3mo, 6 comments)
A Problem to Solve Before Building a Deception Detector (35 karma, 5mo, 1 comment)
The ‘strong’ feature hypothesis could be wrong (98 karma, 1y, 0 comments)
Improving Dictionary Learning with Gated Sparse Autoencoders (39 karma, 1y, 32 comments)
[Full Post] Progress Update #1 from the GDM Mech Interp Team (40 karma, 1y, 3 comments)
[Summary] Progress Update #1 from the GDM Mech Interp Team (36 karma, 1y, 0 comments)