AI ALIGNMENT FORUM
lewis smith
Posts
lewis smith's Shortform (8 karma, 1y, 0 comments)
Towards data-centric interpretability with sparse autoencoders (17 karma, 1mo, 0 comments)
Negative Results for SAEs On Downstream Tasks and Deprioritising SAE Research (GDM Mech Interp Team Progress Update #2) (58 karma, 5mo, 6 comments)
A Problem to Solve Before Building a Deception Detector (35 karma, 7mo, 1 comment)
The ‘strong’ feature hypothesis could be wrong (101 karma, 1y, 0 comments)
Improving Dictionary Learning with Gated Sparse Autoencoders (39 karma, 1y, 32 comments)
[Full Post] Progress Update #1 from the GDM Mech Interp Team (40 karma, 1y, 3 comments)
[Summary] Progress Update #1 from the GDM Mech Interp Team (36 karma, 1y, 0 comments)