AI ALIGNMENT FORUM
lewis smith (276 karma)
Posts
lewis smith's Shortform (8 karma, 1y ago, 0 comments)
Towards data-centric interpretability with sparse autoencoders (17 karma, 3mo ago, 0 comments)
Negative Results for SAEs On Downstream Tasks and Deprioritising SAE Research (GDM Mech Interp Team Progress Update #2) (58 karma, 7mo ago, 6 comments)
A Problem to Solve Before Building a Deception Detector (35 karma, 9mo ago, 1 comment)
The ‘strong’ feature hypothesis could be wrong (101 karma, 1y ago, 0 comments)
Improving Dictionary Learning with Gated Sparse Autoencoders (39 karma, 2y ago, 32 comments)
[Full Post] Progress Update #1 from the GDM Mech Interp Team (40 karma, 2y ago, 3 comments)
[Summary] Progress Update #1 from the GDM Mech Interp Team (36 karma, 2y ago, 0 comments)