lewis smith — AI Alignment Forum
Posts
lewis smith's Shortform (8 karma, 1y ago, 0 comments)
Towards data-centric interpretability with sparse autoencoders (17 karma, 4mo ago, 0 comments)
Negative Results for SAEs On Downstream Tasks and Deprioritising SAE Research (GDM Mech Interp Team Progress Update #2) (58 karma, 8mo ago, 6 comments)
A Problem to Solve Before Building a Deception Detector (35 karma, 10mo ago, 1 comment)
The ‘strong’ feature hypothesis could be wrong (101 karma, 1y ago, 0 comments)
Improving Dictionary Learning with Gated Sparse Autoencoders (39 karma, 2y ago, 32 comments)
[Full Post] Progress Update #1 from the GDM Mech Interp Team (40 karma, 2y ago, 3 comments)
[Summary] Progress Update #1 from the GDM Mech Interp Team (36 karma, 2y ago, 0 comments)