AI ALIGNMENT FORUM
lewis smith
Posts
lewis smith's Shortform (8 karma, posted 1y ago, 0 comments)
[Paper] Difficulties with Evaluating a Deception Detector for AIs (16 karma, posted 17d ago, 1 comment)
How Can Interpretability Researchers Help AGI Go Well? (31 karma, posted 20d ago, 1 comment)
A Pragmatic Vision for Interpretability (60 karma, posted 20d ago, 10 comments)
Towards data-centric interpretability with sparse autoencoders (17 karma, posted 4mo ago, 0 comments)
Negative Results for SAEs On Downstream Tasks and Deprioritising SAE Research (GDM Mech Interp Team Progress Update #2) (58 karma, posted 8mo ago, 6 comments)
A Problem to Solve Before Building a Deception Detector (35 karma, posted 10mo ago, 1 comment)
The ‘strong’ feature hypothesis could be wrong (101 karma, posted 1y ago, 0 comments)
Improving Dictionary Learning with Gated Sparse Autoencoders (39 karma, posted 2y ago, 32 comments)
[Full Post] Progress Update #1 from the GDM Mech Interp Team (40 karma, posted 2y ago, 3 comments)
[Summary] Progress Update #1 from the GDM Mech Interp Team (36 karma, posted 2y ago, 0 comments)