AI ALIGNMENT FORUM
Connor Kissane
Posts
White Box Control at UK AISI - Update on Sandbagging Investigations (39 karma, 10d, 5 comments)
SAEs are highly dataset dependent: a case study on the refusal direction (34 karma, 8mo, 0 comments)
Open Source Replication of Anthropic’s Crosscoder paper for model-diffing (23 karma, 9mo, 3 comments)
Base LLMs refuse too (35 karma, 10mo, 10 comments)
SAEs (usually) Transfer Between Base and Chat Models (28 karma, 1y, 0 comments)
Attention Output SAEs Improve Circuit Analysis (18 karma, 1y, 0 comments)
We Inspected Every Head In GPT-2 Small using SAEs So You Don’t Have To (34 karma, 1y, 0 comments)
Attention SAEs Scale to GPT-2 Small (28 karma, 1y, 0 comments)
Sparse Autoencoders Work on Attention Layer Outputs (35 karma, 2y, 3 comments)