AI ALIGNMENT FORUM
Connor Kissane — AI Alignment Forum
Posts
White Box Control at UK AISI - Update on Sandbagging Investigations (karma 41, posted 4mo ago, 5 comments)
SAEs are highly dataset dependent: a case study on the refusal direction (karma 34, posted 1y ago, 0 comments)
Open Source Replication of Anthropic’s Crosscoder paper for model-diffing (karma 23, posted 1y ago, 3 comments)
Base LLMs refuse too (karma 36, posted 1y ago, 10 comments)
SAEs (usually) Transfer Between Base and Chat Models (karma 28, posted 1y ago, 0 comments)
Attention Output SAEs Improve Circuit Analysis (karma 18, posted 1y ago, 0 comments)
We Inspected Every Head In GPT-2 Small using SAEs So You Don’t Have To (karma 34, posted 2y ago, 0 comments)
Attention SAEs Scale to GPT-2 Small (karma 28, posted 2y ago, 0 comments)
Sparse Autoencoders Work on Attention Layer Outputs (karma 35, posted 2y ago, 3 comments)