AI ALIGNMENT FORUM
Robert Krzyzanowski
Posts
32
SAEs are highly dataset dependent: a case study on the refusal direction
2d
0
18
Open Source Replication of Anthropic’s Crosscoder paper for model-diffing
12d
0
35
Base LLMs refuse too
1mo
10
28
SAEs (usually) Transfer Between Base and Chat Models
4mo
0
18
Attention Output SAEs Improve Circuit Analysis
5mo
0
33
We Inspected Every Head In GPT-2 Small using SAEs So You Don’t Have To
8mo
0
28
Attention SAEs Scale to GPT-2 Small
9mo
0
35
Sparse Autoencoders Work on Attention Layer Outputs
10mo
3
29
Training Process Transparency through Gradient Interpretability: Early experiments on toy language models
1y
1
14
Getting up to Speed on the Speed Prior in 2022
2y
0