AI ALIGNMENT FORUM
Can
Posts
42
SAEBench: A Comprehensive Benchmark for Sparse Autoencoders
1y
1
43
OthelloGPT learned a bag of heuristics
1y
1
8
An adversarial example for Direct Logit Attribution: memory management in gelu-4l
2y
0
21
Understanding mesa-optimization using toy models
3y
0
Comments