AI ALIGNMENT FORUM
Can
Posts
SAEBench: A Comprehensive Benchmark for Sparse Autoencoders (42 karma, 9mo ago, 1 comment)
OthelloGPT learned a bag of heuristics (43 karma, 1y ago, 1 comment)
An adversarial example for Direct Logit Attribution: memory management in gelu-4l (8 karma, 2y ago, 0 comments)
Understanding mesa-optimization using toy models (21 karma, 2y ago, 0 comments)