AI ALIGNMENT FORUM

Can
Ω50100

Posts

SAEBench: A Comprehensive Benchmark for Sparse Autoencoders (42 karma, 9mo, 1 comment)
OthelloGPT learned a bag of heuristics (43 karma, 1y, 1 comment)
An adversarial example for Direct Logit Attribution: memory management in gelu-4l (8 karma, 2y, 0 comments)
Understanding mesa-optimization using toy models (21 karma, 2y, 0 comments)