AI ALIGNMENT FORUM
Sid Black
Posts
- The Singular Value Decompositions of Transformer Weight Matrices are Highly Interpretable (64 karma, 4mo ago, 10 comments)
- Conjecture Second Hiring Round (33 karma, 4mo ago, 0 comments)
- Conjecture: a retrospective after 8 months of work (63 karma, 4mo ago, 5 comments)
- Current themes in mechanistic interpretability research (38 karma, 4mo ago, 2 comments)
- Interpreting Neural Networks through the Polytope Lens (41 karma, 6mo ago, 10 comments)
- Conjecture: Internal Infohazard Policy (50 karma, 8mo ago, 2 comments)