AI Alignment Forum
Sid Black (178 karma)
Posts (sorted by newest)
- Do LLMs know what they're capable of? Why this matters for AI safety, and initial findings (19 karma, 3mo ago, 0 comments)
- White Box Control at UK AISI - Update on Sandbagging Investigations (41 karma, 3mo ago, 5 comments)
- The Singular Value Decompositions of Transformer Weight Matrices are Highly Interpretable (69 karma, 3y ago, 11 comments)
- Conjecture Second Hiring Round (33 karma, 3y ago, 0 comments)
- Conjecture: a retrospective after 8 months of work (61 karma, 3y ago, 5 comments)
- Current themes in mechanistic interpretability research (38 karma, 3y ago, 2 comments)
- Interpreting Neural Networks through the Polytope Lens (47 karma, 3y ago, 11 comments)
- Conjecture: Internal Infohazard Policy (53 karma, 3y ago, 2 comments)