AI ALIGNMENT FORUM

Sid Black
Karma: Ω47000

Posts

Score | Title | Posted | Comments
13 | Do LLMs know what they're capable of? Why this matters for AI safety, and initial findings | 2d | 0
39 | White Box Control at UK AISI - Update on Sandbagging Investigations | 5d | 5
69 | The Singular Value Decompositions of Transformer Weight Matrices are Highly Interpretable | 3y | 11
33 | Conjecture Second Hiring Round | 3y | 0
61 | Conjecture: a retrospective after 8 months of work | 3y | 5
38 | Current themes in mechanistic interpretability research | 3y | 2
47 | Interpreting Neural Networks through the Polytope Lens | 3y | 11
53 | Conjecture: Internal Infohazard Policy | 3y | 2