AI ALIGNMENT FORUM

Sid Black
Karma: Ω47000

Posts

Score | Title | Posted | Comments
13 | Do LLMs know what they're capable of? Why this matters for AI safety, and initial findings | 2d | 0
39 | White Box Control at UK AISI - Update on Sandbagging Investigations | 5d | 5
69 | The Singular Value Decompositions of Transformer Weight Matrices are Highly Interpretable | 3y | 11
33 | Conjecture Second Hiring Round | 3y | 0
61 | Conjecture: a retrospective after 8 months of work | 3y | 5
38 | Current themes in mechanistic interpretability research | 3y | 2
47 | Interpreting Neural Networks through the Polytope Lens | 3y | 11
53 | Conjecture: Internal Infohazard Policy | 3y | 2