AI ALIGNMENT FORUM

Verification

This page is a stub.
Posts tagged Verification
Validating against a misalignment detector is very different to training against one (mattmacdermott; 6mo; karma 23; 4 comments)
Formal verification, heuristic explanations and surprise accounting (Jacob_Hilton; 1y; karma 79; 3 comments)
Compact Proofs of Model Performance via Mechanistic Interpretability (LawrenceC, rajashree, Adrià Garriga-alonso, Jason Gross; 1y; karma 46; 2 comments)
Making it harder for an AGI to "trick" us, with STVs (Tor Økland Barstad; 3y; karma 7; 0 comments)
Alignment with argument-networks and assessment-predictions (Tor Økland Barstad; 3y; karma 7; 0 comments)