AI ALIGNMENT FORUM

Verification

This page is a stub.
Posts tagged Verification
Validating against a misalignment detector is very different to training against one (mattmacdermott, 7mo; 23 karma, 4 comments)
Formal verification, heuristic explanations and surprise accounting (Jacob_Hilton, 1y; 81 karma, 3 comments)
Compact Proofs of Model Performance via Mechanistic Interpretability (LawrenceC, rajashree, Adrià Garriga-alonso, Jason Gross, 1y; 46 karma, 2 comments)
Making it harder for an AGI to "trick" us, with STVs (Tor Økland Barstad, 3y; 7 karma, 0 comments)
Alignment with argument-networks and assessment-predictions (Tor Økland Barstad, 3y; 7 karma, 0 comments)