AI ALIGNMENT FORUM
Verification
This page is a stub.
Posts tagged Verification (most relevant first)
Karma | Title | Author(s) | Age | Comments
23 | Validating against a misalignment detector is very different to training against one | mattmacdermott | 6mo | 4
79 | Formal verification, heuristic explanations and surprise accounting | Jacob_Hilton | 1y | 3
46 | Compact Proofs of Model Performance via Mechanistic Interpretability | LawrenceC, rajashree, Adrià Garriga-alonso, Jason Gross | 1y | 2
7 | Making it harder for an AGI to "trick" us, with STVs | Tor Økland Barstad | 3y | 0
7 | Alignment with argument-networks and assessment-predictions | Tor Økland Barstad | 3y | 0