AI ALIGNMENT FORUM

Charlie Griffin · Karma: 2049 · Ω 58300

Posts (sorted by new)

Comments: none · Wikitag contributions: none
28 · Misalignment classifiers: Why they're hard to evaluate adversarially, and why we're studying them anyway · 2mo · 0 comments
24 · Subversion Strategy Eval: Can language models statelessly strategize to subvert control protocols? · 7mo · 0 comments
21 · Games for AI Control · 1y · 0 comments
23 · Scenario Forecasting Workshop: Materials and Learnings · 2y · 1 comment
9 · Five projects from AI Safety Hub Labs 2023 · 2y · 0 comments
48 · Goodhart's Law in Reinforcement Learning · 2y · 5 comments