AI ALIGNMENT FORUM
Charlie Griffin (karma: 2049)

Posts (sorted by newest)
- 28 karma · Misalignment classifiers: Why they're hard to evaluate adversarially, and why we're studying them anyway · 2mo ago · 0 comments
- 24 karma · Subversion Strategy Eval: Can language models statelessly strategize to subvert control protocols? · 7mo ago · 0 comments
- 21 karma · Games for AI Control · 1y ago · 0 comments
- 23 karma · Scenario Forecasting Workshop: Materials and Learnings · 2y ago · 1 comment
- 9 karma · Five projects from AI Safety Hub Labs 2023 · 2y ago · 0 comments
- 48 karma · Goodhart's Law in Reinforcement Learning · 2y ago · 5 comments