AI ALIGNMENT FORUM
Charlie Griffin
Posts (sorted by newest)
29 · Misalignment classifiers: Why they’re hard to evaluate adversarially, and why we're studying them anyway · 4mo · 0 comments
24 · Subversion Strategy Eval: Can language models statelessly strategize to subvert control protocols? · 8mo · 0 comments
21 · Games for AI Control · 1y · 0 comments
23 · Scenario Forecasting Workshop: Materials and Learnings · 2y · 1 comment
9 · Five projects from AI Safety Hub Labs 2023 · 2y · 0 comments
48 · Goodhart's Law in Reinforcement Learning · 2y · 5 comments