AI ALIGNMENT FORUM
Charlie Griffin
Posts
Practical challenges of control monitoring in frontier AI deployments (6 karma, 20d ago, 0 comments)
Misalignment classifiers: Why they’re hard to evaluate adversarially, and why we're studying them anyway (30 karma, 6mo ago, 0 comments)
Subversion Strategy Eval: Can language models statelessly strategize to subvert control protocols? (24 karma, 10mo ago, 0 comments)
Games for AI Control (21 karma, 2y ago, 0 comments)
Scenario Forecasting Workshop: Materials and Learnings (23 karma, 2y ago, 1 comment)
Five projects from AI Safety Hub Labs 2023 (9 karma, 2y ago, 0 comments)
Goodhart's Law in Reinforcement Learning (48 karma, 2y ago, 5 comments)
charlie_griffin — AI Alignment Forum