charlie_griffin — AI Alignment Forum
Charlie Griffin
Posts
6
Practical challenges of control monitoring in frontier AI deployments
19d
0
30
Misalignment classifiers: Why they’re hard to evaluate adversarially, and why we're studying them anyway
6mo
0
24
Subversion Strategy Eval: Can language models statelessly strategize to subvert control protocols?
10mo
0
21
Games for AI Control
2y
0
23
Scenario Forecasting Workshop: Materials and Learnings
2y
1
9
Five projects from AI Safety Hub Labs 2023
2y
0
48
Goodhart's Law in Reinforcement Learning
2y
5