AI ALIGNMENT FORUM
charlie_griffin — AI Alignment Forum

Charlie Griffin

Ω59300

Posts

Misalignment classifiers: Why they’re hard to evaluate adversarially, and why we're studying them anyway (29 karma, 4mo, 0 comments)

Subversion Strategy Eval: Can language models statelessly strategize to subvert control protocols? (24 karma, 8mo, 0 comments)

Games for AI Control (21 karma, 1y, 0 comments)

Scenario Forecasting Workshop: Materials and Learnings (23 karma, 2y, 1 comment)

Five projects from AI Safety Hub Labs 2023 (9 karma, 2y, 0 comments)

Goodhart's Law in Reinforcement Learning (48 karma, 2y, 5 comments)