Top postsTop post
jacek
207
Ω
66
4
4
Produced As Part Of The OxAI Safety Labs program, mentored by Joar Skalse. TL;DR This is a blog post introducing our new paper, "Goodhart's Law in Reinforcement Learning" (to appear at ICLR 2024). We study Goodhart's law in RL empirically, provide a geometric explanation for why it occurs, and use...
The paper Optimal Policies Tend to Seek Power tries to justify the claim in the title with a mathematical argument, using a clever formal definition of "power". I like the general approach, but I think that parts of the formalisation do not correspond well to the intuitive understanding of the...