jacek — AI Alignment Forum

Top posts

238

Ω

66

4

4

4y

Goodhart's Law in Reinforcement Learning

Produced as part of the OxAI Safety Labs program, mentored by Joar Skalse. TL;DR: This is a blog post introducing our new paper, "Goodhart's Law in Reinforcement Learning" (to appear at ICLR 2024). We study Goodhart's law in RL empirically, provide a geometric explanation for why it occurs, and use...

Oct 16, 2023 • 126

Categorical-measure-theoretic approach to optimal policies tending to seek power

The paper Optimal Policies Tend to Seek Power tries to justify the claim in the title with a mathematical argument, using a clever formal definition of "power". I like the general approach, but I think that parts of the formalisation do not correspond well to the intuitive understanding of the...

Jan 12, 2023 • 31