AI ALIGNMENT FORUM

eric_langlois — AI Alignment Forum

eric_langlois

63

Ω

17

1

3

8y

eric_langlois

63

Ω

17

8y

Bounding Goodhart's Law

Goodhart's law seems to suggest that errors in utility or reward function specification are necessarily bad, in the sense that an optimal policy for the incorrect reward function would result in low return according to the true reward. But how strong is this effect? Suppose the reward function were only slightly...

Jul 11, 2018•43