This paper gives a mathematical model of when Goodharting will occur. To summarize: if
(1) a human has some collection of things which she values,
(2) a robot has access to a proxy utility function which takes into account some strict subset of those things, and
(3) the robot can freely vary how much of each of those things there is in the world, subject only to resource constraints that make the things trade off against each other,
then when the robot optimizes for its proxy utility, it will minimize all of the things which its proxy utility does not take into account. …
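As a quick sanity check of this picture, here's a minimal sketch (the linear proxy, the log-shaped human utility, and all the numbers are my own assumptions for illustration, not the paper's model):

```python
import numpy as np
from scipy.optimize import linprog

# n attributes the human cares about; the proxy only sees the first k
n, k = 6, 3
proxy_weights = np.zeros(n)
proxy_weights[:k] = 1.0                     # proxy ignores attributes k..n-1

costs = np.ones(n)                          # shared resource budget makes the
budget = 12.0                               # attributes trade off against each other

# Robot: maximize proxy utility under the budget (linprog minimizes, hence -)
res = linprog(-proxy_weights, A_ub=[costs], b_ub=[budget],
              bounds=[(0, None)] * n)
x_robot = res.x

def human_utility(x):
    # Diminishing returns in every attribute (my assumption, not the paper's form)
    return np.log1p(x).sum()

x_balanced = np.full(n, budget / n)         # spend the budget evenly instead

print("robot allocation:        ", np.round(x_robot, 2))  # ignored attrs pinned at 0
print("human utility (robot):   ", round(human_utility(x_robot), 2))
print("human utility (balanced):", round(human_utility(x_balanced), 2))
```

The proxy-optimal solution spends the whole budget on the proxied attributes and pins the unproxied ones at their lower bound, so under any human utility with diminishing returns it comes out worse than the even split.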
It seems to me that the meaning of the set of cases drifts significantly between when it is first introduced and the "Implications" section. It further seems to me that clarifying what exactly this set is supposed to be resolves the claimed tension between the existence of iterably improvable ontology identifiers and the difficulty of learning human concept boundaries.
Initially, the set is taken to consist of cases in which the question at hand has an objective, unambiguous answer. Cases where the meaning of the question is ambiguous are …
Hmm, I'm not sure I understand -- it doesn't seem to me like noisy observations ought to pose a big problem to control systems in general.
For example, suppose we want to minimize the number of mosquitos in the U.S., and we have access to noisy estimates of mosquito counts in each county. This may result in us allocating resources slightly inefficiently (e.g., overspending on counties that have fewer mosquitos than we think), but we'll still always be doing approximately the correct thing, and mosquito counts will go down. In particular, I don't see a se…
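For what it's worth, here's a toy simulation of that intuition (the dynamics and all the numbers below are invented for illustration): a naive controller that allocates spraying effort according to noisy county-level estimates still drives the total down.

```python
import numpy as np

rng = np.random.default_rng(0)

n_counties = 50
counts = rng.uniform(1e4, 1e6, size=n_counties)  # true counts per county (made up)
budget = 20.0                                    # total spraying effort per step

for t in range(30):
    # We only observe noisy (multiplicatively perturbed) estimates of the counts
    estimates = counts * rng.lognormal(0.0, 0.3, n_counties)
    # Naive controller: allocate effort proportional to the *estimates*
    effort = budget * estimates / estimates.sum()
    counts = counts * np.exp(-effort)            # effort kills a fraction of mosquitos
    counts += rng.uniform(0, 100, n_counties)    # small background breeding
    if t % 10 == 0 or t == 29:
        print(f"t={t:2d}  total mosquitos = {counts.sum():.3e}")
```

The allocation is slightly off at every step (counties we overestimate get too much effort), but the total still falls by orders of magnitude, which is the "approximately correct" behavior described above.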