All of Sam Marks's Comments + Replies

G Gordon Worley III's Shortform

Hmm, I'm not sure I understand -- it doesn't seem to me like noisy observations ought to pose a big problem for control systems in general.

For example, suppose we want to minimize the number of mosquitos in the U.S., and we have access to noisy estimates of mosquito counts in each county. This may result in us allocating resources slightly inefficiently (e.g. overspending resources on counties that have fewer mosquitos than we think), but we'll still always be doing approximately the correct thing, and mosquito counts will go down. In particular, I don't see a se... (read more)
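Here's a minimal simulation of this intuition (a toy of my own, with made-up counts, budget, and noise level): resources get allocated in proportion to noisy county-level estimates, and total mosquito counts still fall every step despite the noise.

```python
import random

# Toy model of the point above: a controller allocating resources
# using noisy county-level mosquito estimates still drives total
# counts down, just somewhat inefficiently. All numbers are made up.
random.seed(0)

counts = [random.uniform(50, 150) for _ in range(20)]  # true counts per county
BUDGET = 100.0        # resources available per step
KILLS_PER_UNIT = 0.5  # mosquitos removed per unit of resource spent
NOISE = 0.3           # 30% multiplicative measurement noise

for step in range(10):
    # Noisy observations of each county's count
    estimates = [c * (1 + random.uniform(-NOISE, NOISE)) for c in counts]
    total_est = sum(estimates)
    # Allocate the budget proportionally to the (noisy) estimates
    for i, est in enumerate(estimates):
        spend = BUDGET * est / total_est
        counts[i] = max(0.0, counts[i] - KILLS_PER_UNIT * spend)
    print(f"step {step}: total mosquitos = {sum(counts):.1f}")
```

The noise only misallocates spending between counties (and wastes whatever lands on an already-empty county); it never makes the total go up.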

1 · G Gordon Worley III · 2d
"Error" here is all sources of error, not just error in the measurement equipment. So bribing surveyors is a kind of error in my model.
G Gordon Worley III's Shortform

This paper gives a mathematical model of when Goodharting will occur. To summarize: if

(1) a human has some collection A_1, …, A_n of things which she values,

(2) a robot has access to a proxy utility function which takes into account some strict subset of those things, and

(3) the robot can freely vary how much of each A_i there is in the world, subject only to resource constraints that make the A_i trade off against each other,

then when the robot optimizes for its proxy utility, it will minimize all A_i's which its proxy utility... (read more)
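To make the summarized result concrete, here's a toy instance (my own numbers and functional forms, not the paper's): true utility values three attributes, the proxy sees a strict subset of two, and under a shared resource constraint the proxy optimum drives the unreferenced attribute to its minimum.

```python
import numpy as np

# Toy instance of the summarized setup (illustrative only): three
# valued attributes a1, a2, a3 with true utility sum(log(1 + a_i)),
# but the robot's proxy only sees a1 and a2. Resources are
# constrained: a1 + a2 + a3 = R, with all a_i >= 0.
R = 10.0

def true_utility(a):
    return np.sum(np.log1p(a))

def proxy_utility(a):
    return np.log1p(a[0]) + np.log1p(a[1])  # strict subset: ignores a3

# Brute-force the proxy optimum over a grid of feasible allocations.
best, best_a = -np.inf, None
grid = np.linspace(0, R, 101)
for a1 in grid:
    for a2 in grid:
        a3 = R - a1 - a2
        if a3 < 0:
            continue  # infeasible: over budget
        a = np.array([a1, a2, a3])
        if proxy_utility(a) > best:
            best, best_a = proxy_utility(a), a

print("proxy-optimal allocation:", best_a)  # a3 is driven to 0
print("true utility there:       ", true_utility(best_a))
print("true utility, even split: ", true_utility(np.array([R / 3] * 3)))
```

Since anything spent on a3 is wasted from the proxy's perspective, the proxy optimum always sits on the a3 = 0 boundary, and true utility comes out lower than an even split; that is exactly the failure mode the model predicts.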

1 · G Gordon Worley III · 2d
I actually don't think that model is general enough. Like, I think Goodharting is just a fact of control systems' observing. Suppose we have a simple control system with output X and a governor G. G takes a measurement m(X) (an observation) of X. So long as m(X) is not error free (and I think we can agree that no real-world system can be actually error free), then X = m(X) + ϵ for some error factor ϵ. Since G uses m(X) to regulate the system to change X, we now have error influencing the value of X. Now applying the standard reasoning for Goodhart, in the limit of optimization pressure (i.e. G regulating the value of X for long enough), ϵ comes to dominate the value of X.

This is a bit handwavy, but I'm pretty sure it's true, which means in theory any attempt to optimize for anything will, under enough optimization pressure, become dominated by error, whether that's human values or something else. The only interesting question is whether we can control the error enough, either through better measurement or less optimization pressure, that we can get enough signal to be happy with the output.
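A quick toy version of this argument (my own construction, with made-up gains and noise levels): a governor corrects X toward a target using only the noisy measurement m(X) = X + ϵ. The harder it pushes on the measurement each step, the more the residual in X is set by the error term; at gain 1 it pins m(X) to the target exactly, making X_{t+1} = -ϵ_t, i.e. X is determined entirely by the error.

```python
import random

# Toy sketch of the parent's setup: governor G regulates X toward a
# target, but only sees the noisy measurement m(X) = X + eps. Higher
# gain = more optimization pressure per step. At gain 1, G zeroes out
# m(X) each step, so X ends up being pure measurement error.
random.seed(0)
TARGET = 0.0

def run(gain, steps=10_000, noise=1.0):
    x, resid = 5.0, []
    for _ in range(steps):
        eps = random.gauss(0, noise)
        m = x + eps               # noisy observation of X
        x -= gain * (m - TARGET)  # G corrects based on m(X), not X
        resid.append(x - TARGET)
    # Average |residual| after the controller has settled
    tail = resid[steps // 2:]
    return sum(abs(r) for r in tail) / len(tail)

for gain in (0.1, 0.5, 1.0):
    print(f"gain {gain}: mean |X - target| = {run(gain):.3f}")
```

In this toy, the stationary spread of X grows with the gain, so pushing harder on the proxy leaves X more, not less, dominated by ϵ, matching the "limit of optimization pressure" claim above.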
Implications of automated ontology identification

It seems to me that the meaning of the set of cases drifts significantly between when it is first introduced and the "Implications" section. It further seems to me that clarifying what exactly this set is supposed to be resolves the claimed tension between the existence of iterably improvable ontology identifiers and the difficulty of learning human concept boundaries.

Initially, it is taken to be a set of cases such that the question under consideration has an objective, unambiguous answer. Cases where the meaning of the question is ambiguous are ... (read more)