One related thing I was thinking about last week: part of the idea of abstraction is that we can pick a Markov blanket around some variable X, and anything outside that Markov blanket can only "see" abstract summary information f(X). So, if we have a goal which only cares about things outside that Markov blanket, then that goal will only care about f(X) rather than all of X. This holds for any goal which only cares about things outside the blanket. That sounds like instrumental convergence: any goal which does not explicitly care about things near X itself, will care only about controlling f(X), not all of X.

This isn't quite the same notion of goal-locality that the OP is using (it's not about how close the goal-variables are to the agent), but it feels like there are some overlapping ideas.
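To make the Markov-blanket point concrete, here is a toy sketch, not something from the comment itself. Everything in it is an assumption chosen for illustration: `X` is a tuple of bits, the summary `f` is just their sum, and `outside` stands in for some far-away variable that can depend on `X` only through `f(X)`.

```python
from itertools import product

def f(x):
    """Hypothetical summary: the only information that crosses the blanket."""
    return sum(x)

def outside(summary):
    """Hypothetical far-away variable; it sees X only via f(X)."""
    return summary >= 2

def utility(x):
    """A goal that cares only about things outside the blanket."""
    return 1.0 if outside(f(x)) else 0.0

def utilities_by_summary(n):
    """Group every configuration of X by f(X) and collect the utilities seen."""
    groups = {}
    for x in product([0, 1], repeat=n):
        groups.setdefault(f(x), set()).add(utility(x))
    return groups

groups = utilities_by_summary(3)
for summary, utils in sorted(groups.items()):
    print(summary, utils)
```

In this toy setup, every configuration of `X` with the same `f(X)` gets the same utility, so an agent with this goal has no reason to control anything about `X` beyond `f(X)`.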

[-]adamShimi5y20

The more I think about it, the more I come to believe that locality is very related to abstraction. Not the distance part necessarily, but the underlying intuition. If my goal is not "about the world", then I can throw away almost all information about the world except a few details and still be able to check my goal. The "world" of the thermostat is, in that sense, a very abstracted map of the world in which everything except the number on its sensor is thrown away.

[-]adamShimi5y*20

Thanks! Glad that I managed to write something that was not causally or rhetorically all wrong. ^^

One related thing I was thinking about last week: part of the idea of abstraction is that we can pick a Markov blanket around some variable X, and anything outside that Markov blanket can only "see" abstract summary information f(X). So, if we have a goal which only cares about things outside that Markov blanket, then that goal will only care about f(X) rather than all of X

That makes even more sense to me than you might think. My intuitions about locality come from its use in distributed computing, where it measures both how many rounds of communication are needed to solve a problem and how far in the communication graph one needs to look to compute one's own output. This looks like my use of locality here.

On the other hand, recent work on distributed complexity also studied the volume complexity of a problem: the size of the subgraph one needs to look at, which might be very different from a ball. The only real constraint is connectedness. Modulo the usual "exactness issue", which we can deal with by replacing "the node is not used" by "only f(X) is used", this looks a lot like your idea.
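As a rough illustration of the distributed-computing notion of locality mentioned above, here is a minimal sketch under simple assumptions: synchronous rounds, unbounded messages, and a hypothetical 6-node path graph. The function names `flood` and `ball` are made up for this example. It checks that after r rounds of exchanging everything with neighbours, each node knows exactly the nodes in its radius-r ball.

```python
from collections import deque

def ball(graph, source, radius):
    """Nodes within `radius` hops of `source`, found by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue  # do not expand past the radius
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)

def flood(graph, rounds):
    """Each round, every node merges in what its neighbours knew last round."""
    known = {u: {u} for u in graph}
    for _ in range(rounds):
        known = {u: known[u].union(*(known[v] for v in graph[u])) for u in graph}
    return known

# Hypothetical communication graph: a path 0 - 1 - 2 - 3 - 4 - 5.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}

known = flood(path, 2)
print(all(known[u] == ball(path, u, 2) for u in path))
```

A volume-style measure would instead count how many nodes an algorithm actually inspects. That set has to be connected, but it need not be a full ball, which is where the "only f(X) is used" relaxation would come in.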

[-]Rohin Shah5y40

Planned summary for the Alignment Newsletter:

This post introduces the concept of the _locality_ of a goal, that is, how “far” away the target of the goal is. For example, a thermostat’s “goal” is very local: it “wants” to regulate the temperature of this room, and doesn’t “care” about the temperature of the neighboring house. In contrast, a paperclip maximizer has extremely nonlocal goals, as it “cares” about paperclips anywhere in the universe. We can also consider whether the goal depends on the agent’s internals, its input, its output, and/or the environment.

The concept is useful because for extremely local goals (usually goals about the internals or the input) we would expect wireheading or tampering, whereas for extremely nonlocal goals, we would instead expect convergent instrumental subgoals like resource acquisition.

[-]adamShimi5y10

Thanks for the summary! It's representative of the idea.

Just out of curiosity, how do you decide which posts/papers you want to write an opinion on?

[-]Rohin Shah5y20

I ask myself if there's anything in particular I want to say about the post / paper that the author(s) didn't say, with an emphasis on ensuring that the opinion has content. If yes, then I write it.

(Sorry, that's not very informative, but I don't really have a system for it.)
