Vingean Agency

24th Aug 2022

I've been involved with several discussions about different notions of agency (and their importance/relationships) lately, especially with the PIBBSS group including myself, Daniel, Josiah, and Ramana; see here.

There's one notion of agency (not necessarily "The" notion of agency, but a coherent and significant notion) which vanishes if you examine it too closely.

Imagine that Alice is "smarter than Bob in every way" -- that is, Bob believes that Alice knows everything Bob knows, and possibly more. Bob doesn't necessarily agree with Alice's goals, but Bob expects Alice to pursue them effectively. In particular, Bob expects Alice's actions to be at least as effective as the best plan Bob can think of.

Because Bob can't predict what Alice will do, the only way Bob can further constrain his expectations is to figure out what's good/bad for Alice's objectives. In some sense this seems like the best case for Bob modeling Alice as an agent: Bob understands Alice purely by understanding her as a goal-seeking force.

I'll call this Vingean agency, since Vinge talked about the difficulty of predicting agents who are smarter than you, and since this usage is consistent with other uses of the term "Vingean" in relation to decision theory.

However, Vingean agency might seem hard to reconcile with other notions of agency. We typically think of "modeling X as an agent" as involving attribution of beliefs to X, not just goals. Agents have probabilities and utilities.

Bob has minimal use for attributing beliefs to Alice, because Bob doesn't think Alice is mistaken about anything -- the best he can do is to use his own beliefs as a proxy, and try to figure out what Alice will do based on that.^[1]
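A toy sketch of this point (and of the cheese example in footnote 1), assuming a hypothetical two-location world, a 50/50 prior for Bob, and an Alice who knows where the cheese is and heads toward it 90% of the time. The function names and numbers are illustrative only. Bob never stores a separate model of Alice's beliefs; he treats her move as evidence about the world and updates his own beliefs.

```python
from fractions import Fraction

# Hypothetical toy world: the cheese is either "left" or "right".
LOCATIONS = ("left", "right")

def alice_move_likelihood(move: str, cheese: str, reliability: Fraction) -> Fraction:
    """P(Alice moves `move` | cheese is at `cheese`).

    Assumption: Alice knows where the cheese is and heads toward it with
    probability `reliability`, otherwise she heads the other way.
    """
    return reliability if move == cheese else 1 - reliability

def bob_posterior(prior: dict, move: str, reliability: Fraction) -> dict:
    """Bob's belief about the cheese location after watching Alice move.

    Bob keeps no separate model of Alice's beliefs: her move is just
    evidence, and Bob applies Bayes' rule to his own beliefs.
    """
    unnormalized = {
        loc: prior[loc] * alice_move_likelihood(move, loc, reliability)
        for loc in LOCATIONS
    }
    total = sum(unnormalized.values())
    return {loc: p / total for loc, p in unnormalized.items()}

prior = {"left": Fraction(1, 2), "right": Fraction(1, 2)}
posterior = bob_posterior(prior, move="left", reliability=Fraction(9, 10))
print(posterior)  # {'left': Fraction(9, 10), 'right': Fraction(1, 10)}
```

When Alice turns left, Bob's conclusion is simply "there's probably cheese to the left", stated entirely in terms of his own beliefs.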

When I say Vingean agency "disappears when we look at it too closely", I mean that if Bob becomes smarter than Alice (understands more about the world, or has a greater ability to calculate the consequences of his beliefs), Alice's Vingean agency will vanish.

We can imagine a spectrum. At one extreme is an Alice who knows everything Bob knows and more, like we've been considering so far. At the other extreme is an Alice whose behavior is so simple that Bob can predict it completely. In between these two extremes are Alices who know some things that Bob doesn't know, while also lacking some information which Bob has.
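A crude way to picture the spectrum, assuming "what an agent knows" can be modeled as a finite set of facts and ignoring differences in computational ability (which the post also counts). The hypothetical `classify` function returns the regime plus the facts Bob knows that Alice lacks; those are the only places where Bob's own beliefs stop being a good proxy for Alice's.

```python
from typing import FrozenSet, Tuple

def classify(alice_knows: FrozenSet[str],
             bob_knows: FrozenSet[str]) -> Tuple[str, FrozenSet[str]]:
    """Place an Alice/Bob pair on the knowledge spectrum (toy model)."""
    # Facts where Bob would have to track Alice's beliefs separately.
    bob_only = bob_knows - alice_knows
    if alice_knows >= bob_knows:
        regime = "vingean"        # Alice knows everything Bob knows, possibly more
    elif bob_knows > alice_knows:
        regime = "bob_superior"   # Bob knows everything Alice knows, and more
    else:
        regime = "intermediate"   # each knows something the other doesn't
    return regime, bob_only

bob = frozenset({"door_is_locked"})
print(classify(frozenset({"door_is_locked", "cheese_in_kitchen"}), bob))
# ('vingean', frozenset())
print(classify(frozenset({"cheese_in_kitchen"}), bob))
# ('intermediate', frozenset({'door_is_locked'}))
print(classify(frozenset(), bob))
# ('bob_superior', frozenset({'door_is_locked'}))
```

In the Vingean case the set of Bob-only facts is empty, which is one way of saying Bob has no use for a separate model of Alice's beliefs.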

(Arguably, Eliezer's notion of optimization power is one formalization of Vingean agency, while Alex Flint's attraction-basin notion of optimization defines a notion of agency at the opposite extreme of the spectrum, where we know everything about the whole system and can predict its trajectories through time.)
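As a rough illustration of why Eliezer's measure sits at the Vingean end, here is a sketch assuming a finite outcome space with a uniform measure (a simplifying assumption): optimization power is read as -log2 of the fraction of outcomes at least as good as the one achieved. Computing it needs Alice's preference ordering and the outcome she reached, but no prediction of which actions she took.

```python
import math
from typing import Sequence

def optimization_power_bits(outcome_utilities: Sequence[float], achieved: float) -> float:
    """-log2 of the fraction of possible outcomes at least as good as `achieved`.

    Assumes a finite outcome space with a uniform measure over outcomes.
    """
    at_least_as_good = sum(1 for u in outcome_utilities if u >= achieved)
    if at_least_as_good == 0:
        raise ValueError("no possible outcome is at least as good as `achieved`")
    return math.log2(len(outcome_utilities) / at_least_as_good)

# 1024 equally likely outcomes, ranked 0..1023 by Alice's preferences.
outcomes = list(range(1024))
print(optimization_power_bits(outcomes, 1023))  # 10.0: hit the single best outcome
print(optimization_power_bits(outcomes, 512))   # 1.0: landed in the top half
```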

I think this spectrum may be important to keep in mind when modeling different notions of agency. Sometimes we analyze agents from a logically omniscient perspective. In representation theorems (such as Savage or Jeffrey-Bolker, or their lesser sibling, VNM) we tend to take on a perspective where we can predict all the decisions of an agent (including hypothetical decisions which the agent will never face in reality). From this omniscient perspective, we then seek to represent the agent's behavior by ascribing it beliefs and real-valued preferences (ie, probabilities and expected utilities).
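A minimal sketch of what "representing" an agent means from the omniscient perspective, assuming a finite Savage-style setup where an act maps each state to an outcome. Given every observed choice, a candidate (probability, utility) pair counts as a representation if the chosen act always has maximal expected utility. The function names and the umbrella example are hypothetical, and the code only checks a candidate rather than constructing one the way the representation theorems do.

```python
from typing import Dict, List, Tuple

# An act maps each state of the world to an outcome label.
Act = Dict[str, str]

def expected_utility(act: Act, probs: Dict[str, float], utils: Dict[str, float]) -> float:
    return sum(probs[s] * utils[act[s]] for s in probs)

def represents(choices: List[Tuple[List[Act], int]],
               probs: Dict[str, float], utils: Dict[str, float]) -> bool:
    """True if (probs, utils) reproduces every observed choice: in each menu,
    the chosen act has maximal expected utility."""
    for menu, chosen in choices:
        best = max(expected_utility(a, probs, utils) for a in menu)
        if expected_utility(menu[chosen], probs, utils) < best:
            return False
    return True

umbrella = {"rain": "dry", "sun": "encumbered"}
no_umbrella = {"rain": "wet", "sun": "free"}
utils = {"dry": 0.8, "encumbered": 0.6, "wet": 0.0, "free": 1.0}
observed = [([umbrella, no_umbrella], 0)]  # the agent took the umbrella

print(represents(observed, {"rain": 0.7, "sun": 0.3}, utils))  # True
print(represents(observed, {"rain": 0.1, "sun": 0.9}, utils))  # False
```

The check presupposes a complete list of the agent's choices, which is exactly the knowledge a Vingean observer lacks.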

However, this omniscient perspective eliminates Vingean agency from the picture. Thus, we might lose contact with one of the important pieces of the "agent" phenomenon, which can only be understood from a more bounded perspective.^[2]

[1] On the other hand, if Bob knows Alice wants cheese, then as soon as Alice starts moving in a given direction, Bob might usefully conclude "Alice probably thinks cheese is in that direction". So modeling Alice as having beliefs is certainly not useless for Bob. Still, because Bob thinks Alice knows better about everything, Bob's estimate of Alice's beliefs always matches Bob's estimate of his own beliefs, in expectation. So in that sense, Bob doesn't need to track Alice's beliefs separately from his own. When Alice turns left, Bob can simply conclude "so there's probably cheese in that direction" rather than tracking his and Alice's beliefs separately.
[2] I also think it's possible that Vingean agency can be extended to be "the" definition of agency, if we think that agency is just Vingean agency from some perspective. For example, ants have minimal Vingean agency from my perspective, because I already understand how they find the food in my house. However, I can easily inhabit a more naïve perspective in which this is unexplained. Indeed, it's computationally efficient for me to model ants this way most of the time -- ants simply find the food. It doesn't matter how they do it.

9 comments

Richard_Ngo

Interesting post! Two quick comments:

> Sometimes we analyze agents from a logically omniscient perspective. ... However, this omniscient perspective eliminates Vingean agency from the picture.

Another example of this happening comes when thinking about utilitarian morality, which by default doesn't treat other agents as moral actors (as I discuss here).

> Bob has minimal use for attributing beliefs to Alice, because Bob doesn't think Alice is mistaken about anything -- the best he can do is to use his own beliefs as a proxy, and try to figure out what Alice will do based on that.

This makes sense when you think in terms of isolated beliefs, but less sense when you think in terms of overarching world-models/worldviews. Bob may know many specific facts about what Alice believes, but be unable to tie those together into a coherent worldview, or understand how they're consistent with his other beliefs. So when Bob is a bounded agent, his best strategy for predicting Alice may be:

1. Maintain a model of Alice's beliefs which contains the specific things Alice is known to believe, and use that to predict Alice's actions in domains closely related to those beliefs.
2. For anything which isn't directly implied by Alice's known beliefs, use Bob's own world-model to make predictions about what will achieve Alice's goals.
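Read literally, the two-part strategy is a lookup with a fallback. A toy sketch, assuming "closely related domains" can be collapsed to an exact match on a query key (a big simplification); `predict_alice`, `bob_model`, and the example beliefs are hypothetical.

```python
from typing import Callable, Dict

def predict_alice(
    query: str,
    alice_known_beliefs: Dict[str, str],
    bob_world_model: Callable[[str], str],
) -> str:
    """Toy version of the two-part strategy.

    1. If Bob knows a specific belief of Alice's about this query, predict from it.
    2. Otherwise, fall back on Bob's own world-model.
    """
    if query in alice_known_beliefs:
        return alice_known_beliefs[query]
    return bob_world_model(query)

# Hypothetical example data.
alice_beliefs = {"best_route_to_cheese": "left"}
bob_facts = {"is_door_locked": "no"}

def bob_model(query: str) -> str:
    return bob_facts.get(query, "no opinion")

print(predict_alice("best_route_to_cheese", alice_beliefs, bob_model))  # left
print(predict_alice("is_door_locked", alice_beliefs, bob_model))        # no
```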

abramdemski

> Another example of this happening comes when thinking about utilitarian morality, which by default doesn't treat other agents as moral actors (as I discuss here).

Interesting point!

> Maintain a model of Alice's beliefs which contains the specific things Alice is known to believe, and use that to predict Alice's actions in domains closely related to those beliefs.

It sounds to me like you're thinking of cases on my spectrum, somewhere between Alice>Bob and Bob>Alice. If Bob thinks Alice knows strictly more than Bob, then Bob can just use Bob's own beliefs, even when specific-things-Bob-knows-Alice-believes are relevant -- because Bob also already believes those things, by hypothesis. So it's only in intermediate cases that Bob might get a benefit from a split strategy like the one you describe.

Richard_Ngo

No, I'm thinking of cases where Alice>Bob, and trying to gesture towards the distinction between "Bob knows that Alice believes X" and "Bob can use X to make predictions".

For example, suppose that Bob is a mediocre physicist and Alice just invented general relativity. Bob knows that Alice believes that time and space are relative, but has no idea what that means. So when trying to make predictions about physical events, Bob should still use Newtonian physics, even when those calculations require assumptions that contradict Alice's known beliefs.

abramdemski

I think Bob still doesn't really need a two-part strategy in this case. Bob knows that Alice believes "time and space are relative", so Bob believes this proposition, even though Bob doesn't know the meaning of it. Bob doesn't need any special-case rule to predict Alice. The best thing Bob can do in this case still seems like, predict Alice based off of Bob's own beliefs.

(Perhaps you are arguing that Bob can't believe something without knowing what that thing means? But to me this requires bringing in extra complexity which we don't know how to handle anyway, since we don't have a Bayesian definition of "definition" to distinguish "Bob thinks X is true but doesn't know what X means" from a mere "Bob thinks X is true".)

A similar example would be an auto mechanic. You expect the mechanic to do things like pop the hood, get underneath the vehicle, grab a wrench, etc. However, you cannot predict which specific actions are useful for a given situation.

We could try to use a two-part model as you suggest, where we (1) maintain an incoherent-but-useful model of car-specific beliefs mechanics have, such as "wrenches are often needed"; (2) use the best of our own beliefs where that model doesn't apply.

However, this doesn't seem like it's ever really necessary or like it saves processing power for bounded reasoners, because we also believe that "wrenches are sometimes useful". This belief isn't specific enough that we could reproduce the mechanic's actions by acting on these beliefs; but, that's fine, that's just because we don't know enough.

(Perhaps you have in mind a picture where we can't let incoherent beliefs into our world-model -- our limited understanding of Alice's physics, or of the mechanic's work, means that we want to maintain a separate, fully coherent world-model, and apply our limited understanding of expert knowledge only as a patch. If this is what you are getting at, this seems reasonable, so long as we can still count the whole resulting thing as "my beliefs" -- my beliefs, as a bounded agent, aren't required to be one big coherent model.)

But, it does seem like there might be an example close to the one you spelled out. Perhaps when Alice says "X is relative", Alice often starts doing an unfamiliar sort of math on the whiteboard. Bob has no idea how to interpret any of it as propositions -- he can't even properly divide it up into equations, to pay lip service to equations in the "X is true, but I don't know what it means" sense I used above.

Then, it seems like Bob has to model Alice with a special-case "Alice starts writing the crazy math" model. Bob has some very basic beliefs about the math Alice is writing, such as "writing the letter Beta seems to be involved", but these are clearly object-level beliefs about Alice's behaviors, which Bob has to keep track of specifically. So in this situation it seems like Bob's best model of Alice's behavior doesn't just follow from Bob's own best model of what to do?

(So I end this comment on a somewhat uncertain note.)

Vanessa Kosoy

The spectrum you're describing is related, I think, to the spectrum that appears in the AIT definition of agency where there is dependence on the cost of computational resources. This means that the same system can appear agentic from a resource-scarce perspective but non-agentic from a resource-abundant perspective. The former then corresponds to the Vingean regime and the latter to the predictable regime. However, the framework does have a notion of prior and not just utility, so it is possible to ascribe beliefs to Vingean agents. I think it makes sense: the beliefs of another agent can predictably differ from your own beliefs if only because there is some evidence that you have seen but the other agent, to the best of your knowledge, has not^[1].

[1] You need to allow for the possibility that the other agent inferred this evidence from some pattern you are not aware of, but you should not be confident of this. For example, even an arbitrarily intelligent AI that received zero external information should have a hard time inferring certain things about the world that we know.
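A small sketch of how another agent's beliefs can predictably differ from your own, assuming both agents share a prior over two hypothetical hypotheses about a coin ("fair", or "biased" toward heads with probability 0.9) and Bob has seen one more heads than Alice. Bob's model of Alice's belief is just the shared prior updated on the shared evidence. Following the footnote, this ignores the possibility that Alice inferred the extra flip some other way.

```python
from fractions import Fraction

def update(belief: dict, likelihoods: dict) -> dict:
    """Bayes' rule over a finite hypothesis space."""
    unnormalized = {h: belief[h] * likelihoods[h] for h in belief}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}

prior = {"fair": Fraction(1, 2), "biased": Fraction(1, 2)}
heads = {"fair": Fraction(1, 2), "biased": Fraction(9, 10)}  # P(heads | hypothesis)

shared = update(prior, heads)  # a heads that both agents observed
alice = shared                 # Bob's model of Alice: she lacks the second flip
bob = update(shared, heads)    # a second heads that only Bob observed

print(alice["biased"])  # 9/14
print(bob["biased"])    # 81/106
```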

Emrik

I'm confused. (As in, actually confused. The following should hopefully point at what pieces I'm missing in order to understand what you mean by a "problem" for the notion.)

> Vingean agency "disappears when we look at it too closely"

I don't really get why this would be a problem. I mean, "agency" is an abstraction, and every abstraction becomes predictably useless once you can compute the lower layer perfectly, at least if you assume compute is cheap. Balloons!

Imagine you've never seen a helium balloon before, and you see it slowly soaring to the sky. You could have predicted this by using a few abstractions like density of gases and Archimedes' principle. Alternatively, if you had the resources, you could make the identical prediction (with inconsequentially higher precision) by extrapolating from the velocities and weights of all the individual molecules and computing that the sum of forces acting on the bottom of the balloon exceeds the sum acting on the top. I don't see how the latter being theoretically possible implies a "problem" for abstractions like "density" and "Archimedes' principle".
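The abstraction-level prediction needs only a few aggregate quantities. A sketch using Archimedes' principle, with rough assumed values: air at about 1.2 kg/m³ and helium at about 0.17 kg/m³ (both near room temperature and sea-level pressure), a 30 cm spherical balloon, and a 3 g envelope. `net_lift_newtons` is a hypothetical helper name.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def net_lift_newtons(volume_m3: float, rho_air: float, rho_gas: float,
                     envelope_mass_kg: float) -> float:
    """Archimedes' principle: weight of the displaced air minus the weight
    of the gas and the envelope. Positive means the balloon rises."""
    buoyant_force = rho_air * volume_m3 * G
    weight = (rho_gas * volume_m3 + envelope_mass_kg) * G
    return buoyant_force - weight

# Rough, assumed values for a party balloon.
volume = 4 / 3 * math.pi * 0.15 ** 3  # 30 cm diameter sphere, about 0.014 m^3
lift = net_lift_newtons(volume, rho_air=1.2, rho_gas=0.17, envelope_mass_kg=0.003)
print(lift > 0)  # True: the balloon rises
```

No molecular trajectories appear anywhere; two densities, a volume, and a mass settle the question.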

abramdemski

I think the main problem is that expected utility theory is in many ways our most well-developed framework for understanding agency, but it makes no empirical predictions, and in particular does not tie agency to other important notions of optimization we can come up with (and which, in fact, seem like they should be closely tied to agency).

I'm identifying one possible source of this disconnect.

The problem feels similar to trying to understand physical entropy without any uncertainty. So it's like, we understand balloons at the atomic level, but we notice that how inflated they are seems to depend on the temperature of the air, but temperature is totally divorced from the atomic level (because we can't understand entropy and thermodynamics without using any notion of uncertainty). So we have this concept of balloons and this separate concept of inflatedness, which really really should relate to each other, but we can't bridge the gap because we're not thinking about uncertainty in the right way.
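A toy illustration of why entropy needs uncertainty, with ten coins standing in for molecules and entropy taken as log2 of the number of microstates compatible with what the observer knows (a Boltzmann-style count, assumed here for simplicity). The same physical state has positive entropy under the macrostate description "five heads" and zero entropy for an observer who knows the exact microstate.

```python
import itertools
import math
from typing import Callable, Tuple

Microstate = Tuple[str, ...]

def entropy_bits(n: int, description: Callable[[Microstate], bool]) -> float:
    """log2 of the number of n-coin microstates compatible with an
    observer's description of the system."""
    compatible = sum(
        1 for state in itertools.product("HT", repeat=n) if description(state)
    )
    return math.log2(compatible)

actual = ("H", "T", "H", "H", "T", "T", "H", "T", "H", "T")  # five heads
n = len(actual)

coarse = entropy_bits(n, lambda s: s.count("H") == 5)  # only the macrostate is known
exact = entropy_bits(n, lambda s: s == actual)          # the full microstate is known
print(exact)  # 0.0
```

`coarse` comes out to log2(252), a little under 8 bits, even though both observers are looking at the same coins.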

Roman Leventov

My current favourite notion of agency, primarily based on Active Inference, which I refined upon reading "Discovering Agents", is the following:

Agency is a property of a physical system from some observer’s subjective perspective. It stems from the observer’s generative model of the world (including the object in question), specifically whether the observer predicts the agent's future trajectory in the state space by assuming that the agent has its own generative model which the agent uses to act. The agent's own generative model also depends on (adapts to, is learned from, etc.) the agent's environment. This last bit comes from "Discovering Agents".

"Having own generative model" is the shakiest part. It probably means that storage, computation, and maintenance (updates, learning) of the model all happen within the agent's boundaries: if not, the agent's boundaries shall be widened, as in the example of "thermostat with its creation process" from "Discovering Agents". The storage and computational substrate of the agent's generative model is not important: it could be neuronal, digital, chemical, etc.

Now, the observer models the generative model inside the agent. Here's where this Vingean veil comes from: if the observer has perfect observability of the agent's internals, it can believe that its model of the agent exactly matches the agent's own generative model, but usually observability is limited and the observer's model will be less than perfect.

However, even perfect observability doesn't guarantee safety: the generative model might be large and effectively incompressible (cf. the halting problem), so the only way to see what it will do may be to execute it.

Theory of mind is also closely related to all of the above.

abramdemski

> The agent's own generative model also depends on (adapts to, is learned from, etc.) the agent's environment. This last bit comes from "Discovering Agents".
> "Having own generative model" is the shakiest part.

What it means for the agent to "have a generative model" is that the agent systematically corrects this model based on its experience (to within some tolerable competence!).

> It probably means that storage, computation, and maintenance (updates, learning) of the model all happen within the agent's boundaries: if not, the agent's boundaries shall be widened,

A model/belief/representation depends on reference maintenance, but in general, the machinery of reference maintenance can and usually should extend far beyond the representation itself.

For example, an important book will tend to get edition updates, but the complex machinery which results in such an update extends far beyond the book's author.

A telescope produces a representation of far-away space, but the empty space between the telescope and the stars is also instrumental in maintaining the reference (eg, it must remain clear of obstacles).

A student does a lot of work "within their own boundaries" to maintain their knowledge, but they also use notebooks, computers, etc. The student's teachers are also heavily involved in the reference-maintenance.

> My current favourite notion of agency, primarily based on Active Inference,

I'm not a big fan of active inference. It strikes me as, basically, a not-particularly-great scheme for injecting randomness into actions to encourage exploration.

AI ALIGNMENT FORUM