Optimization at a Distance

Vladimir_Nesov, 4y

Embedded agents have a spatial extent. If we use the analogy between physical spacetime and a domain of computation of an environment, this offers interesting interpretations for some terms.

In a domain, counterfactuals might be seen as points/events/observations that are incomparable in specialization order, that is, points that are not in each other's logical future. Via the spacetime analogy, this is the same as the points being space-like separated. This motivates calling collections of mutually counterfactual (incomparable) events logical space, in the same sense that events comparable in specialization order follow logical time. (Some other non-Fréchet spaces would likely give more interesting space-like subspaces than a domain typical for program semantics.)

An embedded agent extant in logical space of an environment (at a particular time) is then a collection of counterfactuals. In this view, an agent is not a specific computation, but rather a collection of possible alternative behaviors/observations/events of an environment (resulting from multiple different computations), events that are counterfactual to each other. The logical space an agent occupies comprises the behaviors/observations/events (partial-states-at-a-time) of possible environments where the agent has influence.

In this view, counterfactuals are not merely phantasmal decision theory ideas developed to make sure that reality doesn't look like them, hypothetical threats that should never obtain in actuality. Instead, they are reified as equals to reality, as parts of the agent, and an agent's description is incomplete without them. This is not as obvious as with parts of a physical machine, because usually each small part of a machine doesn't contain a precise description of the whole machine. With agents, an actual agent suggests quite strongly what its counterfactual behaviors would be in the adjacent possible environments, at least given a decision theory that interprets such things. So this resembles a biological organism where each cell has a blueprint for the whole body: each expression of counterfactual behavior of an embedded agent has the whole design of the agent, sufficient to reconstruct its behavior in the other counterfactuals. But this point of view suggests that this is not a necessary property of embedded agents, and that counterfactuals might have independent content, as other parts of a larger design.

For counterfactuals in decision theory, this cashes out as an imperfect ability of an agent to know what it does in counterfactuals, or as coordination with other agents that have different designs in different counterfactuals, that is, acausal trade across logical space. So there is essentially nothing new: the notion of "logical space" and of agents having extent in logical space adds up to normality, extending the title of a singular "agent" to a collective of agents with different designs that are mutually counterfactual and are engaged in acausal trade with each other as parts of the collective. It is natural to treat different parties engaged in acausal trade as parts of a whole, since they interact and influence each other's behavior. With sufficient integration, it becomes more central to call the whole collective "an agent" instead of privileging views that only focus on one part (counterfactual) at a time.

Logical space is an unusual notion of counterfactuals, because different points of a logical space can have a common logical future, that is, different counterfactuals can contribute to the same future logical event and be in that event's past. This is not surprising given acausal trade and predictors that ask what a given agent/computation does in multiple counterfactual situations. But it usefully runs counter to the impression that counterfactuals necessarily and irrevocably diverge from each other, embedding a mutual contradiction that prevents them from ever being reunited in a single possibility.
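One way to make the "incomparable, yet sharing a common future" idea concrete is a toy model, sketched here under a strong assumption that is not part of the comment: an event is a partial assignment of values to variables, and one event precedes another in the specialization order when the second extends the first. All names (`precedes`, `common_future`, the variable keys) are hypothetical and chosen only for illustration.

```python
# Toy model (an assumption, not the comment's formalism): an event is a
# partial assignment {variable: value}; y is in x's logical future when
# y agrees with x everywhere x is defined.

def precedes(x: dict, y: dict) -> bool:
    """True if y extends x, i.e. x is below y in the assumed order."""
    return all(k in y and y[k] == v for k, v in x.items())

def comparable(x: dict, y: dict) -> bool:
    """Comparable events lie along logical time; incomparable ones are
    mutually counterfactual (space-like separated in the analogy)."""
    return precedes(x, y) or precedes(y, x)

def common_future(x: dict, y: dict):
    """Least event extending both x and y, or None if they contradict."""
    if any(k in y and y[k] != v for k, v in x.items()):
        return None
    return {**x, **y}

base = {"design": "D"}
in_s1 = {"design": "D", "act_in_s1": "cooperate"}
in_s2 = {"design": "D", "act_in_s2": "defect"}
clash = {"design": "D", "act_in_s1": "defect"}

print(comparable(base, in_s1))      # base is in the past of in_s1
print(comparable(in_s1, in_s2))     # mutually counterfactual
print(common_future(in_s1, in_s2))  # a predictor-style event seeing both
print(common_future(in_s1, clash))  # contradictory, never reunited
```

In this sketch `in_s1` and `in_s2` are incomparable but both sit in the past of a single later event, which is the predictor-like case described above, while `in_s1` and `clash` are the more familiar kind of counterfactuals that contradict each other outright.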

Bird Concept, 4y

If someone asks what the rock is optimizing, I’ll say “the actions” - i.e. the rock “wants” to do whatever it is that the rock in fact does.

This argument does not seem to me like it captures the reason a rock is not an optimiser?

I would hand wave and say something like:

"If you place a human into a messy room, you'll sometimes find that the room is cleaner afterwards. If you place a kid in front of a bowl of sweets, you'll soon find the sweets gone. These and other examples are pretty surprising state transitions, that would be highly unlikely in the absence of those humans you added. And when we say that something is an optimiser, we mean that it is such that, when it interfaces with other systems, it tends to make a certain narrow slice of state space much more likely for those systems to end up in."

The rock seems to me to have very few such effects. The probability of state transitions of my room is roughly the same with or without a rock in a corner of it. And that's why I don't think of it as an optimiser.
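The hand-wave above can be sketched as a small simulation. This is only an illustration under made-up assumptions: the room is a single "messiness" number from 0 to 10 that drifts randomly, a "human" removes one unit of mess per step, and a "rock" does nothing. The function names and all numbers are hypothetical.

```python
import random

def step(mess: int, occupant, rng: random.Random) -> int:
    """One time step of the toy room."""
    mess += rng.choice([-1, 0, 1])   # baseline drift, same for every occupant
    if occupant == "human":
        mess -= 1                    # assumed tidying effect
    # a "rock" (or no occupant) changes nothing
    return min(10, max(0, mess))

def p_clean(occupant, trials: int = 2000, steps: int = 30, seed: int = 0) -> float:
    """Estimated probability that the room ends in the narrow 'clean' slice."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        mess = rng.randint(0, 10)
        for _ in range(steps):
            mess = step(mess, occupant, rng)
        hits += mess <= 1
    return hits / trials

print("empty room:", p_clean(None))
print("with rock: ", p_clean("rock"))
print("with human:", p_clean("human"))
```

With the same seed, the rock leaves the estimate exactly where the empty room has it, while the human concentrates almost all of the probability on the clean states. That shift in where the room ends up is the sense of "optimiser" being pointed at here.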

johnswentworth, 4y

Exactly! That's an optimization-at-a-distance style intuition. The optimizer (e.g. human) optimizes things outside of itself, at some distance from itself.

A rock can arguably be interpreted as optimizing itself, but that's not an interesting kind of "optimization", and the rock doesn't optimize anything outside itself. Throw it in a room, the room stays basically the same.

adamShimi, 4y

Great post!

For instance, if I’m planning a party, then the actions I take now are far away in time (and probably also space) from the party they’re optimizing. The “intermediate layers” might be snapshots of the universe-state at each time between the actions and the party. (... or they might be something else; there are usually many different ways to draw intermediate layers between far-apart things.)
This applies surprisingly well even in situations like reinforcement learning, where we don’t typically think of the objective as “far away” from the agent. If I'm a reinforcement learner optimizing for some reward I’ll receive later, that later reward is still typically far away from my current actions. My actions impact the reward via some complicated causal path through the environment, acting through many intermediate layers.
So we’ve ruled out agents just “optimizing” their own actions. How does this solve the other two problems?

I feel like this is assuming away one of the crucial difficulties of ascribing agency and goal-directedness: lack of competence or non-optimality might make agentic behavior look non-agentic unless you already have a mechanistic interpretation. Separating a rock from a human is not really the problem; it's more like separating something that acts like a chimp, but for which you have very little data and understanding, from an agent optimizing to clip you.

(Not saying that this can't be relevant to addressing this problem, just that currently you seem to assume the problem away.)

Because the agent only interacts with the far away things-it’s-optimizing via a relatively-small summary, it’s natural to define the “actions” and “observations” as the contents of the summary flowing in either direction, rather than all the low-level interactions flowing through the agent’s supposed “Cartesian boundary”. That solves the microscopic interactions problem: all the random bumping between my hair/skin and air molecules mostly doesn’t impact things far away, except via a few summary variables like temperature and pressure.

Hmm. I like the idea of redefining action as the consequences of one's action that are observable "far away"; it nicely rederives the observation-action loop through interaction with far away variables. That being said, I'm unsure whether defining the observations in terms of the summary statistics is itself unproblematic. I have one intuition that tells me that this is all you can observe anyway, so it's fine; on the other hand, it looks like you're assuming that the agent has the right ontology already? I guess that can be solved by saying that the observations are on the content of the summary, but not necessarily all of it.

When Adam Shimi first suggested to me a couple years ago that “optimization far away” might be important somehow, one counterargument I raised was dynamic programming (DP): if the agent is optimizing an expected utility function over something far away, then we can use DP to propagate the expected utility function back through the intermediate layers to find an equivalent utility function over the agent’s actions:
This isn’t actually a problem, though. It says that optimization far away is equivalent to some optimization nearby. But the reverse does not necessarily hold: optimization nearby is not necessarily equivalent to some optimization far away. This makes sense: optimization nearby is a trivial condition which matches basically any system, and therefore will match the interesting cases as well as the uninteresting cases.
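The DP step described above can be sketched in a few lines. This is an illustration under assumptions that are not in the post: each intermediate layer is a discrete variable, consecutive layers are linked by a row-stochastic transition table, and the utility is defined only on the far-away layer. The function name and all numbers are hypothetical.

```python
def backpropagate(utility, layers):
    """Push a utility over the far-away layer back to the first layer.

    layers[k][i][j] is the assumed probability that layer k+1 is in state j
    given that layer k is in state i; layer 0 is the agent's action.
    Returns the expected far-away utility for each action.
    """
    for table in reversed(layers):
        utility = [sum(p * u for p, u in zip(row, utility)) for row in table]
    return utility

# Two actions, one intermediate layer, two far-away outcomes (made-up numbers).
layers = [
    [[0.9, 0.1], [0.2, 0.8]],   # action -> intermediate state
    [[0.7, 0.3], [0.1, 0.9]],   # intermediate state -> far-away outcome
]
u_far = [0.0, 1.0]              # utility defined only on the far-away outcome

u_actions = backpropagate(u_far, layers)
best_action = max(range(len(u_actions)), key=lambda a: u_actions[a])
print(u_actions, best_action)
```

Maximizing `u_actions` directly picks the same action as maximizing expected utility over the far-away outcome, which is the equivalence the DP argument relies on. The sketch only runs in that one direction: an arbitrary utility over actions need not arise from any utility over the far-away layer.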

I think I actually remember now the discussion we were having, and I recall an intuition about counting. Like, there seem to be more ways to optimize nearby than to optimize the specific part of far away, which I guess is what you're pointing at.

Gordon Seidoh Worley, 4y

Really liking this model. It seems to actually deal with the problem of embeddedness for agents and the fact that there is no clear boundary to draw around what we call an agent other than one that's convenient for some purpose.

I've obviously got thoughts on how this is operationalizing insights about "no-self" and dependent origination, but that doesn't seem too important to get into, other than to say it gives me more reason to think this is likely to be useful.

AI ALIGNMENT FORUM
AF
