
Reframing Impact
Attainable Utility Landscape: How The World Is Changed

by Alex Turner
10th Feb 2020
7 min read
Previous: Seeking Power is Often Convergently Instrumental in MDPs
Next: The Catastrophic Convergence Conjecture

5 comments, sorted by top scoring
[-]Pattern5y20
Going to the green state means you can't get to the purple state as quickly.
On a deep level, why is the world structured such that this happens? Could you imagine a world without opportunity cost of any kind?

In a complete graph, all nodes are directly connected.


Equivalently, we assumed the agent isn't infinitely farsighted (γ<1); if it were, it would be possible to be in "more than one place at the same time", in a sense (thanks to Rohin Shah for this interpretation).

The opposite of this is that if it were possible for an agent to be in more than one place at the same time, they could be infinitely farsighted. (Possibly as a consequence of FTL.)

[-]Alex Turner5y20

In a complete graph, all nodes are directly connected.

Surprisingly, unless you're talking about $K_1$ (the complete 1-graph), opportunity cost still exists in $K_n$ ($n>1$). Each round, you choose where to go next (and you can go to any state immediately). Going to one state next round means you can't go to a different state next round, so for any given action there exists a reward function which incurs opportunity cost.

Definition. We say opportunity cost exists at a state $s$ if there exist child states $s_1, s_2$ of state $s$ such that $V^*_R(s_1) \neq V^*_R(s_2)$ for some reward function $R$. That is, $s$ has successor states with different (optimal) AUs for some reward function.

The opposite of this is that if it were possible for an agent to be in more than one place at the same time, they could be infinitely farsighted. (Possibly as a consequence of FTL.)

Things get weird here, depending on your theory of identity and how that factors into the planning / reward process? Can you spell this out some more?

[-]Rafael Harth5y10

The technical appendix felt like it was more difficult than previous posts, but I had the advantage of having tried to read the paper from the preceding post yesterday and managed to reconstruct the graph & gamma correctly.

The early part is slightly confusing, though. I thought AU is a thing that belongs to the goal of an agent, but the picture made it look as if it's part of the object ("how fertile is the soil?"). Is the idea here that the soil-AU is slang for "AU of goal 'plant stuff here'"?

I did interpret the first exercise as "you planned to go onto the moon" and came up with stuff like "how valuable are the stones I can take home" and "how pleasant will it be to hang around."

One thing I noticed is that the formal policies don't allow for all possible "strategies." In the graph we had to reconstruct, I can't start at s1, then go to s1 once and then go to s3. So you could think of the larger set ΠL where the policies are allowed to depend on the time step. But I assume there's no point unless the reward function also depends on the time step. (I don't know anything about MDPs.)

Am I correct that a deterministic transition function is a function $T : S \times A \to S$ and a non-deterministic one is a function $T : S \times A \times S \to [0,1]$?

[-]Alex Turner5y20

Is the idea here that the soil-AU is slang for "AU of goal 'plant stuff here'"?

yes

One thing I noticed is that the formal policies don't allow for all possible "strategies."

yeah, this is because those are “nonstationary” policies - you change your mind about what to do at a given state. A classic result in MDP theory is that you never need these policies to find an optimal policy.

Am I correct that a deterministic transition function is

yup!

[-]Stuart Armstrong5y10

I find the existing MDP isomorphisms/equivalences to be pretty lacking.

I have a paper on equivalences (and counterfactual equivalences, which is stronger) for POMDPs: https://arxiv.org/abs/1801.03737


(This is one interpretation of the prompt, in which you haven't chosen to go to the moon. If you imagined yourself as more prepared, that's also fine.)

If you were plopped onto the moon, you'd die pretty fast. Maybe the "die as quickly as possible" AU is high, but not much else - not even the "live on the moon" AU! We haven't yet reshaped the AU landscape on the moon to be hospitable to a wide range of goals. Earth is special like that.

AU landscape as a unifying frame

Attainable utilities are calculated by winding your way through possibility-space, considering and discarding possibility after possibility to find the best plan you can. This frame is unifying.

Sometimes you advantage one AU at the cost of another, moving through the state space towards the best possibilities for one goal and away from the best possibilities for another goal. This is opportunity cost.

Sometimes you gain more control over the future: most of the best possibilities make use of a windfall of cash. Sometimes you act to preserve control over the future: most Tic-Tac-Toe goals involve not ending the game right away. This is power.

Other people usually objectively impact you by decreasing or increasing a bunch of your AUs (generally, by changing your power). This happens for an extremely wide range of goals because of the structure of the environment.

Sometimes, the best possibilities are made unavailable or worsened only for goals very much like yours. This is value impact.

Sometimes a bunch of the best possibilities go through the same part of the future: fast travel to random places on Earth usually involves the airport. This is instrumental convergence.

Exercise: Track what’s happening to your various AUs during the following story: you win the lottery. Being an effective spender, you use most of your cash to buy a majority stake in a major logging company. Two months later, the company goes under.

Technical appendix: AU landscape and world state contain equal information

In the context of finite deterministic Markov decision processes, there's a wonderful handful of theorems which basically say that the AU landscape and the environmental dynamics encode each other. That is, they contain the same information, just with different emphasis. This supports thinking of the AU landscape as a "dual" of the world state.

Let $\langle S, A, T, \gamma \rangle$ be a rewardless deterministic MDP with finite state and action spaces $S, A$, deterministic transition function $T$, and discount factor $\gamma \in (0,1)$. As our interest concerns optimal value functions, we consider only stationary, deterministic policies: $\Pi := A^S$.

The first key insight is to consider not policies, but the trajectories induced by policies from a given state; to not look at the state itself, but the paths through time available from the state. We concern ourselves with the possibilities available at each juncture of the MDP.

To this end, for $\pi \in \Pi$, consider the mapping $\pi \mapsto (I - \gamma T^\pi)^{-1}$ (where $T^\pi(s,s') := T(s, \pi(s), s')$); in other words, each policy $\pi$ maps to a function taking each state $s_0$ to a discounted state visitation frequency vector $f^\pi_{s_0}$, which we call a possibility. The meaning of each frequency vector is: starting in state $s_0$ and following policy $\pi$, what sequence of states $s_0, s_1, \ldots$ do we visit in the future? States visited later in the sequence are discounted according to $\gamma$: the sequence $s_0 s_1 s_2 s_2 \ldots$ would induce $1$ visitation frequency on $s_0$, $\gamma$ visitation frequency on $s_1$, and $\frac{\gamma^2}{1-\gamma}$ visitation frequency on $s_2$.

The possibility function $F$ outputs the possibilities available at a given state: $F(s) := \{ f^\pi_s \mid \pi \in \Pi \}$.

Put differently, the possibilities available are all of the potential film-strips of how-the-future-goes you can induce from the current state.
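
To make this concrete, here is a minimal numpy sketch (the three-state graph, `transitions`, and the helper names are my own illustration, not the post's example): it enumerates the stationary deterministic policies of a small deterministic MDP and reads each possibility $f^\pi_{s_0}$ off a row of $(I-\gamma T^\pi)^{-1}$.

```python
import numpy as np
from itertools import product

# Hypothetical rewardless deterministic MDP: transitions[s] lists the states
# reachable from s in one step. (Not the exercise MDP from the post.)
transitions = {0: [0, 1], 1: [0, 2], 2: [2]}
gamma = 0.75
n = len(transitions)

def possibility(policy, s0):
    """f^pi_{s0}: discounted state visitation frequencies, row s0 of (I - gamma T^pi)^{-1}."""
    T_pi = np.zeros((n, n))
    for s, s_next in enumerate(policy):
        T_pi[s, s_next] = 1.0            # under pi, state s deterministically moves to s_next
    e = np.zeros(n)
    e[s0] = 1.0
    return e @ np.linalg.inv(np.eye(n) - gamma * T_pi)

def possibility_set(s0):
    """F(s0): the distinct possibilities over all stationary deterministic policies."""
    policies = product(*(transitions[s] for s in range(n)))
    return {tuple(np.round(possibility(pi, s0), 4)) for pi in policies}

for s0 in range(n):
    print(s0, sorted(possibility_set(s0)))   # each vector's L1 norm is 1/(1 - gamma) = 4
```

Each printed vector is one such film-strip: the start state gets frequency at least 1, and later visits are discounted by powers of $\gamma$.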

Possibility isomorphism

We say two rewardless MDPs $M$ and $M'$ are isomorphic up to possibilities if they induce the same possibilities. Possibility isomorphism captures the essential aspects of an MDP's structure, while being invariant to state representation, state labelling, action labelling, and the addition of superfluous actions (actions whose results are duplicated by other actions available at that state). Formally, $M \simeq_F M'$ when there exists a bijection $\phi : S \to S'$ (letting $P_\phi$ be the corresponding $|S|$-by-$|S'|$ permutation matrix) satisfying $F_M(s) = \{ P_\phi f' \mid f' \in F_{M'}(\phi(s)) \}$ for all $s \in S$.

This isomorphism is a natural contender[1] for the canonical (finite) MDP isomorphism:

Theorem: M and M′ are isomorphic up to possibilities iff their directed graphs are isomorphic (and they have the same discount rate).
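
As a rough illustration of this definition (my own sketch, not code from the post), a brute-force check of possibility isomorphism just searches over bijections between the state sets. Here `F_M` and `F_Mp` are assumed to map each state $0, \dots, n-1$ to its set of possibility tuples:

```python
from itertools import permutations

def isomorphic_up_to_possibilities(F_M, F_Mp):
    """Search for a bijection phi with F_M(s) = {P_phi f' : f' in F_M'(phi(s))} for all s.
    Exponential in |S|; illustration only. Assumes possibilities are stored as exactly
    matching tuples (round them to a fixed precision first)."""
    n = len(F_M)
    if n != len(F_Mp):
        return None
    for phi in permutations(range(n)):               # phi[s] = image of state s in M'
        def reindex(f):                              # (P_phi f')[s] = f'[phi[s]]
            return tuple(f[phi[s]] for s in range(n))
        if all(set(F_M[s]) == {reindex(f) for f in F_Mp[phi[s]]} for s in range(n)):
            return phi                               # a witnessing bijection
    return None
```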

Representation equivalence

Suppose I give you the following possibility sets, each containing the possibilities for a different state:

$$\left\{ \begin{pmatrix} 4 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ .75 \\ 2.25 \end{pmatrix}, \begin{pmatrix} 1.4375 \\ 4 - 1.4375 \\ 0 \end{pmatrix} \right\} \qquad \left\{ \begin{pmatrix} 0 \\ 0 \\ 4 \end{pmatrix} \right\} \qquad \left\{ \begin{pmatrix} 0 \\ 1 \\ 3 \end{pmatrix}, \begin{pmatrix} 4 - 1.4375 \\ 1.4375 \\ 0 \end{pmatrix}, \begin{pmatrix} 3 \\ 1 \\ 0 \end{pmatrix} \right\}$$

Exercise: What can you figure out about the MDP structure? Hint: each entry in the column corresponds to the visitation frequency of a different state; the first entry is always $s_1$, the second $s_2$, and the third $s_3$.

You can figure out everything: $\langle S, A, T, \gamma \rangle$, up to possibility isomorphism. Solution here.

How? Well, the $L_1$ norm of each possibility vector is always $\frac{1}{1-\gamma}$, so you can easily deduce $\gamma = .75$. The state with only a single possibility must be isolated, so we can mark that down in our graph; also, it corresponds to the third entry, $s_3$.

The other two states correspond to the "1" entries in their possibilities (a possibility always places frequency at least $1$ on its starting state), so we can mark that down. The rest follows straightforwardly.

Theorem: Suppose the rewardless MDP M has possibility function F. Given only F,[2] M can be reconstructed up to possibility isomorphism.
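
Here is a sketch of how such a reconstruction can go (my own illustration; `reconstruct` assumes `F` maps each state $0, \dots, n-1$ to its set of possibility tuples). It leans on the recursion $f^\pi_s = e_s + \gamma f^\pi_{s'}$, where $s'$ is the successor of $s$ under $\pi$:

```python
import numpy as np

def reconstruct(F):
    """Recover gamma and the one-step transition graph from possibility sets alone (sketch)."""
    some_f = next(iter(next(iter(F.values()))))
    gamma = 1 - 1 / sum(some_f)              # every possibility has L1 norm 1/(1 - gamma)
    edges = set()
    for s, possibilities in F.items():
        for f in possibilities:
            g = np.array(f, dtype=float)
            g[s] -= 1.0                      # strip the visit to s at time 0 ...
            g /= gamma                       # ... leaving a possibility of the successor state
            for s2, poss2 in F.items():
                if any(np.allclose(g, f2) for f2 in poss2):
                    edges.add((s, s2))       # so some action leads from s to s2
    return gamma, edges
```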

In MDPs, the "AU landscape" is the set of optimal value functions for all reward functions over states in that MDP. If you know the optimal value functions for just |S| reward functions, you can also reconstruct the rewardless MDP structure.[3]

From the environment (rewardless MDP), you can deduce the AU landscape (all optimal value functions) and all possibilities. From possibilities, you can deduce the environment and the AU landscape. From the AU landscape, you can deduce the environment (and thereby all possibilities).

All of these encode the same mathematical object.
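
One direction of this equivalence is especially easy to compute: for a reward function $R$ over states, the optimal value is the best discounted return over the state's possibilities, $V^*_R(s) = \max_{f \in F(s)} f \cdot R$. A minimal sketch, using hypothetical possibility sets (with $\gamma = .75$) rather than any example from the post:

```python
import numpy as np

F = {  # hypothetical possibility sets for a 3-state MDP
    0: [np.array([4.0, 0.0, 0.0]), np.array([1.0, 0.75, 2.25])],
    1: [np.array([0.0, 1.0, 3.0]), np.array([3.0, 1.0, 0.0])],
    2: [np.array([0.0, 0.0, 4.0])],
}

def optimal_value(s, R):
    """AU of goal R at state s, read directly off the possibilities."""
    return max(f @ np.asarray(R, dtype=float) for f in F[s])

print(optimal_value(0, [0, 0, 1]))   # 2.25: the best AU for "be in state 2" available from state 0
```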

Technical appendix: Opportunity cost

Opportunity cost is when an action you take makes you more able to achieve one goal but less able to achieve another. Even this simple world has opportunity cost:

Going to the green state means you can't get to the purple state as quickly.

On a deep level, why is the world structured such that this happens? Could you imagine a world without opportunity cost of any kind? The answer, again in the rewardless MDP setting, is simple: "yes, but the world would be trivial: you wouldn't have any choices". Using a straightforward formalization of opportunity cost, we have:

Theorem: Opportunity cost exists in an environment iff there is a state with more than one possibility.

Philosophically, opportunity cost exists when you have meaningful choices. When you make a choice, you're necessarily moving away from some potential future but towards another; since you can't be in more than one place at the same time, opportunity cost follows. Equivalently, we assumed the agent isn't infinitely farsighted (γ<1); if it were, it would be possible to be in "more than one place at the same time", in a sense (thanks to Rohin Shah for this interpretation).
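
As a toy numerical check of the theorem (a hypothetical two-state world, not an example from the post): suppose state 0 can either stay put or move to state 1, which only self-loops. State 0 then has two possibilities, and the indicator reward on state 1 already values them differently, so opportunity cost exists:

```python
import numpy as np

gamma = 0.75
stay  = np.array([1 / (1 - gamma), 0.0])      # possibility: loop on state 0 forever
leave = np.array([1.0, gamma / (1 - gamma)])  # possibility: 0 -> 1 -> 1 -> ...
R     = np.array([0.0, 1.0])                  # indicator reward on state 1

# Two distinct possibilities at state 0, so by the theorem opportunity cost exists:
# the two choices attain different utility for this reward function.
print(stay @ R, leave @ R)                    # 0.0 vs 3.0
```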

While understanding opportunity cost may seem like a side-quest, each insight is another brick in the edifice of our understanding of the incentives of goal-directed agency.

Notes

  • Just as game theory is a great abstraction for modelling competitive and cooperative dynamics, AU landscape is great for thinking about consequences: it automatically excludes irrelevant details about the world state. We can think about the effects of events without needing a specific utility function or ontology to evaluate them. In multi-agent systems, we can straightforwardly predict the impact the agents have on each other and the world.
  • “Objective impact to a location” means that agents whose plans route through the location tend to be objectively impacted.
  • The landscape is not the territory: AU is calculated with respect to an agent's beliefs, not necessarily with respect to what really "could" or will happen.

  1. The possibility isomorphism is new to my work, as are all other results shared in this post. This apparent lack of basic theory regarding MDPs is strange; even stranger, this absence was actually pointed out in two published papers!

    I find the existing MDP isomorphisms/equivalences to be pretty lacking. The details don't fit in this margin, but perhaps in a paper at some point. If you want to coauthor this (mainly compiling results, finding a venue, and responding to reviews), let me know and I can share what I have so far (extending well beyond the theorems in my recent work on power). ↩︎

  2. In fact, you can reconstruct the environment using only a limited subset of possibilities: the non-dominated possibilities. ↩︎

  3. As a tensor, the transition function $T$ has size $|A| \cdot |S|^2$, while the AU landscape representation only has size $|S|^2$. However, if you're just representing $T$ as a transition function, it has size $|A| \cdot |S|$. ↩︎
