This post seeks to clarify the question of what utility functions should be defined over, in the situation where an agent has uncertainty over its own utility function.

Outcomes or worlds

The most intuitively natural objects to define utility functions over are outcomes or worlds. The problem with this approach is that it results in an excessively large set of utility functions.

For instance, imagine the agent has two actions a1 and a2. The action a1 will map straight to world w1, while a2 will map to a complicated probability distribution over a large set of worlds W2. The agent then takes no further actions.

If we see the set of worlds as W={w1}∪W2, then we have a vast number of possible utility functions, whereas in practice only two parts of the utility function matter: its value on w1, and its expected value on W2.

An agent that knows those values (technically, that knows the ordering of those values) for a single utility function, can reach a decision. Therefore it would seem perverse to have things change when the agent starts to have uncertainty over its utility.

What would make sense is to have utilities defined over expected outcomes -- ie over {w1} and W2 -- but that definition doesn't seem to work in more general situations.
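As a toy illustration of the point above, here's a minimal Python sketch (the worlds, the distribution over W2, and all utility values are invented for the example): two utility functions that disagree on almost every individual world in W2, but agree on u(w1) and on the expected value over W2, induce exactly the same choice between a1 and a2.

```python
import math

# Hypothetical toy setup: a1 leads surely to w1, a2 leads to a uniform
# distribution over a large set of worlds W2.
w1 = "w1"
W2 = [f"w2_{i}" for i in range(1000)]
p_W2 = {w: 1.0 / len(W2) for w in W2}  # uniform distribution over W2

def expected_utility(action, u):
    """Expected utility of an action under the utility function u (a dict)."""
    if action == "a1":
        return u[w1]                        # a1 maps straight to w1
    return sum(p_W2[w] * u[w] for w in W2)  # a2: expectation over W2

# Two utility functions that differ on almost every world in W2...
u_a = {w1: 0.5, **{w: 0.3 for w in W2}}
u_b = {w1: 0.5, **{w: (0.6 if i % 2 == 0 else 0.0) for i, w in enumerate(W2)}}

# ...but share u(w1) and the expectation over W2, so the decision is the same.
assert math.isclose(expected_utility("a2", u_a), expected_utility("a2", u_b))
assert expected_utility("a1", u_a) > expected_utility("a2", u_a)
```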

Histories

Model the agent as interacting n times with the world. It gets an observation, then follows that with an action, until it takes its last action an. Call the sequence of n alternating observations and actions a history, and let H be the set of histories. We can also define partial histories, sequences h=o1a1o2a2…om of length m<n.

Note this inverts the usual order, by starting with observations and following with actions. This is because the history ends with an action: since the agent can't affect anything more after that, subsequent observations are meaningless. Partial histories, though, end with observations, not actions (this is for subsequent ease of use).

Any utility function u can be seen as function from H to R, by mapping the history hn∈H to u(hn), the expected utility of the history.

So a more natural class would be to define utilities as being over H, the set of possible histories. Note that actions are included in histories, and histories with the same observations but different actions can have different utility values. This is because, eg, the observation of someone saying 'You succeed in your task!' will have different implications depending on the action just prior.

So define U as the set of utility functions (not equivalence classes of utility functions) over H.

This is better than defining U over outcomes, but still somewhat unsatisfactory, as we can have a large number of incredibly unlikely histories, and these should not affect the normalisation procedure much.

So what is really wanted is a distribution over H. But that makes no sense on its own, as we don't know what the actions will be. So we need to look at how the agent chooses its actions: at its set of strategies.

Strategies

The set of deterministic strategies S is the set of ways an agent can pick its actions, given its past history. It would be tempting to define utilities as taking values over S, but there are subtleties to deal with first.

Isomorphic strategies

Suppose the agent can choose action a1 or b1 at the first step, then a2 or b2 at the second step. The observations are o1 and o2, independently of actions chosen.

Then consider the following strategies:

s(o1)=a1, s(o1a1o2)=a2, s(o1a2o2)=a2.

s′(o1)=a1, s′(o1a1o2)=a2, s′(o1a2o2)=b2.

These strategies seem to differ: they give different behaviours on the partial history h=o1a2o2. However, because both will choose a1 after o1, the history h will never come up. Thus we will define S as consisting of equivalence classes of strategies that always choose the same action in any situation that actually arises.
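A quick Python sketch of this identification (the tuple encoding of partial histories is purely illustrative): s and s′ disagree only on the unreachable partial history o1a2o2, so they generate the same reachable history and land in the same equivalence class.

```python
def reachable_history(strategy):
    """The single history (o1, act1, o2, act2) that occurs under a strategy,
    when observations o1 and o2 arrive regardless of the actions taken."""
    act1 = strategy[("o1",)]
    act2 = strategy[("o1", act1, "o2")]
    return ("o1", act1, "o2", act2)

s       = {("o1",): "a1", ("o1", "a1", "o2"): "a2", ("o1", "a2", "o2"): "a2"}
s_prime = {("o1",): "a1", ("o1", "a1", "o2"): "a2", ("o1", "a2", "o2"): "b2"}

# The strategies differ as functions, but never on a situation that arises:
assert s != s_prime
assert reachable_history(s) == reachable_history(s_prime)
```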

Stochastic strategies

We should briefly note that we can form stochastic strategies by setting a probability distribution over S (this defines a simplex with the 'pure' elements of S as the vertices). This set includes all possible stochastic strategies.
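A stochastic strategy is just a point in that simplex, and its expected utility is the matching convex combination of the pure strategies' values. A minimal sketch, with invented pure-strategy values:

```python
# Values u(s) on three hypothetical pure strategies.
pure_values = {"s1": 1.0, "s2": 0.0, "s3": 0.5}

def mixed_value(weights, values):
    """Expected utility of a stochastic strategy given as simplex weights."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9  # a point of the simplex
    return sum(p * values[s] for s, p in weights.items())

v = mixed_value({"s1": 0.5, "s2": 0.25, "s3": 0.25}, pure_values)
assert abs(v - 0.625) < 1e-12   # 0.5*1.0 + 0.25*0.0 + 0.25*0.5
```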

Utilities and strategies

This section presents a way of connecting strategies and utilities, starting with embedding utilities over histories into utilities over strategies:

Theorem 1: There is a linear map k from U to RS. This k need not be injective or surjective.

Proof: The set RS is naturally identified with the set of functions S→R.

Each element s∈S defines a probability distribution on H, and hence an expectation for any utility function u. Write this expectation as u(s). Then the values of u(s) for different s define the element k(u)∈RS.

The linearity of k comes from the fact that utility functions are additive: (u+v)(s)=u(s)+v(s).

To show that k need not be injective, consider the following situation, where the initial observation is o or o′ with equal probability, leading to a choice of actions {a,b} and {a′}:

Then there are two strategies -- the ones that pick a and b -- but three worlds -- those ending in a, b, and a′ -- so k maps a three dimensional space into a two dimensional one, and cannot be injective.
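This counting argument can be checked directly. In the sketch below (history labels are just string encodings of the example), a nonzero utility function in the kernel of k is exhibited: it shifts utility between o′a′ and the two histories after o so that every strategy's expectation vanishes.

```python
# Toy setup from the text: initial observation o or o' (probability 1/2 each);
# after o the agent picks a or b, after o' the only action is a'.
strategies = ["a", "b"]          # only the choice after o matters

def k(u):
    """k(u): expected utility of each strategy; u is a dict over histories."""
    return {s: 0.5 * u["o" + s] + 0.5 * u["o'a'"] for s in strategies}

# A nonzero element of the kernel of k: U is three dimensional, R^S is two
# dimensional, so k has nontrivial kernel and cannot be injective.
u = {"oa": 1.0, "ob": 1.0, "o'a'": -1.0}
assert k(u) == {"a": 0.0, "b": 0.0}   # k(u) = 0 even though u is not 0
```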

There are utilities in U that are independent of any strategy; the kernel of k is the subspace of these that happen to always map to utility 0.

To show that k need not be surjective, add actions c and b′ after o and o′ above:

Then the set of strategies has six elements (each strategy chooses one element from among 3 elements, then another from among 2 elements), while the set of histories has five elements, corresponding to each of the five final actions. Hence k cannot be surjective from the five dimensional U to the six dimensional RS. □
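The dimension count in this extended example is easy to verify in code (again with string-encoded histories):

```python
from itertools import product

# Extended example: after o the actions are {a, b, c}; after o' they are {a', b'}.
acts_o, acts_o_prime = ["a", "b", "c"], ["a'", "b'"]
strategies = list(product(acts_o, acts_o_prime))   # one choice per observation
histories = ["o" + x for x in acts_o] + ["o'" + y for y in acts_o_prime]

# dim U = 5 while dim R^S = 6, so the linear map k cannot be surjective.
assert len(strategies) == 6
assert len(histories) == 5
```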

Given this k, we can consider the true space of utility functions that we're interested in: the image of k, divided out by equivalence relations (translations and positive scalings). Thus the space of relevant utility classes is:

U=k(U)/∼.

Note that we can translate any element of RS to have all its coordinates sum to zero, and then divide by positive scalings. This means that U=S∪{0}, where S is a sphere and {0} corresponds to the utilities constant on all strategies. The topology of U is the natural topology on S, combined with the fact that the only open set containing {0} is U itself.
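This normalisation is concrete enough to sketch: translate so the coordinates sum to zero, then scale to the unit sphere, with the constant utilities collapsing to the single point {0}. (The function name and encoding are illustrative.)

```python
import math

def normalise(v):
    """Map a vector of per-strategy utilities to its equivalence class:
    a point on the unit sphere, or None for the class of constant utilities."""
    mean = sum(v) / len(v)
    centred = [x - mean for x in v]            # quotient by translations
    norm = math.sqrt(sum(x * x for x in centred))
    if norm == 0.0:
        return None                            # the point {0}
    return [x / norm for x in centred]         # quotient by positive scalings

# Utilities related by translation and positive scaling normalise identically.
p, q = normalise([3.0, 1.0]), normalise([11.0, 7.0])   # [11, 7] = 2*[3, 1] + 5
assert all(math.isclose(a, b) for a, b in zip(p, q))
assert normalise([4.0, 4.0]) is None
```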

Stable permutations

We want to be able to interchange strategies and still get the same result: to have some sort of symmetry on the normalisation process. However, since k(U) can be a strict subset of RS, we can't just take arbitrary permutations of S. This section will instead define the subset of stable permutations. Note that any permutation of S extends to an invertible linear transformation of RS by permutation of the basis elements.

Definition: Stable permutations. A stable permutation is a permutation of S that extends to a linear map on RS that preserves k(U)⊂RS.

Since U=k(U)/∼, we can call the set of stable permutations SU.

How common are stable permutations? In general, not too uncommon and not too common, as the following two propositions will show:

Proposition 2: If S is non-trivial, then so is SU.

Proof: Consider the set of actions and possible observations as a tree. If S contains more than one element, then there is a branching point where the agent can choose between more than one action. Since the length of the branches is n and hence finite, there exists at least one branching point with no subsequent branching point: the agent has no further decisions to make.

Let h be the partial history that corresponds to that final branching point, and let a and b be possible actions at h. Then a transposition of a and b will also permute any strategies in S for which h is a possible history.

But because h is a final branching point, the expected utilities u(ha) and u(hb) are defined for any u∈U, independently of any strategy. Thus transposing a and b defines a map σ on U, with σ(u)(ha)=u(hb), σ(u)(hb)=u(ha), and σ(u)(h′)=u(h′) for any history h′ that doesn't start with ha or hb.

Hence the permutation of S corresponds to a linear automorphism preserving the image of U under k, which means that permutation is in SU.□

In general, any tree isomorphism will correspond to an element of SU, and one can delete -- or add -- nodes with a single action before doing such an isomorphism. However, in the general case, the space of stable permutations is not that large:

Proposition 3: The set SU can be a strict subset of the set of permutations of S. And SU need not be transitive and may even have fixed points: there may exist an s∈S which is mapped to itself for all ρ∈SU.

Proof: consider the situation below, where there are no choices after o1b:

Let s be the strategy that chooses b at o1, and let u be the utility function that maps o1b to 1 and everything else to 0. Now assume that ρ exchanges s with some other s′. This means that ρ(k(u)) is a function in RS that takes value 1 on a single policy s′ that chooses a after o1 (as all policies but s do that), and 0 everywhere else.

Without loss of generality, let s′ choose a and a′ after o2 and o′2, respectively. Then changing a to b must reduce the expectation of s′ by 1, and similarly changing a′ to b′. Thus changing a to b and changing a′ to b′ simultaneously must, by linearity, set the expected utility of that policy to −1. Consequently, ρ(k(u)) cannot be an element of k(U), and hence ρ∉SU.

Thus every element of SU must map s to itself.□

Extra results and concepts

This section presents some miscellaneous extra results and concepts within this formalism.

Utilities and strategies

Each s∈S generates Hs⊂H, the set of histories that have non-zero probability given s.

Lemma 4: If s≠s′, then Hs≠Hs′.

Proof: since s≠s′, there must be some partial history h, of non-zero probability given s or s′, where s will choose action a while s′ will choose action a′ (in the extreme case, h may be ∅). Thus s must have at least one possible history starting with ha while s′ can have none (and vice versa for ha′). □

Proposition 5: Each s∈S has some [u]∈U for which it is the unique maximising strategy.

Proof: Define us∈U as mapping Hs to 1 and the rest of H to 0. Thus the expectation us(s)=1, and us(s′)<1 for s≠s′, since s′ has a non-zero probability of reaching a history outside Hs, hence a history where us has expected utility 0. The argument extends to mixed strategies, and the relevant utility class is [k(us)].□
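The construction in Proposition 5 can be checked in a small example (observation o or o′ with probability 1/2, actions {a, b} after o and {a′, b′} after o′; the string encoding is illustrative):

```python
from itertools import product

strategies = list(product(["a", "b"], ["a'", "b'"]))

def possible_histories(s):
    """H_s: the two histories with non-zero probability under s."""
    x, y = s
    return {"o" + x, "o'" + y}

def u_s(target):
    """The indicator utility of Proposition 5: 1 on H_target, 0 elsewhere."""
    H_target = possible_histories(target)
    return lambda h: 1.0 if h in H_target else 0.0

def expectation(u, s):
    return sum(0.5 * u(h) for h in possible_histories(s))

target = ("a", "b'")
u = u_s(target)
scores = {s: expectation(u, s) for s in strategies}
assert scores[target] == 1.0                                   # u_s(s) = 1
assert all(v < 1.0 for s, v in scores.items() if s != target)  # unique max
```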

Partial histories

For any partial history h (or ha, if we want it to end with an action), define Hh (Hha) as the set of histories that start with h (or ha).

Let Sh be the set of strategies on Hh, and S(h)⊂S the set of strategies that allow the history h. Then there is a natural projection qh:S(h)→Sh by mapping each strategy to its equivalence class of strategies on Hh. The same holds for ha.

Let S−ha be the set of partial strategies: strategies defined on H−Hha that allow ha to happen with non-zero probability. Then S(ha)=S−ha×Sha.

Let Uha be the set of utility functions on Hha, and kha the map Uha→RSha from Theorem 1. We can define Uha as kha(Uha)/∼, similarly to above.

Let Oha be the set of possible observations the agent can make just after ha. Then there is a decomposition Sha = ×_{o∈Oha} Shao.

Proposition 6: There is a natural map rha from U to Uha.

Proof: for u∈U, s∈Sha and s′∈S−ha, define rha(u)(s) as u(s×s′).

Now replace s′ with s′′, also in S−ha, and compare, using standard conditional probability and conditional expected utility:

u(s×s′) = P(ha|s′)u(s×s′|ha) + P(¬ha|s′)u(s′|¬ha).

Note that the probability of ha is independent of s (since s∈Sha), and, since s′ and s′′ are deterministic, P(ha|s′)=P(ha|s′′). Similarly u(s×s′|ha) and u(s×s′|¬ha) are independent of s′ and s, respectively -- and the same goes for the s′′ expression.

Putting this all together gives:

u(s×s′) − u(s×s′′) = P(¬ha|s′)u(s′|¬ha) − P(¬ha|s′′)u(s′′|¬ha) = Cs′,s′′,

where Cs′,s′′ is a constant depending only on s′ and s′′.

Consequently, the map rha defined above changes by a constant when s′ changes. Similarly, if we change u by changing its definition on H−Hha only, then rha also changes by a constant, as u(s′|¬ha) changes.

Thus, up to constants, rha is a well defined map from U to kha(Uha). Since it only uses the expectation of u on S, this makes it a well defined map from k(U) to kha(Uha).

Then if we divide by translations (and scalings), the constant factors drop out, so rha is a well defined map from U to Uha.□
