Toy model piece #4: partial preferences, re-re-visited

by Stuart Armstrong, 12th Sep 2019



Two failed attempts

I initially defined partial preferences in terms of foreground variables $Y$ and background variables $Z$.

Then a partial preference would be defined by $y^+$ and $y^-$ in $Y$, such that, for any $z \in Z$, the world described by $(y^+, z)$ would be better than the world described by $(y^-, z)$. The idea was that, everything else being equal (i.e. the same $z$), a world with $y^+$ was better than a world with $y^-$. The other assumption is that, within mental models, human preferences can be phrased as one or many binary comparisons. So if we have a partial preference like $P_1$: "I prefer a chocolate ice-cream to getting kicked in the groin", then $(y^+, z)$ and $(y^-, z)$ are otherwise identical worlds with a chocolate ice-cream and a groin-kick, respectively.

Note that in this formalism, there are two subsets of the set of worlds, $\{(y^+, z) \mid z \in Z\}$ and $\{(y^-, z) \mid z \in Z\}$, and a map between them (which just sends $(y^+, z)$ to $(y^-, z)$).
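Here is a minimal sketch of this first formalism. It is an illustration, not the original construction, and it treats worlds as plain `(y, z)` tuples. The labels `Y_PLUS` and `Y_MINUS` and the background values in `Z` are hypothetical placeholders. The function `l` is the map from the better subset to the worse one.

```python
# Hypothetical toy values: the foreground variable takes two values, and the
# background variable Z takes three (the labels are purely illustrative).
Y_PLUS, Y_MINUS = "chocolate_ice_cream", "groin_kick"
Z = ["monday", "tuesday", "wednesday"]

# The two subsets of worlds picked out by the partial preference.
better_worlds = {(Y_PLUS, z) for z in Z}
worse_worlds = {(Y_MINUS, z) for z in Z}


def l(world):
    """Send (y+, z) to (y-, z): swap the foreground value, keep the background."""
    y, z = world
    assert y == Y_PLUS, "l is only defined on the better subset"
    return (Y_MINUS, z)


# The partial preference: each better world beats its image under l,
# and the pair always shares the same background z.
preference_pairs = [(w, l(w)) for w in sorted(better_worlds)]

for better, worse in preference_pairs:
    print(f"{better} > {worse}")
```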

In a later post, I realised that such a formalism can't capture seemingly simple preferences, such as $P_2$: "$n+1$ people is better than $n$ people". The problem is that preferences like that don't talk about just two subsets of worlds, but about many more.

Thus a partial preference was defined as a preorder. Now, a preorder is certainly rich enough to include preferences like $P_2$, but it allows for far too many different types of structures, needing a complicated energy-minimisation procedure to turn a preorder into a utility function.

This post presents another formalism for partial preferences, one that keeps the initial intuition but can capture preferences like $P_2$.

The formalism

Let $W$ be the (finite) set of all worlds, seen as universes with their whole history.

Let $X$ be a subset of $W$, and let $l$ be an injective (one-to-one) map from $X$ to $W$. Define $H = l(X)$, the image of $l$, and $l^{-1}: H \to X$ as the inverse.

Then the preference is determined by:

  • For all $x \in X$, $x > l(x)$.

If $X$ and $H$ are disjoint, this just reproduces the original definition, with $X = \{(y^+, z) \mid z \in Z\}$ and $H = \{(y^-, z) \mid z \in Z\}$.
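The general definition can be sketched in Python. The sketch works over a finite set of worlds and represents $l$ as a dict whose keys form $X$. The helper name `make_partial_preference` is hypothetical. It checks that $l$ is injective, builds $H$ and $l^{-1}$, and lists the comparisons $x > l(x)$. The usage example shows the disjoint case, which is the original two-subset formalism.

```python
def make_partial_preference(W, l):
    """Build the partial preference x > l(x) from a map l: X -> W.

    W is a finite set of worlds and l is a dict whose keys form X.
    Returns (X, H, l_inv, pairs) with H = l(X), l_inv: H -> X the inverse,
    and pairs the set of (better, worse) comparisons (x, l(x)).
    """
    X = set(l)
    H = set(l.values())
    if not (X <= W and H <= W):
        raise ValueError("l must map a subset X of W into W")
    if len(H) != len(X):
        raise ValueError("l must be injective (one-to-one)")
    l_inv = {h: x for x, h in l.items()}
    pairs = {(x, l[x]) for x in X}
    return X, H, l_inv, pairs


# Usage: four abstract worlds, with l sending a -> b and c -> d.
W = {"a", "b", "c", "d"}
X, H, l_inv, pairs = make_partial_preference(W, {"a": "b", "c": "d"})
print(sorted(pairs))    # [('a', 'b'), ('c', 'd')]
print(X.isdisjoint(H))  # True: the original two-subset case
```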

But it also allows preferences like $P_2$, by defining $l(x)$ as something like "the same world as $x$, but with one less person". In that case, $l$ maps part of $X$ back into $X$ itself, so $X$ and $H$ overlap.
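To see how $l$ can map part of $X$ into itself, here is a toy version of the population preference. It identifies worlds only by their population size and uses a hypothetical cap, `MAX_POPULATION`. Both choices are simplifications made for illustration.

```python
MAX_POPULATION = 5  # hypothetical cap on how many people a world can hold

# Worlds are identified here just by their population size (a simplification:
# every other feature of the world is held fixed).
W = set(range(MAX_POPULATION + 1))

# l removes one person; it is defined on every world with at least one person.
l = {n: n - 1 for n in W if n >= 1}

X = set(l)           # {1, ..., MAX_POPULATION}
H = set(l.values())  # {0, ..., MAX_POPULATION - 1}
overlap = X & H      # worlds that are both above and below some other world

print(sorted(overlap))  # [1, 2, 3, 4]
```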

Then for any element $x_0 \in X \cup H$, we can construct its upwards and downwards chain:

  • $\ldots, l^{-2}(x_0), l^{-1}(x_0), x_0, l(x_0), l^{2}(x_0), \ldots$

These chains end when they cycle: there are an $n$ and an $m$ such that $l^{n}(x_0) = l^{-m}(x_0)$ (equivalently, $l^{n+m}(x_0) = x_0$).

If they don't cycle, the upwards chain ends when there is an $x_{-n} = l^{-n}(x_0)$ which is not an element of $H$ (hence $l^{-1}$ is not defined on it), and the downwards chain ends when there is an $x_m = l^{m}(x_0)$ which is not in $X$ (and hence $l$ is not defined on it).

So, for example, for $P_1$, all the chains contain two elements only: $(y^+, z)$ and $(y^-, z)$. For $P_2$, there are no cycles, and the lower chain ends when the population hits zero, while the upper chain ends when the population hits some maximal value.
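The chain and cycle construction can be sketched as follows. The sketch gives $l$ as a Python dict over a finite set of worlds, and `decompose` is a hypothetical helper name. From each unvisited world, it walks upwards with $l^{-1}$ until it either leaves $H$ or returns to its starting point, which signals a cycle. For a chain, it then walks downwards from the top with $l$ until it leaves $X$. The usage reproduces the population example with a toy cap of 5 people.

```python
def decompose(l):
    """Split X ∪ H into the chains and cycles generated by an injective map l.

    l is a dict representing l: X -> W. Each chain is returned as a list
    ordered from the top (best world) to the bottom (worst world); each cycle
    is returned as a list in the order l visits it.
    """
    l_inv = {h: x for x, h in l.items()}
    remaining = set(l) | set(l.values())
    chains, cycles = [], []
    while remaining:
        start = next(iter(remaining))
        # Walk upwards via l^{-1} until we leave H or are about to revisit start.
        top = start
        while top in l_inv and l_inv[top] != start:
            top = l_inv[top]
        if top in l_inv:  # l^{-1}(top) == start, so start lies on a cycle
            cycle = [start]
            while l[cycle[-1]] != start:
                cycle.append(l[cycle[-1]])
            cycles.append(cycle)
            remaining -= set(cycle)
            continue
        # No cycle: walk downwards from the top via l until we leave X.
        chain = [top]
        while chain[-1] in l:
            chain.append(l[chain[-1]])
        chains.append(chain)
        remaining -= set(chain)
    return chains, cycles


# Usage: the population preference with at most 5 people.
chains, cycles = decompose({n: n - 1 for n in range(1, 6)})
print(chains, cycles)  # [[5, 4, 3, 2, 1, 0]] []

# A map that goes round in a loop produces a single cycle instead.
print(len(decompose({"a": "b", "b": "c", "c": "a"})[1]))  # 1
```

Because $l$ is injective, $l^{-1}$ is also injective. The upwards walk therefore either leaves $H$ or comes back to its starting point, so it always terminates on a finite set of worlds.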

Utility differences between clearly comparable worlds

Since the worlds of $X \cup H$ decompose into chains or cycles via $l$, there is no need for the full machinery for constructing utilities developed in that earlier post.

One thing we can define unambiguously is the relative utility between two elements of the same chain or cycle:

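  • If $x$ and $y$ are in the same cycle, then $U(x) - U(y) = 0$.
  • Otherwise, if $x$ and $y$ are in the same chain, then $U(x) - U(y) = n$, where $y = l^{n}(x)$ (with $n$ negative when $y$ lies above $x$ in the chain).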
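Within a single chain, the raw relative utility is just a count of steps of $l$. The following sketch lists a chain from top to bottom, so that each entry is $l$ of the previous one. The name `relative_utility` and its `is_cycle` flag are hypothetical.

```python
def relative_utility(component, is_cycle, x, y):
    """Return U(x) - U(y) for two worlds x, y in the same chain or cycle.

    `component` lists the worlds of one chain from top (best) to bottom
    (worst), each entry being l of the previous one; for a cycle the
    order does not matter.
    """
    if x not in component or y not in component:
        raise ValueError("x and y must lie in the given chain or cycle")
    if is_cycle:
        return 0  # every world in a cycle gets the same utility
    # y = l^n(x), where n counts how many steps down the chain y lies below x;
    # n is negative when y is above x.
    return component.index(y) - component.index(x)


# Usage with the population chain 5 > 4 > 3 > 2 > 1 > 0 (l removes one person).
chain = [5, 4, 3, 2, 1, 0]
print(relative_utility(chain, False, 5, 2))               # 3
print(relative_utility(chain, False, 2, 5))               # -3
print(relative_utility(["a", "b", "c"], True, "a", "c"))  # 0
```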

For now, let's normalise these relative utilities to $[0, 1]$ by normalising each chain individually; note that if every world in the chain is reachable, this is the same as the mean-max normalisation on each chain:

  • If $x$ and $y$ are in the same cycle, then $U(x) - U(y) = 0$.
  • Otherwise, if $x$ and $y$ are in the same chain with $n$ total elements in the chain, then $U(x) - U(y) = m/(n-1)$, where $y = l^{m}(x)$.
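The normalised version only changes the chain case, where the step count is divided by the chain length minus one. This sketch also lists a chain from top to bottom, each entry being $l$ of the previous one, and it uses `fractions.Fraction` to keep the values exact. The name `normalised_relative_utility` is hypothetical.

```python
from fractions import Fraction


def normalised_relative_utility(component, is_cycle, x, y):
    """Return the normalised U(x) - U(y) for x, y in the same chain or cycle.

    A chain is listed from top (best) to bottom (worst). With n worlds in it,
    the top and bottom differ by exactly 1, so U spans [0, 1] on the chain.
    Chains built from l have n >= 2, so n - 1 is never zero.
    """
    if x not in component or y not in component:
        raise ValueError("x and y must lie in the given chain or cycle")
    if is_cycle:
        return Fraction(0)
    n = len(component)                           # total worlds in the chain
    m = component.index(y) - component.index(x)  # y = l^m(x)
    return Fraction(m, n - 1)


# Usage with the population chain 5 > 4 > 3 > 2 > 1 > 0.
chain = [5, 4, 3, 2, 1, 0]
print(normalised_relative_utility(chain, False, 5, 0))  # 1
print(normalised_relative_utility(chain, False, 5, 2))  # 3/5
```

With this normalisation, a two-world chain like those of the ice-cream preference always gives a difference of exactly 1 between its two worlds.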

We could try to extend $U$ to a global utility function which compares different chains, and compares values in chains with values outside of $X \cup H$. But as we shall see in the next post, this doesn't work when combining different partial preferences.

Interpretation of $l$

The interpretation of $l$ is something like "this is the key difference in features that causes the difference in world-rankings". So, for $P_1$, $l$ switches out a chocolate ice-cream and substitutes a groin-kick, while for $P_2$, $l$ simply removes one person from the world.

This means that, locally, we can express $X \cup H$ in the same formalism as in the first post, as pairs $(y, z)$. Here $Z$ is the set of background variables, while $Y$ is a discrete variable that $l$ operates on.

We cannot necessarily express this product globally. Consider, for $P_2$, a situation where $z_0$ is an idyllic village, $z_1$ is an Earthbound human population, and $z_2$ is a star-spanning civilization with extensive use of human uploads.

And if $y$ denotes the number of people in each world, it's clear that $y$ hits a low maximum for $z_0$ (thousands?), can rise much higher for $z_1$ (trillions?), and even higher for $z_2$ (need to use scientific notation). So although $(y, z_2)$ makes sense for a very large $y$, $(y, z_0)$ with that same $y$ is nonsense. So there is no global decomposition of these worlds as $Y \times Z$.
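Here is a scaled-down sketch of why the local product structure fails globally. The caps in `CAPS` are hypothetical small numbers, and only their ordering (village < Earth < star-spanning civilization) comes from the text. Locally, removing a person changes only $y$. Globally, though, the set of possible worlds is not the full product of its $y$- and $z$-projections.

```python
from itertools import product

# Hypothetical, scaled-down population caps for each background z; only the
# ordering (village < Earth < star-spanning civilization) reflects the text.
CAPS = {"z0": 2, "z1": 4, "z2": 6}

# Worlds are (y, z) pairs: y people living in background z, with y <= cap.
worlds = {(y, z) for z, cap in CAPS.items() for y in range(cap + 1)}


def l(world):
    """Remove one person: changes y only and leaves the background z alone."""
    y, z = world
    return (y - 1, z)


# Locally, l acts purely on the foreground variable y and stays inside the worlds.
assert all(l(w)[1] == w[1] and l(w) in worlds for w in worlds if w[0] >= 1)

# Globally, the worlds are not the full product of their projections.
Ys = {y for y, _ in worlds}
Zs = {z for _, z in worlds}
full_product = set(product(Ys, Zs))
print(worlds == full_product)                    # False
print((6, "z2") in worlds, (6, "z0") in worlds)  # True False
```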
