Toy model piece #1: Partial preferences revisited

1Rohin Shah

2Stuart Armstrong

1Rohin Shah

1Hazard

1Rohin Shah

1Rohin Shah

1Stuart Armstrong

New Comment

I am very confused by the math in this post:

Why must a preorder decompose into disjoint ordered chains? If I have a partial preference and another partial preference how do those induce disjoint ordered chains where worlds between chains are incomparable? Perhaps you are asking us to *assume* that the preorder decomposes into disjoint ordered chains?

How do cycles vanish in ? Can you work through the example where the partial preference expressed by the human is ?

we can extend this to by setting U(w)=U(p(w)).

I think this is extending to ?

which has for all .

Should that be ?

Thanks, corrected a few typos.

Why must a preorder decompose into disjoint ordered chains?

They don't have to; I'm saying that sensible partial preferences (eg ) should do so. I then see how I'd deal with sensible preorders, then generalise to all preorders in the next section.

How do cycles vanish in ? Can you work through the example where the partial preference expressed by the human is ?

Note that what you've written is impossible as means but not . A preorder is transitive, so the best you can get is .

Then projecting down (via ) to will project all these down to the same element. That's why there are no cycles, because all cycles go to points.

Then we need to check some math. Define on by iff .

This is well defined (independently of which and we use to represent and ), because if , then , so, by transitivity, . The same argument works for .

We now want to show the is a partial order on . It's transitive, because if and , then , and the transitivity in implies and hence .

That shows it's a preorder. To show partial order, we need to show there are no cycles. So, if and , then and , hence, by definition of , . So it's a partial order.

For cycles, it looks like the projection to is akin to taking all the worlds that form a given cycle, and compressing them into a single world.

In your example, it's true and when . That's the condition for equivalence in the project, so you have that . If you're thinking about the ordering as a directed graph, you can collapse those worlds to a single point and not mess up the ordering.

Suppose I express a partial preference over "good worlds" and another one over "bad worlds", for example "when everyone's needs for food, water and shelter are met, then it is better for there to be more social connection" and "when I am living in extreme poverty, I prefer to be in a country with a good social safety net". These talk about mutually exclusive worlds, and so lead to two distinct ordered chains. Then, on average you assign the *same* utility to a good world and a bad world, which seems very bad. How do we avoid this issue?

By adding in a third preference, which explicitely says that extreme poverty is worse than having all needs met.

These are just pieces of the total utility, remember. Even if they are full preferences, they are not *all* our preferences.

EDIT: This model is currently obsolete, see here for the most current version.I'm working towards a toy model that will illustrate all the steps in the research agenda. It will start with some algorithmic stand-in for the "human", and proceed to create the UH, following all the steps in that research agenda. So I'll be posting a series of "toy model pieces", that will then be ultimately combined in a full toy model. Along the way, I hope to get a better understanding of how to do the research agenda in practice, and maybe even modify that agenda based on insights making the toy model.

In this post, I'll revisit and re-formalise partial preferences, and then transform them into utility functions.

## The problem with the old definition

My previous model of partial preferences can't capture some very simple mental models, such as P="the more people smile, the better the world is".

This is because the partial preference decomposes the space of worlds locally as Y×Z, fixes two values y− and y+ in Y, and only compares worlds of type (y−,z) and (y+,z) for fixed z. This means that we can only compare worlds with the same z value, and only two of these worlds can be compared: so we can't say w1<w2<w3 for three distinct worlds. Thus we can't say that three people smiling is better than two, which is better than one. Not being able to capture preferences like P is a major flaw.

## New definition: preorder

So now model a partial preference as preorder. A preorder ≤ is a type of ordering that is transitive (if w1≤w2 and w2≤w3, then w1≤w3) and reflexive (w≤w for all worlds w).

The previous type of partial preference can be made into a preorder quite easily: w1<w2 implies w1≤w2, and add w≤w for all worlds w.

Now we can easily express P="the more people smile, the better the world is". Let w(n,v) be a world with n smiling people in it, with v representing all the other relevant variables describing the world. The P is described by the preorder:

For a general preorder ≤, define w<w′ to mean w≤w′ but it not being the case that w′≤w.

## Circular preferences and utility functions

Unfortunately, unlike the previous partial preferences, preorders can allow for circular preferences w1≤w2≤w3≤w1. In practice, most sensible partial preferences will not have circular preferences, and will instead resemble P: just a collection of orderings among separate sets of worlds.

But, it will might be possible to have circular partial preferences, maybe of the type "in Australia, the cities gets nicer as you go clockwise along the coast".

So you need a way of dealing with circular preferences, and with complicated sets of partial preferences that might include some circular preferences.

We also want a way to transform all of these preorders into a full preference, given as a utility function over all worlds. The research agenda calls for aggregating similar preferences before making them into full preferences, but we still need some way of doing this in the cases where we have a single partial preference. The rest of this section will show one way of doing this.

## The sensible case

In the simplest case, we'd have a partial preference such as "these ten worlds are good, in this order", and we'd map them to a utility function with equal spaces between each world. And we wouldn't say these ten worlds were all better or all worse than the other worlds not mentioned.

And we can do that, in the most likely and sensible case. Take P and its preorder ≤ (and <): under this partial preference, the worlds decompose into simple ordered chains. That means that if W is the set of worlds, then it decomposes as a disjoint union W=⋃i∈IWi (for some indexing set I). These sets are incomparable: if wi∈Wi and wj∈Wj, then neither wi≤wj nor wj≤wi.

Moreover, each of these Wi is totally ordered by <: so we can index any wi∈Wi by some natural number k, as wik, and say that wik<wil if and only if k<l. Let ni=||Wi||−1 be the size of Wi minus one, and order the elements of it from 0 upwards: so the set is ordered as wi0<wi1<…<wini.

Then here is how we construct a utility function U that extends the partial order:

This means that if w is incomparable with all other worlds (ie, is not relevant to the partial preference), the U(w)=0, and that all chains Wi have utilities U(wi0)=−ni, U(wi1)=−ni+2, …, U(wini)=ni. So they are symmetric around 0.

## The general case

Here I'll give a way of generalising the above procedure to the case of any preorder. Note that this situation should only come up very rarely, since most preorders derived from mental models will be more sensible. Still, for completeness, here is a process that extends the simple model:

First, to get rid of circular preferences, project the set of worlds W to ¯¯¯¯¯¯W, by using the equivalence relation w≅w′ means w≤w′ and w′≤w. Call this projection p. The preorder on W descends via p to a partial order. So we now work in ¯¯¯¯¯¯W, which has no circular preferences. Then if we assign a utility U to ¯¯¯¯¯¯W, we can extend this to W by setting U(w)=U(p(w)).

Now working in ¯¯¯¯¯¯W, define a link between two (equivalence class of) worlds w and w′. Write w←w to say that w<w′, and that there does not exist any world v∈¯¯¯¯¯¯W with w<v<w′.

Now, decompose ¯¯¯¯¯¯W as collection of disjoint sets ⋃i∈IWi (for some indexing set I). Two worlds w and w′ are in the set Wi if you can get from one to another following links; ie if there exists worlds wik with w=wi0 and w′=wil and for all k, either wik←wi(k+1) or wi(k+1)←wik.

Let ni be the number of links in Wi. We'll now assign utility to elements of Wi, as a constrained optimisation process; in the following, all worlds are assumed to be in Wi:

minimiseg(U)=∑w←w′(U(w′)−U(w))2, subject to the constraints that:As long as U is not constant on Wi, then conditions 2. and 3. just fix a scaling and a translation of U. Condition 1. means that condition 2. is equivalent to ∑w←w′|U(w′)−U(w)|=2ni, since U(w)≤U(w′) for all w<w′, which includes all w←w′.

This is a convex optimisation problem in the values of the U(w), since g(U) is convex in these values. Note that it is not strictly convex, since translations leave it unchanged: g(U)=g(U+c). Nevertheless, it can be seen

^{[1]}that this equation is strictly convex on the subspace defined by condition 3. Therefore there is a single solution to the minimisation problem above.## Extending the sensible case

It's not hard to see that this extends the sensible model above, which has U(w′)−U(w)=2 for all w←w′ (a little more work is required to show that having all those values equal minimises the sum ∑w←w′(U(w′)−U(w))2).

## The final version of partial preferences

Is this the final version of partial preferences? No, certainly not. But to get a better generalisation, we're going to have to have a look at how people actually model things inside their brains and thought processes. Hence the question of how best to model partial models will be an empirical one. But this very general definition will suffice for the moment.

Very rough argument: choose some ordering for the worlds in Wi, write xi for U(wi), and set ¯¯¯x=(x0,…xm). Then, since g is a quadratic with only quadratic terms, we can write it as its own Hessian: g(¯¯¯x)=¯¯¯xTH(g)2¯¯¯x.

Now assume that ¯¯¯xTH(g)2¯¯¯x=0, for ¯¯¯x=(r0,r1,…rm). However, g is the sum of non-negative terms, so this is only possible if all of the ∑w←w′(U(w′)−U(w))2 are zero. This is only possible if U(w′)=U(w) whenever w←w′; thus, since Wi is connected by links, U must be constant on Wi. In other words, only ¯¯¯x=(r,r,…r) allows ¯¯¯xTH(g)2¯¯¯x=0.

Thus H(g) has only one zero eigenvector, corresponding to translations. Since condition 3. precludes additional translations, H(g) is strictly positive definite on the subspace it defines. Hence g is strictly convex on this space. ↩︎