Toy model piece #1: Partial preferences revisited

I am very confused by the math in this post:

Why must a preorder decompose into disjoint ordered chains? If I have a partial preference $w_{1} < w_{3}$ and another partial preference $w_{2} < w_{3}$ how do those induce disjoint ordered chains where worlds between chains are incomparable? Perhaps you are asking us to assume that the preorder decomposes into disjoint ordered chains?

How do cycles vanish in $\overline{W}$? Can you work through the example where the partial preference expressed by the human is $w_{1} < w_{2} < w_{3} < w_{1}$?

we can extend this to $\overline{W}$ by setting $U(w) = U(p(w))$.

I think this is extending to $W$?

which has $\| U(w') - U(w) \|$ for all $w \leftarrow w'$.

Should that be $\| U(w') - U(w) \| = 2$?

Stuart_Armstrong (6y)

Thanks, corrected a few typos.

Why must a preorder decompose into disjoint ordered chains?

They don't have to; I'm saying that sensible partial preferences (eg $P$ ) should do so. I then see how I'd deal with sensible preorders, then generalise to all preorders in the next section.

How do cycles vanish in $\overline{W}$? Can you work through the example where the partial preference expressed by the human is $w_{1} < w_{2} < w_{3} < w_{1}$?

Note that what you've written is impossible as $w < w^{'}$ means $w \leq w^{'}$ but not $w^{'} \leq w$ . A preorder is transitive, so the best you can get is $w_{1} \leq w_{2} \leq w_{3} \leq w_{1}$ .
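
Spelling that step out as a short derivation, using only the definition of $<$ just given and the transitivity of $\leq$:

$$w_{1} < w_{2} < w_{3} \implies w_{1} \leq w_{2} \leq w_{3} \implies w_{1} \leq w_{3}, \qquad w_{3} < w_{1} \implies \neg (w_{1} \leq w_{3}).$$

The two conclusions contradict each other, so a strict cycle cannot occur in a preorder; only the non-strict cycle $w_{1} \leq w_{2} \leq w_{3} \leq w_{1}$ is possible.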

Then projecting down (via $p$) to $\overline{W}$ will project all these $w_{i}$ down to the same element. That's why there are no cycles, because all cycles go to points.

Then we need to check some math. Define $\leq$ on $\overline{W}$ by $p (w) \leq p (w^{'})$ iff $w \leq w^{'}$.

This is well defined (independently of which $w$ and $w^{'}$ we use to represent $p (w)$ and $p (w^{'})$ ), because if $p (w^{''}) = p (w)$ , then $w^{''} \leq w$ , so, by transitivity, $w^{''} \leq w^{'}$ . The same argument works for $w^{'}$ .

We now want to show that $\leq$ is a partial order on $\overline{W}$. It's transitive, because if $p (w) \leq p (w^{'})$ and $p (w^{'}) \leq p (w^{''})$, then $w \leq w^{'} \leq w^{''}$, and the transitivity in $W$ implies $w \leq w^{''}$ and hence $p (w) \leq p (w^{''})$.

That shows it's a preorder. To show partial order, we need to show there are no cycles. So, if $p (w) \leq p (w^{'})$ and $p (w^{'}) \leq p (w)$ , then $w \leq w^{'}$ and $w^{'} \leq w$ , hence, by definition of $p$ , $p (w) = p (w^{'})$ . So it's a partial order.
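
The construction above can be checked on a small example. The following is a minimal sketch, not code from the post: the world names, the extra world `w4`, and the helper names `transitive_closure` and `project` are all assumptions made for illustration. It builds the preorder generated by $w_{1} \leq w_{2} \leq w_{3} \leq w_{1}$ plus $w_{3} \leq w_{4}$, projects each world to its equivalence class, and checks that the induced relation on the classes has no cycles.

```python
def transitive_closure(worlds, pairs):
    """Reflexive-transitive closure of `pairs`; (a, b) in the result means a <= b."""
    leq = {(w, w) for w in worlds} | set(pairs)
    # Warshall's algorithm: the intermediate world k must be the outermost loop.
    for k in worlds:
        for a in worlds:
            for b in worlds:
                if (a, k) in leq and (k, b) in leq:
                    leq.add((a, b))
    return leq


def project(worlds, leq):
    """Map each world w to p(w), the set of worlds v with v <= w and w <= v."""
    return {
        w: frozenset(v for v in worlds if (v, w) in leq and (w, v) in leq)
        for w in worlds
    }


# Hypothetical example: a non-strict cycle on w1, w2, w3, with w4 above it.
worlds = ["w1", "w2", "w3", "w4"]
pairs = [("w1", "w2"), ("w2", "w3"), ("w3", "w1"), ("w3", "w4")]

leq = transitive_closure(worlds, pairs)
p = project(worlds, leq)

# Induced relation on the quotient: p(w) <= p(w') iff w <= w'.
quotient_leq = {(p[a], p[b]) for (a, b) in leq}
classes = set(p.values())

# Antisymmetry: two classes related in both directions must be the same class.
antisymmetric = all(
    x == y
    for x in classes
    for y in classes
    if (x, y) in quotient_leq and (y, x) in quotient_leq
)

print(sorted(sorted(c) for c in classes))
print("antisymmetric:", antisymmetric)
```

Here the three worlds in the cycle land in a single class, `w4` keeps its own class, and the class of the cycle sits below the class of `w4`.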

Rohin Shah (6y)

Thanks!

Hazard (6y)

For cycles, it looks like the projection to $\overline{W}$ is akin to taking all the worlds that form a given cycle, and compressing them into a single world.

In your example (read with $\leq$ in place of $<$), it's true that $w_{i} \leq w_{j}$ and $w_{j} \leq w_{i}$ when $i \neq j$. That's the condition for equivalence in the projection, so you have that $p (w_{1}) = p (w_{2}) = p (w_{3})$. If you're thinking about the ordering as a directed graph, you can collapse those worlds to a single point and not mess up the ordering.

Rohin Shah (6y)

Ah yes, that makes sense, thanks! I didn't realize that $\overline{W}$ was the set of equivalence classes of $W$.

Rohin Shah (6y)

Suppose I express a partial preference over "good worlds" and another one over "bad worlds", for example "when everyone's needs for food, water and shelter are met, then it is better for there to be more social connection" and "when I am living in extreme poverty, I prefer to be in a country with a good social safety net". These talk about mutually exclusive worlds, and so lead to two distinct ordered chains. Then, on average you assign the same utility to a good world and a bad world, which seems very bad. How do we avoid this issue?

Stuart_Armstrong (6y)

By adding in a third preference, which explicitly says that extreme poverty is worse than having all needs met.

These are just pieces of the total utility, remember. Even if they are full preferences, they are not all our preferences.

Very rough argument: choose some ordering for the worlds in $W_{i}$, write $x_{i}$ for $U (w_{i})$, and set $\overline{x} = (x_{0}, \dots, x_{m})$. Then, since $g$ is a quadratic with only quadratic terms, we can write it as its own Hessian: $g (\overline{x}) = \overline{x}^{T} \frac{H (g)}{2} \overline{x}$.

Now assume that $\overline{x}^{T} \frac{H (g)}{2} \overline{x} = 0$, for $\overline{x} = (r_{0}, r_{1}, \dots, r_{m})$. However, $g$ is the sum of non-negative terms, so this is only possible if every term $(U (w^{'}) - U (w))^{2}$ in the sum over links $w \leftarrow w^{'}$ is zero. This is only possible if $U (w^{'}) = U (w)$ whenever $w \leftarrow w^{'}$; thus, since $W_{i}$ is connected by links, $U$ must be constant on $W_{i}$. In other words, only $\overline{x} = (r, r, \dots, r)$ allows $\overline{x}^{T} \frac{H (g)}{2} \overline{x} = 0$.

Thus $H (g)$ has only one zero eigenvector, corresponding to translations. Since condition 3. precludes additional translations, $H (g)$ is strictly positive definite on the subspace it defines. Hence $g$ is strictly convex on this space.
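
The argument can be sanity-checked numerically. The sketch below is an illustration under stated assumptions, not code from the post: worlds are indexed $0, \dots, m$, links are given as hypothetical index pairs `(i, j)`, and the helper names are invented here. With $g(\overline{x}) = \sum (x_{j} - x_{i})^{2}$ over the links, $H(g)/2$ is the graph Laplacian of the link graph, so its zero eigenvectors can be counted exactly with rational arithmetic.

```python
from fractions import Fraction


def half_hessian(n, links):
    """H(g)/2 for g(x) = sum over links (i, j) of (x[j] - x[i])**2.

    This is the graph Laplacian of the link graph on n worlds.
    """
    L = [[Fraction(0)] * n for _ in range(n)]
    for i, j in links:
        L[i][i] += 1
        L[j][j] += 1
        L[i][j] -= 1
        L[j][i] -= 1
    return L


def quadratic_form(L, x):
    """Compute x^T L x."""
    n = len(x)
    return sum(x[a] * L[a][b] * x[b] for a in range(n) for b in range(n))


def g(x, links):
    """The sum of squared utility differences across links."""
    return sum((x[j] - x[i]) ** 2 for i, j in links)


def rank(M):
    """Matrix rank via Gaussian elimination over exact fractions."""
    M = [row[:] for row in M]
    r = 0
    for c in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r


# Hypothetical connected component with four worlds and three links.
links = [(0, 1), (1, 2), (1, 3)]
L = half_hessian(4, links)
x = [Fraction(3), Fraction(-1), Fraction(2), Fraction(5)]

print(quadratic_form(L, x) == g(x, links))  # the quadratic form reproduces g
print(quadratic_form(L, [1, 1, 1, 1]))      # constant vectors give zero
print(4 - rank(L))                          # dimension of the zero eigenspace

# For contrast: two disconnected chains give one translation per chain.
L_split = half_hessian(4, [(0, 1), (2, 3)])
print(4 - rank(L_split))
```

When the component is connected the zero eigenspace is one-dimensional (the translations); with two disconnected chains each chain contributes its own translation, which is why connectedness of $W_{i}$ matters in the argument.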
