Intertheoretic utility comparison: simple theory

AlexMennen, 9y

You talk like $p$ is countably supported, but everything you've said generalizes to arbitrary probability measures $p$ over $S$, if you replace "for all $u$ assigned nonzero probability by $p$" with "for all $u$ in some set assigned probability $1$ by $p$".

If you endow $U$ with the quotient topology from $\mathbb{R}^{S} / \sim$, then the only open set containing $0$ is all of $U$. This is a funny-looking topology, but I think it is ultimately the best one to use. With this topology, every function to $U$ is continuous at any point that maps to $0$. As a consequence, the assumption "if $f (S, p) \in S$" in the continuity axiom is unnecessary. More importantly, what topology on the space of probability distributions did you have in mind? Probably the weak topology?
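A short sketch of why the topology claim holds, under the assumption (not restated in this comment) that $u \sim v$ means $v = a u + b$ for some $a > 0$ and $b \in \mathbb{R}$, that $0 \in U$ is the class of constant functions, and that $\mathbb{R}^{S}$ carries the product topology:

```latex
% Assumptions: u \sim v iff v = a u + b with a > 0, b \in \mathbb{R};
% 0 \in U is the class of constant functions; \mathbb{R}^{S} has the product topology.
\begin{align*}
&\text{Let } \pi : \mathbb{R}^{S} \to U \text{ be the quotient map and let } V \subseteq U \text{ be open with } 0 \in V. \\
&\text{Then } W = \pi^{-1}(V) \text{ is open, contains the zero function, and is closed under } \sim. \\
&\text{For any } u \in \mathbb{R}^{S}: \quad \varepsilon u \to 0 \text{ pointwise as } \varepsilon \to 0^{+}
  \;\Longrightarrow\; \varepsilon u \in W \text{ for some } \varepsilon > 0. \\
&\text{Since } \varepsilon u \sim u, \text{ also } u \in W. \text{ Hence } W = \mathbb{R}^{S} \text{ and } V = U.
\end{align*}
```

Continuity at points mapping to $0$ then follows because the preimage of the only open neighbourhood of $0$ is the whole domain.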

I find independence of irrelevant alternatives more compelling than symmetry, but as long as we're accepting symmetry instead, it probably makes sense to strengthen the assumption to isomorphism-invariance: If $\rho : S_{1} \to S_{2}$ is a bijection, then $f (S_{2}, p) \circ \rho = f (S_{1}, p \circ \rho)$.

The relevance axioms section is riddled with type errors. $u (s) = u (\sigma (s))$ only makes sense if $S = S_{\neg X} \sqcup S_{X}$, which would make sense if $S$ represented a space of outcomes rather than a space of strategies (which seems to me to be a more natural space to pay attention to anyway), or if $X$ is fully under the agent's control, whereas $S = S_{\neg X} \times S_{X}$ makes sense if $X$ is fully observable to the agent. If $X$ is neither fully under the agent's control nor fully observable to the agent, then I don't think either of these makes sense. If we're using $\times$ instead of $\sqcup$, then formalizing irrelevance seems trickier. The best I can come up with is that $p$ is supported on $u$ of the form $u (s, t) = (1 - q) \, u (\sigma (s)) + q \, u (t)$, where $q$ is the probability of $X$. The weak and strong irrelevance axioms also contain type errors, since the types of the output and second input of $f$ depend on its first input, though this can probably be fixed.

I didn't understand any of the full theory section, so if any of that was important, it was too brief.

Stuart_Armstrong, 9y

Yes to your two initial points; I wanted to keep the exposition relatively simple.

Do you disagree with the reasoning presented in the picture-proof? That seems a simple argument against IIA. Isomorphism invariance makes sense, but I wanted to emphasise the inner structure of $S$.

Updated the irrelevance section to clarify that $X$ is fully observed and happens before the agent takes any actions, and that $u (s)$ should be read as $u (s \mid \neg X)$.

The full theory section is to write up some old ideas, to show that the previous axioms are not set in stone but that other approaches are possible and were considered.

AlexMennen, 9y

Your picture proof looks correct, but it relies on symmetry, and I was saying that I prefer IIA instead of symmetry. I'm not particularly confident in my endorsement of IIA, but I am fairly confident in my non-endorsement of symmetry. In real situations, strategies/outcomes have a significant amount of internal structure which seems relevant and is not preserved by arbitrary permutations.

You've just replaced a type error with another type error. Elements of $U$ are just (equivalence classes of) functions $S \to \mathbb{R}$. Conditioning like that isn't a supported operation.

Stuart_Armstrong, 9y

You're right. I've drawn the set of utility functions too broadly. I'll attempt to fix this in the post.

Stuart_Armstrong, 9y

Ok, I chose the picture proof because it was a particularly simple example of symmetry. What kind of internal structure are you thinking of?

AlexMennen, 9y

For strategies: This ties back in to the situation where there's an observable event $X$ that you can condition your strategy on, and the strategy space has a product structure $S = S_{X} \times S_{\neg X}$. This product structure seems important, since you should generally expect utility functions $u$ to factor in the sense that $u (s, t) = q \, u_{X} (s) + (1 - q) \, u_{\neg X} (t)$ for some functions $u_{X}$ and $u_{\neg X}$, where $q$ is the probability of $X$ (I think for the relevance section, you want to assume that whenever there is such a product structure, $p$ is supported on utility functions that factor, and you can define conditional utility for such functions). Arbitrary permutations of $S$ that do not preserve the product structure don't seem like true symmetries, and I don't think it should be expected that an aggregation rule should be invariant under them. In the real world, there are many observations that people can and do take into account when deciding what to do, so a good model of strategy-space should have a very rich structure.
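To make the factoring point concrete, here is a minimal sketch on a tiny finite strategy space. The function names, strategy labels, and numbers are all hypothetical, chosen only for illustration. It builds a utility of the form $u (s, t) = q \, u_{X} (s) + (1 - q) \, u_{\neg X} (t)$ and checks that a permutation acting on one coordinate keeps it factorable, while a permutation that ignores the product structure does not.

```python
from itertools import product

def make_factored_utility(u_x, u_not_x, q):
    """Build u(s, t) = q * u_x[s] + (1 - q) * u_not_x[t] as a dict over pairs."""
    return {(s, t): q * u_x[s] + (1 - q) * u_not_x[t]
            for s, t in product(u_x, u_not_x)}

def is_additively_separable(u, tol=1e-9):
    """Check whether u(s, t) = a(s) + b(t) for some functions a and b.

    Uses the standard test: u(s,t) - u(s0,t) - u(s,t0) + u(s0,t0) == 0 for all s, t.
    """
    ss = sorted({s for s, _ in u})
    ts = sorted({t for _, t in u})
    s0, t0 = ss[0], ts[0]
    return all(abs(u[(s, t)] - u[(s0, t)] - u[(s, t0)] + u[(s0, t0)]) <= tol
               for s in ss for t in ts)

def permute(u, perm):
    """Return u composed with a permutation of S (dict: strategy -> strategy).

    Strategies missing from perm are left fixed.
    """
    return {x: u[perm.get(x, x)] for x in u}

# Hypothetical example: S_X = {a, b}, S_notX = {c, d}, P(X) = 0.5.
u = make_factored_utility({"a": 0.0, "b": 1.0}, {"c": 0.0, "d": 2.0}, q=0.5)

# Swaps a <-> b in the first coordinate only, so the product structure is preserved.
coord_swap = {("a", t): ("b", t) for t in ("c", "d")}
coord_swap.update({("b", t): ("a", t) for t in ("c", "d")})

# Swaps (a, c) <-> (a, d) but leaves (b, c), (b, d) alone: not a product of coordinate maps.
scramble = {("a", "c"): ("a", "d"), ("a", "d"): ("a", "c")}

print(is_additively_separable(u))                       # True
print(is_additively_separable(permute(u, coord_swap)))  # True
print(is_additively_separable(permute(u, scramble)))    # False
```

So if $p$ is supported on utilities that factor, composing with the second permutation takes it outside that support, which is one way to see why such a permutation need not count as a symmetry.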

For outcomes, which is what utility functions should be defined on anyway: Outcomes differ in terms of how achievable they are. I have an intuition that if an outcome is impossible, then removing it from the model shouldn't have much effect. Like, you shouldn't be able to rig the aggregator function in favor of moral theory 1 as opposed to moral theory 2 by having the model take into account all the possible outcomes that could realistically be achieved, and also a bunch of impossible outcomes that theory 2 thinks are either really good or really bad, and theory 1 thinks are close to neutral. A natural counter-argument is that before you know which outcomes are impossible, any Pareto-optimal way of aggregating your possible preference functions must not change based on what turns out to be achievable; I'll have to think about that more. Also, approximate symmetries between people's preferences seem relevant to interpersonal utility comparison in practice, in the sense that two people's preferences tend to look fairly similar to each other in structure, but with each person's utility function centered largely around what happens to themselves instead of the other person, and this seems to help us make comparisons of the form "the difference between outcomes 1 and 2 is more important for person A than for person B"; I'm not sure if this way of describing it is making sense.
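A toy illustration of the rigging worry, assuming a simple range-normalization aggregator chosen purely for the example (it is not claimed to be the rule proposed in the post). All outcome names, utilities, and weights below are hypothetical. Padding the model with impossible outcomes that theory 2 rates as extreme and theory 1 rates as neutral flips the chosen achievable outcome toward theory 1.

```python
def range_normalize(u):
    """Rescale a utility dict to [0, 1] over the model's outcome set (assumes u is non-constant)."""
    lo, hi = min(u.values()), max(u.values())
    return {o: (v - lo) / (hi - lo) for o, v in u.items()}

def best_achievable(theories, weights, achievable):
    """Pick the achievable outcome maximizing the weighted sum of range-normalized utilities."""
    normalized = [range_normalize(u) for u in theories]

    def score(o):
        return sum(w * n[o] for w, n in zip(weights, normalized))

    return max(achievable, key=score)

achievable = ["o1", "o2"]
weights = (0.4, 0.6)  # hypothetical credences in theory 1 and theory 2

# Model containing only the achievable outcomes.
u1_small = {"o1": 1.0, "o2": 0.0}
u2_small = {"o1": 0.0, "o2": 1.0}
best_small = best_achievable([u1_small, u2_small], weights, achievable)

# Same theories, padded with two impossible outcomes: theory 1 is neutral about them,
# theory 2 rates them as extremely good and extremely bad.
u1_padded = {**u1_small, "z_good": 0.5, "z_bad": 0.5}
u2_padded = {**u2_small, "z_good": 10.0, "z_bad": -9.0}
best_padded = best_achievable([u1_padded, u2_padded], weights, achievable)

print(best_small)   # o2: theory 2's favourite wins on the small model
print(best_padded)  # o1: padding shrinks theory 2's stake in o1 vs o2
```

Under this particular normalization, the impossible outcomes stretch theory 2's range from 1 to 19, so its preference between the two achievable outcomes counts for much less after rescaling.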

Stuart_Armstrong, 9y

OK, got a better formalism: https://agentfoundations.org/item?id=1449

Stuart_Armstrong, 9y

I think I've got something that works; I'll post it tomorrow.
