*(This is a basic point about utility theory which many will already be familiar with. I draw some non-obvious conclusions which may be of interest to you even if you think you know this from the title -- but the main point is to communicate the basics. I'm posting it to the alignment forum because I've heard misunderstandings of this from some in the AI alignment research community.)*

I will first give the basic argument that the utility quantities of different agents aren't directly comparable, and a few important consequences of this. I'll then spend the rest of the post discussing what to do when you need to compare utility functions.

# Utilities aren't comparable.

Utility isn't an ordinary quantity. A utility function is a device for expressing the preferences of an agent.

Suppose we have a notion of *outcome*.\* We could try to represent the agent's preferences between outcomes as an ordering relation: if we have outcomes A, B, and C, then one possible preference would be A<B<C.

However, a mere ordering does not tell us how the agent would decide between *gambles,* ie, situations giving A, B, and C with some probability.

With just three outcomes, there is only one thing we need to know: is B closer to A or C, and by how much?

We want to construct a utility function U() which represents the preferences. Let's say we set U(A)=0 and U(C)=1. Now consider the gamble G which gives A or C with 50% probability each. If the agent is indifferent between B and G (written B=G), we can represent this as U(B)=1/2. If not, we would look for a different gamble which *does* equal B, and then set B's utility to the expected value of that gamble. By assigning real-numbered values to each outcome, we can fully represent an agent's preferences over gambles. (Assuming the VNM axioms hold, that is.)

But the initial choices U(A)=0 and U(C)=1 were arbitrary! We could have chosen any numbers so long as U(A)<U(C), reflecting the preference A<C. In general, a valid representation of our preferences U() can be modified into an equally valid U'() by adding/subtracting arbitrary numbers, or multiplying/dividing by positive numbers.

So it's just as valid to say someone's expected utility in a given situation is 5 or -40, provided you shift everything *else* around appropriately.
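A minimal sketch of this invariance, in Python. The gambles and the particular constants (multiply by 3, subtract 40) are illustrative assumptions; the point is only that a positive rescaling plus a shift never changes which gamble the agent prefers.

```python
# Utilities for three outcomes, using the arbitrary anchors U(A)=0 and U(C)=1.
U = {"A": 0.0, "B": 0.5, "C": 1.0}

# An equally valid representation: multiply by a positive number, then shift.
U_prime = {outcome: 3.0 * value - 40.0 for outcome, value in U.items()}


def expected_utility(utility, gamble):
    """Expected utility of a gamble given as {outcome: probability}."""
    return sum(prob * utility[outcome] for outcome, prob in gamble.items())


def preferred(utility, gamble_1, gamble_2):
    """Return 1 or 2 for the preferred gamble, or 0 on (near-)indifference."""
    diff = expected_utility(utility, gamble_1) - expected_utility(utility, gamble_2)
    if abs(diff) < 1e-9:
        return 0
    return 1 if diff > 0 else 2


sure_b = {"B": 1.0}                # B for certain
coin_flip = {"A": 0.5, "C": 0.5}   # 50/50 between A and C
long_shot = {"A": 0.7, "C": 0.3}   # mostly A

# Both representations give the same verdict on every comparison.
same_verdicts = all(
    preferred(U, g1, g2) == preferred(U_prime, g1, g2)
    for g1, g2 in [(sure_b, coin_flip), (sure_b, long_shot), (coin_flip, long_shot)]
)
```

The raw numbers differ wildly between `U` and `U_prime` (0.5 versus -38.5 for B), but no choice between gambles ever depends on which one you use.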

Writing $U_1 \sim U_2$ to mean that two utility functions represent the same preferences, what we have in general is: $U_1 \sim U_2$ if and only if $U_1 = \alpha U_2 + \beta$ for some $\alpha > 0$. (I'll call $\alpha$ the *multiplicative constant* and $\beta$ the *additive constant*.)

This means that we can't directly compare the utility of two different agents. Notions of fairness should not directly say "everyone should have the same expected utility". Utilitarian ethics cannot directly maximize the sum of everyone's utility. Both of these operations should be thought of as a type error.
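To see why summing is a type error, here is a small sketch (the agents, options, and numbers are hypothetical). Rescaling one agent's utility function leaves that agent's preferences untouched, yet flips which option maximizes the naive sum.

```python
# Two agents choosing between two options.
# Alice prefers "park"; Bob prefers "library".
alice = {"park": 1.0, "library": 0.0}
bob = {"park": 0.0, "library": 0.6}


def naive_sum_winner(*utility_functions):
    """Option with the largest un-normalized sum of utilities."""
    options = list(utility_functions[0])
    return max(options, key=lambda o: sum(u[o] for u in utility_functions))


def rescale(utility, alpha, beta=0.0):
    """An equivalent representation alpha * U + beta (alpha must be positive)."""
    assert alpha > 0
    return {o: alpha * v + beta for o, v in utility.items()}


before = naive_sum_winner(alice, bob)               # totals: park 1.0, library 0.6
after = naive_sum_winner(alice, rescale(bob, 10))   # totals: park 1.0, library 6.0
```

Bob's preferences are identical in both calls, so any difference in the result comes purely from an arbitrary choice of representation.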

# Some non-obvious consequences.

The game-theory term "zero sum" is a misnomer. You shouldn't directly think about the sum of the utilities.

In mechanism design, *exchangeable utility* is a useful assumption which is often needed in order to get nice results. The idea is that agents can give utils to each other, perhaps to compensate for unfair outcomes. This is *kind of* like assuming there's money which can be exchanged between agents. However, the non-comparability of utility should make this seem *really weird*. (There are also other disanalogies with money; for example, utility is closer to logarithmic in money, not linear.)

This could (should?) also make you suspicious of talk of "average utilitarianism" and "total utilitarianism". However, beware: only one kind of "utilitarianism" holds that the term "utility" in decision theory means the same thing as "utility" in ethics: namely, preference utilitarianism. Other kinds of utilitarianism can distinguish between these two types of utility. (For example, one can be a hedonic utilitarian without thinking that what everyone wants is happiness, if one isn't a preference utilitarian.)

Similarly, for preference utilitarians, talk of *utility monsters* becomes questionable. A utility monster is, supposedly, someone who gets much more utility out of resources than everyone else. For a hedonic utilitarian, it would be someone who experiences much deeper sadness and much higher heights of happiness. This person supposedly merits more resources than other people.

For a preference utilitarian, incomparability of utility means we can't simply posit such a utility monster. It's meaningless *a priori* to say that one person simply has much stronger preferences than another (in the utility function sense).

All that being said, we *can* actually compare utilities, sum them, exchange utility between agents, define utility monsters, and so on. We just need *more information.*

# Comparing utilities.

The incomparability of utility functions *doesn't mean* we can't trade off between the utilities of different people.

I've heard the non-comparability of utility functions summarized as the thesis that we can't say anything meaningful about the relative value of one person's suffering vs another person's convenience. Not so! Rather, the point is just that *we need more assumptions in order to say anything.* The utility functions alone aren't enough.

## Pareto-Optimality: The Minimal Standard

Comparing utility functions suggests putting them all onto one scale, such that we can trade off between them -- "this dollar does more good for Alice than it does for Bob". We formalize this by imagining that we have to decide policy for the whole group of people we're considering (e.g., the whole world). We consider a *social choice function* which would make those decisions on behalf of everyone. Supposing it is VNM rational, its decisions must be comprehensible in terms of a utility function, too. So the problem reduces to combining a bunch of individual utility functions, to get one big one.

So, how do we go about combining the preferences of many agents into one?

The first and most important concept is the *Pareto improvement*: our social choice function should endorse changes which benefit someone and harm no one. An option which allows no such improvements is said to be **Pareto-optimal.**

We might also want to consider *strict Pareto improvements*: a change which benefits everyone. (An option which allows no strict Pareto improvements is **weakly Pareto-optimal.**) Strict Pareto improvements can be more relevant in a bargaining context, where you need to give everyone something in order to get them on board with a proposal -- otherwise they may judge the improvement as unfairly favoring others. However, in a bargaining context, individuals may refuse even a strict Pareto improvement due to fairness considerations.

In either case, a version of Harsanyi's utilitarianism theorem implies that the utility of our social choice function *can be understood as some linear combination of the individual utility functions.*

So, Pareto-optimal social choice functions can always be understood by:

- Choosing a scale for everyone's utility function -- IE, set the multiplicative constant. (If the social choice function is only weakly Pareto optimal, some of the multiplicative constants might turn out to be zero, totally cancelling out someone's involvement. Otherwise, they can all be positive.)
- Adding all of them together.

(Note that the *additive constant* doesn't matter -- shifting a person's utility function up or down doesn't change what decisions will be endorsed by the sum. However, it *will* matter for some other ways to combine utility functions.)

This is nice, because we can always combine everything linearly! We just have to set things to the right scale and then sum everything up.
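The two steps above can be sketched as follows. The weights here are arbitrary placeholders (choosing them is exactly the open problem), and the agents and options are hypothetical. The sketch also checks the remark about additive constants: shifting one agent's utilities leaves the chosen option unchanged.

```python
def social_utility(option, utility_functions, weights):
    """Weighted sum of individual utilities for one option."""
    return sum(w * u[option] for u, w in zip(utility_functions, weights))


def social_choice(options, utility_functions, weights):
    """Pick the option with the highest weighted sum."""
    return max(options, key=lambda o: social_utility(o, utility_functions, weights))


options = ["x", "y", "z"]
alice = {"x": 0.0, "y": 2.0, "z": 3.0}
bob = {"x": 5.0, "y": 4.0, "z": 0.0}
weights = [1.0, 0.5]  # placeholder multiplicative constants, both positive

# Totals: x = 2.5, y = 4.0, z = 3.0
choice = social_choice(options, [alice, bob], weights)

# Shift Bob's utilities by an additive constant: every option's total moves by
# the same amount (0.5 * 100), so the ranking of options is unchanged.
bob_shifted = {o: v + 100.0 for o, v in bob.items()}
choice_shifted = social_choice(options, [alice, bob_shifted], weights)
```

Changing `weights`, by contrast, can change the choice, which is why the rest of the discussion is about how to pick them.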

However, it's far from the end of the story. How do we choose multiplicative constants for everybody?

## Variance Normalization: Not Too Exploitable?

We could set the constants any way we want... totally subjective estimates of the worth of a person, draw random lots, etc. But we do typically want to represent some notion of fairness. We said in the beginning that the problem was, a utility function $U$ has many equivalent representations $\alpha U + \beta$ (with $\alpha > 0$). We can address this as a problem of *normalization:* we want to take a $U$ and put it into a canonical form, getting rid of the choice between equivalent representations.

One way of thinking about this is *strategy-proofness*. A utilitarian collective should not be vulnerable to members strategically claiming that their preferences are stronger (larger multiplicative constant $\alpha$), or that they should get more because they're worse off than everyone (smaller additive constant $\beta$ -- although, remember that we haven't talked about any setup which actually cares about that, yet).

**Warm-Up: Range Normalization**

Unfortunately, some obvious ways to normalize utility functions are not going to be strategy-proof.

One of the simplest normalization techniques is to squish everything into a specified range, such as [0,1]:

This is analogous to range voting: everyone reports their preferences for different outcomes on a fixed scale, and these all get summed together in order to make decisions.

If you're an agent in a collective which uses range normalization, then you may want to strategically mis-report your preferences. In the example shown, the agent has a big hump around outcomes they like, and a small hump on a secondary "just OK" outcome. The agent might want to get rid of the second hump, forcing the group outcome into the more favored region.

I believe that in the extreme, the optimal strategy for range voting is to choose some utility threshold. Anything below that threshold goes to zero, feigning maximal disapproval of the outcome. Anything above the threshold goes to one, feigning maximal approval. In other words, under strategic voting, range voting becomes approval voting (range voting where the only options are zero and one).
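Here is a rough sketch of range normalization and of the thresholding strategy just described. The two agents, three options, and their raw utilities are made up for illustration; the only claim is that, in this example, extremizing a report moves the group decision toward the strategic agent's favorite.

```python
def range_normalize(utility):
    """Rescale a utility function so its minimum is 0 and its maximum is 1."""
    low, high = min(utility.values()), max(utility.values())
    if high == low:
        # Indifferent between everything: any constant works; use 0.
        return {o: 0.0 for o in utility}
    return {o: (v - low) / (high - low) for o, v in utility.items()}


def group_choice(reports):
    """Sum the range-normalized reports and pick the top option."""
    normalized = [range_normalize(r) for r in reports]
    options = list(reports[0])
    return max(options, key=lambda o: sum(n[o] for n in normalized))


honest_1 = {"X": 10, "Y": 6, "Z": 0}   # agent 1 likes X best; Y is "just OK"
honest_2 = {"X": 3, "Y": 10, "Z": 0}   # agent 2 likes Y best

# Agent 1 sets a threshold between X and Y: everything below it is reported as 0.
strategic_1 = {"X": 10, "Y": 0, "Z": 0}

honest_outcome = group_choice([honest_1, honest_2])        # totals: X 1.3, Y 1.6, Z 0
strategic_outcome = group_choice([strategic_1, honest_2])  # totals: X 1.3, Y 1.0, Z 0
```

With honest reports the group picks Y; after agent 1 flattens their "just OK" hump, the group picks X, which agent 1 truly prefers.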

If it's not possible to mis-report your preferences, then the incentive becomes to *self-modify to literally have these extreme preferences.* This could perhaps have a real-life analogue in political outrage and black-and-white thinking. If we use this normalization scheme, that's the closest you can get to being a utility monster.

**Variance Normalization**

We'd *like* to avoid *any* incentive to misrepresent/modify your utility function. Is there a way to achieve that?

Owen Cotton-Barratt discusses different normalization techniques in illuminating detail, and argues for *variance normalization:* divide utility functions by their standard deviation, making the variance one. (*Geometric reasons for normalizing variance to aggregate preferences,* O Cotton-Barratt, 2013.) Variance normalization is strategy-proof under the assumption that everyone participating in an election shares beliefs about how probable the different outcomes are! (Note that *variance of utility* is only well-defined under some assumption about *probability of outcome.*) That's pretty good. It's probably the best we can get, in terms of strategy-proofness of voting. Will MacAskill also argues for variance normalization in the context of normative uncertainty (*Normative Uncertainty,* Will MacAskill, 2014).

Intuitively, variance normalization directly addresses the issue we encountered with range normalization: an individual attempts to make their preferences "loud" by extremizing everything to 0 or 1. This increases variance, and so is directly punished by variance normalization.
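A small sketch of the idea, assuming a shared probability distribution over outcomes (uniform here, purely for illustration). It rescales by the standard deviation, the square root of the variance, since that is what makes the normalized variance equal to one. The example reports are hypothetical.

```python
import math


def mean_and_variance(utility, probs):
    """Mean and variance of utility under a probability distribution over outcomes."""
    mean = sum(probs[o] * utility[o] for o in utility)
    variance = sum(probs[o] * (utility[o] - mean) ** 2 for o in utility)
    return mean, variance


def variance_normalize(utility, probs):
    """Divide by the standard deviation so the normalized variance is 1."""
    _, variance = mean_and_variance(utility, probs)
    if variance == 0:
        raise ValueError("utility is constant; nothing to normalize")
    scale = math.sqrt(variance)
    return {o: v / scale for o, v in utility.items()}


# Shared beliefs about how likely each outcome is (an assumption of the method).
probs = {"X": 1 / 3, "Y": 1 / 3, "Z": 1 / 3}

honest = {"X": 1.0, "Y": 0.6, "Z": 0.0}
extremized = {"X": 1.0, "Y": 0.0, "Z": 0.0}   # the "loud" approval-style report

_, honest_var = mean_and_variance(honest, probs)
_, extreme_var = mean_and_variance(extremized, probs)
_, normalized_var = mean_and_variance(variance_normalize(extremized, probs), probs)
```

In this example the extremized report has the larger raw variance, so normalization shrinks it by a larger factor than the honest report; the loudness gained under range normalization is scaled away.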

However, Jameson Quinn, LessWrong's resident voting theory expert, has warned me rather strongly about variance normalization.

- The assumption of shared beliefs about election outcomes is far from true in practice. Jameson Quinn tells me that, in fact, the strategic voting incentivized by quadratic voting is *particularly bad* amongst normalization techniques.
- Strategy-proofness isn't, after all, the final arbiter of the quality of a voting method. The final arbiter should be something like the utilitarian quality of an election's outcome. This question gets a bit weird and recursive in the current context, where I'm using elections as an analogy to ask how we should define utilitarian outcomes. But the point still, to some extent, stands.

I didn't understand the full justification behind his point, but I came away thinking that range normalization was probably better in practice. After all, it reduces to approval voting, which is actually a pretty good form of voting. But if you want to do the best we can with the state of voting theory, Jameson Quinn suggested 3-2-1 voting. (I don't think 3-2-1 voting gives us any nice theory about how to combine utility functions, though, so it isn't so useful for our purposes.)

**Open Question:** *Is there a variant of variance normalization which takes differing beliefs into account, to achieve strategy-proofness (IE honest reporting of utility)?*

Anyway, so much for normalization techniques. These techniques ignore the broader context. They attempt to be fair and even-handed *in the way we choose the multiplicative and additive constants.* But we could also explicitly try to be fair and even-handed *in the way we choose between Pareto-optimal outcomes*, as with this next technique.

## Nash Bargaining Solution

It's important to remember that the Nash bargaining solution is a solution *to the Nash bargaining problem*, which isn't quite our problem here. But I'm going to gloss over that. Just imagine that we're setting the social choice function through a massive negotiation, so that we can apply bargaining theory.

Nash offers a very simple solution, which I'll get to in a minute. But first, a few words on how this solution is derived. Nash provides two separate justifications for his solution. The first is a game-theoretic derivation of the solution as an especially robust Nash equilibrium. I won't detail that here; I quite recommend his original paper (*The Bargaining Problem,* 1950); but, just keep in mind that there is at least some reason to expect selfishly rational agents to hit upon this particular solution. The second, unrelated justification is an axiomatic one:

- *Invariance to equivalent utility functions.* This is the same motivation I gave when discussing normalization.
- *Pareto optimality.* We've already discussed this as well.
- *Independence of Irrelevant Alternatives (IIA).* This says that we shouldn't change the outcome of bargaining by removing options which won't ultimately get chosen anyway. This isn't even technically one of the VNM axioms, but it *essentially* is -- the VNM axioms are posed for binary preferences (a > b). IIA is the assumption we need to break down multi-choice preferences to binary choices. We can justify IIA with a kind of money pump.
- *Symmetry.* This says that the outcome doesn't depend on the order of the bargainers; we don't prefer Player 1 in case of a tie, or anything like that.

Nash proved that *the only way to meet these four criteria* is to maximize the **product** of gains from cooperation. More formally, choose the outcome $x$ which maximizes:

$$\left(U_1(x) - U_1(d)\right)\left(U_2(x) - U_2(d)\right)$$

The $d$ here is a "status quo" outcome. You can think of this as what happens if the bargaining fails. This is sometimes called a "threat point", since strategic players should carefully set what they do *if negotiation fails* so as to maximize their bargaining position. However, you might also want to rule that out, forcing $d$ to be a Nash equilibrium in the hypothetical game where there is no bargaining opportunity. As such, $d$ is also known as the *best alternative to negotiated agreement (BATNA)*, or sometimes the "disagreement point" (since it's what players get if they can't agree). We can think of subtracting out $U_i(d)$ as just a way of adjusting the additive constant, in which case we really are just maximizing the product of utilities. (The BATNA point is always (0,0) after we subtract out things that way.)

The Nash solution differs significantly from the other solutions considered so far.

- Maximize the *product??* Didn't Harsanyi's theorem guarantee we only need to worry about sums?
- This is the first proposal where the additive constants matter. Indeed, now the *multiplicative* constants are the ones that don't matter!
- Why wouldn't *any* utility-normalization approach satisfy those four axioms?

Last question first: how do normalization approaches violate the Nash axioms?

Well, both range normalization and variance normalization violate IIA! If you remove one of the possible outcomes, the normalization may change. This makes the social choice function display inconsistent preferences across different scenarios. (But how bad is that, really?)
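A sketch of the IIA violation for range normalization, using three hypothetical agents. Option C is never chosen, yet removing it changes each agent's normalization and flips the group's choice between A and B.

```python
def range_normalize(utility, options):
    """Rescale utilities to [0, 1] over the options that are actually available."""
    values = [utility[o] for o in options]
    low, high = min(values), max(values)
    if high == low:
        return {o: 0.0 for o in options}
    return {o: (utility[o] - low) / (high - low) for o in options}


def group_choice(utility_functions, options):
    """Sum range-normalized utilities over the available options; pick the top one."""
    normalized = [range_normalize(u, options) for u in utility_functions]
    return max(options, key=lambda o: sum(n[o] for n in normalized))


# Hypothetical agents: two who rank A > B > C, one who ranks C > B > A.
agents = [
    {"A": 10, "B": 8, "C": 0},
    {"A": 10, "B": 8, "C": 0},
    {"A": 0, "B": 5, "C": 10},
]

with_c = group_choice(agents, ["A", "B", "C"])   # totals: A 2.0, B 2.1, C 1.0
without_c = group_choice(agents, ["A", "B"])     # totals: A 2.0, B 1.0
```

With C on the table, B is the compromise winner. Without C, the first two agents' mild preference for A over B gets stretched to the full [0,1] range, and A wins.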

As for why we can get away with maximizing the product, rather than the sum:

The Pareto-optimality of Nash's approach guarantees that it *can be seen* as maximizing a linear function of the individual utilities. So Harsanyi's theorem is still satisfied. However, Nash's solution points to a very *specific* outcome, which Harsanyi doesn't do for us.

Imagine you and I are trying to split a dollar. If we can't agree on how to split it, then we'll end up destroying it (ripping it during a desperate attempt to wrestle it from each other's hands, obviously). Thankfully, John Nash is standing by, and we each agree to respect his judgement. No matter which of us claims to value the dollar more, Nash will allocate 50 cents to each of us.

Harsanyi happens to see this exchange, and explains that Nash has chosen a social choice function which normalized our utility functions to be equal to each other. That's the only way Harsanyi can explain the choice made by Nash -- the value of the dollar was precisely tied between you and me, so a 50-50 split was as good as any other outcome. Harsanyi's justification is indeed *consistent* with the observation. But why, then, did Nash choose 50-50 *precisely?* 49-51 would have had exactly the same collective utility, as would 40-60, or any other split!
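The dollar-splitting story can be checked with a short sketch. It assumes each player's utility is linear in money, with a freely chosen scale factor standing in for "claims to value the dollar more", and a disagreement point of (0, 0). Function names are my own.

```python
def nash_split(scale_you, scale_me, total_cents=100):
    """Split maximizing the product of gains over a disagreement point of (0, 0).

    Each player's utility is assumed linear in money: scale * cents received.
    Returns the number of cents going to "you".
    """
    def product(cents_you):
        return (scale_you * cents_you) * (scale_me * (total_cents - cents_you))

    return max(range(total_cents + 1), key=product)


def equal_weight_sums(total_cents=100):
    """Sum of equally scaled utilities for every split: the quantity Harsanyi sees."""
    return {c: c + (total_cents - c) for c in range(total_cents + 1)}


# Claiming to value the dollar 1000x more does not move the Nash split.
split_plain = nash_split(1, 1)
split_loud = nash_split(1000, 1)

# With the weights that make 50-50 optimal, every split has the same sum.
distinct_sums = set(equal_weight_sums().values())
```

The product has a unique maximum at 50 cents regardless of the scale factors, while the equal-weight sum is the same for every split, so the sum alone cannot single out 50-50.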

Hence, Nash's principle is far more useful than Harsanyi's, even though Harsanyi can justify any rational outcome retrospectively.

However, Nash does rely somewhat on that pesky IIA assumption, whose importance is perhaps not so clear. Let's try getting rid of that.

## Kalai–Smorodinsky

Although the Nash bargaining solution is the most famous, there are other proposed solutions to Nash's bargaining problem. I want to mention just one more, Kalai-Smorodinsky (I'll call it KS).

KS throws out IIA as irrelevant. After all, the set of alternatives *will* affect bargaining. Even in the Nash solution, the set of alternatives may have an influence by changing the BATNA! So perhaps this assumption isn't so important.

KS instead adds a *monotonicity* assumption: being in a better position should never make me worse off after bargaining.

Here's an illustration, due to Daniel Demski, of a case where Nash bargaining fails monotonicity:

I'm not that sure monotonicity really should be an axiom, but it does kind of suck to be in an apparently better position and end up worse off for it. Maybe we could relate this to strategy-proofness? A little? Not sure about that.

Let's look at the formula for KS bargaining.

Suppose there are a couple of dollars on the ground: one which you'll walk by first, and one which I'll walk by. If you pick up your dollar, you can keep it. If I pick up my dollar, I can keep mine. But also, if you *don't* pick up yours, then I'll eventually walk by it and can pick it up. So we get the following:

(The box is filled in because we can also use mixed strategies to get values intermediate between any pure strategies.)

Obviously in the real world we just both pick up our dollars. But, let's suppose we bargain about it, just for fun.

The way KS works is, you look at the maximum *one* player can get (you can get $1), and the maximum the *other* player could get (I can get $2). Then, although we can't usually jointly achieve those payoffs (I can't get $2 at the same time as you get $1), KS bargaining insists we achieve the same *ratio* (I should get twice as much as you). In this case, that means I get $1.33, while you get $0.67. We can visualize this as drawing a bounding box around the feasible solutions, and drawing a diagonal line. Here's the Nash and KS solutions side by side:

As in Daniel's illustrations, we can visualize maximizing the product as drawing the largest hyperbola we can that still touches the orange shape. (Orange dotted line.) This suggests that we each get $1; exactly the same solution as Nash would give for splitting $2. (The black dotted line illustrates how we'd continue the feasible region to represent a dollar-splitting game, getting the full triangle rather than a chopped off portion.) Nash doesn't care that one of us can do better than the other; it just looks for the most equal division of funds possible, since that's how we maximize the product.

KS, on the other hand, cares what the max possible is for both of us. It therefore suggests that you give up some of your dollar to me.
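Both solutions for the dollar pick-up example can be reproduced numerically. This sketch assumes the feasible region is the convex hull of (0, 0), (1, 0), (1, 1), and (0, 2), written as (your dollars, my dollars), with the disagreement point at (0, 0). The function names and the bisection approach are my own choices, not part of either solution concept.

```python
def feasible(you, me):
    """Payoffs reachable (with mixed strategies) in the dollar pick-up example."""
    eps = 1e-12
    return -eps <= you <= 1 + eps and me >= -eps and you + me <= 2 + eps


def kalai_smorodinsky(best_you, best_me, steps=60):
    """Largest feasible point on the line from (0, 0) toward (best_you, best_me)."""
    low, high = 0.0, 1.0
    for _ in range(steps):   # bisection on the fraction of the ideal point
        mid = (low + high) / 2
        if feasible(mid * best_you, mid * best_me):
            low = mid
        else:
            high = mid
    return low * best_you, low * best_me


def nash(grid=1000):
    """Grid search along the Pareto frontier (me = 2 - you) for the largest product."""
    best = max(range(grid + 1), key=lambda i: (i / grid) * (2 - i / grid))
    you = best / grid
    return you, 2 - you


ks_you, ks_me = kalai_smorodinsky(best_you=1.0, best_me=2.0)   # about (0.667, 1.333)
nash_you, nash_me = nash()                                     # (1.0, 1.0)
```

KS keeps the 1:2 ratio of the best-case payoffs and lands at roughly two thirds of a dollar for you; Nash ignores the best-case payoffs and gives a dollar each.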

I suspect most readers will *not* find the KS solution to be more intuitively appealing?

Note that the KS monotonicity property does NOT imply the desirable-sounding property "if there are more opportunities for good outcomes, everyone gets more or is at least not worse off." (I mention this mainly because I initially misinterpreted KS's monotonicity property this way.) In my dollar-collecting example, KS bargaining makes you worse off simply because there's an opportunity for me to take your dollar if you don't.

Like Nash bargaining, KS bargaining ignores multiplicative constants on utility functions, and can be seen as normalizing additive constants by treating the BATNA point $d$ as (0,0). (Note that, in the illustration, I assumed $d$ is chosen as (minimal achievable for one player, minimal achievable for the other). This need not be the case in general.)

A peculiar aspect of KS bargaining is that it doesn't really give us an obvious quantity to maximize, unlike Nash or Harsanyi. It only describes the optimal point. This seems far less practical, for realistic decision-making.

OK, so, should we use bargaining solutions to compare utilities?

My intuition is that, because of the need to choose the BATNA point, bargaining solutions end up rewarding destructive threats in a disturbing way. For example, suppose that we are playing the dollar-splitting game again, except that I can costlessly destroy $20 of your money, so now the BATNA involves both the destruction of the $1, and the destruction of $20. Nash bargaining now hands the entire dollar to me, because you are "up $20" in that deal, so the fairest possible outcome is to give me the $1. KS bargaining splits things up a little, but I still get most of the dollar.
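The threat example can be worked through in a few lines. The sketch assumes utility linear in money, measured in cents, with a disagreement point where I get nothing and you lose the threatened amount. The function names are hypothetical.

```python
def nash_share_with_threat(threat_cents=2000, total_cents=100):
    """Cents of the dollar that go to me under Nash bargaining.

    Disagreement point: I get 0, you lose `threat_cents` (and the dollar is destroyed).
    Gains are measured relative to that point.
    """
    def product(mine):
        my_gain = mine - 0
        your_gain = (total_cents - mine) - (-threat_cents)
        return my_gain * your_gain

    return max(range(total_cents + 1), key=product)


def ks_share_with_threat(threat_cents=2000, total_cents=100):
    """My share under Kalai-Smorodinsky: both get the same fraction of ideal gain."""
    my_ideal_gain = total_cents                     # I get the whole dollar
    your_ideal_gain = total_cents + threat_cents    # you get the dollar, lose nothing
    # Solve t * my_ideal_gain + (t * your_ideal_gain - threat_cents) = total_cents.
    t = (total_cents + threat_cents) / (my_ideal_gain + your_ideal_gain)
    return t * my_ideal_gain


nash_mine = nash_share_with_threat()   # the entire dollar
ks_mine = ks_share_with_threat()       # about 95.5 cents
```

Setting `threat_cents=0` recovers the 50-50 split in both functions, so the whole shift toward me comes from the threat.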

If utilitarians were to trade off utilities that way in the real world, it would benefit powerful people, especially those willing to exploit their power to make credible threats. If X can take everything away from Y, then Nash bargaining sees everything Y has as already counting toward "gains from trade".

As I mentioned before, sometimes people try to define BATNAs in a way which excludes these kinds of threats. However, I see this as ripe for strategic utility-spoofing (IE, lying about your preferences, or self-modifying to have more advantageous preferences).

So, this might favor normalization approaches.

On the other hand, Nash and KS both do way better in the split-the-dollar game than any normalization technique, because they can optimize for fairness of outcome, rather than just fairness of multiplicative constants chosen to compare utility functions with.

Is there any approach which combines the advantages of bargaining and normalization??

# Animals, etc.

An essay on utility comparison would be incomplete without at least mentioning the problem of animals, plants, and so on.

- Option one: some cutoff for "moral patients" is defined, such that a utilitarian only considers preferences of agents who exceed the cutoff.
- Option two: some more continuous notion is selected, such that we care more about some organisms than others.

Option two tends to be more appealing to me, despite the non-egalitarian implications (e.g., if animals differ on this spectrum, then humans could have some variation as well).

As already discussed, bargaining approaches do seem to have this feature: animals would tend to get less consideration, because they've got less "bargaining power" (they can do less harm to humans than humans can do to them). However, this has a distasteful might-makes-right flavor to it.

This also brings to the forefront the question of how we view something as an agent. Something like a plant might have quite deterministic ways of reacting to environmental stimulus. Can we view it as making choices, and thus, as having preferences? Perhaps "to some degree" -- if such a degree could be defined, numerically, it could factor into utility comparisons, giving a formal way of valuing plants and animals *somewhat,* but "not too much".

# Altruistic agents.

Another puzzling case, which I think needs to be handled carefully, is accounting for the preferences of altruistic agents.

Let's proceed with a simplistic model where agents have "personal preferences" (preferences which just have to do with themselves, in some sense) and "*cofrences*" (co-preferences; preferences having to do with other agents).

Here's an agent named Sandy:

| Sandy: Personal Preferences | | Sandy: Cofrences | |
| --- | --- | --- | --- |
| Candy | +.1 | Alice | +.1 |
| Pizza | +.2 | Bob | -.2 |
| Rainbows | +10 | Cathy | +.3 |
| Kittens | -20 | Dennis | +.4 |

The cofrences represent coefficients on other agents' utility functions. Sandy's preferences are supposed to be understood as a utility function representing Sandy's *personal* preferences, plus a weighted sum of the utility functions of Alice, Bob, Cathy, and Dennis. (Note that the weights can, hypothetically, be negative -- for example, screw Bob.)

The first problem is that utility functions are not comparable, so we have to say more before we can understand what "weighted sum" is supposed to mean. But suppose we've chosen some utility normalization technique. There are still other problems.

Notice that we can't totally define Sandy's utility function until we've defined Alice's, Bob's, Cathy's, and Dennis'. But any of those four might have cofrences which involve Sandy, as well!

Suppose we have Avery and Briar, two lovers who "only care about each other" -- their only preference is a cofrence, which places 1.0 value on the other's utility function. We could ascribe *any values at all* to them, so long as they're both the same!

With some technical assumptions (something along the lines of: your cofrences always sum to less than 1), we can ensure a unique fixed point, eliminating any ambiguity from the interpretation of cofrences. However, I'm skeptical of just taking the fixed point here.

Suppose we have five siblings: Primus, Secundus, Tertius, Quartus, et Quintus. All of them value each other at .1, except Primus, who values all siblings at .2.

If we simply take the fixed point, Primus is going to get the short end of the stick all the time: because Primus cares about everyone else more, everyone else cares about Primus' personal preferences *less* than anyone else's.

Simply put, I don't think more altruistic individuals should be punished! In this setup, the "utility monster" is the perfectly selfish individual. Altruists will be scrambling to help this person while the selfish person does nothing in return.

A different way to do things is to interpret cofrences as *integrating only the personal preferences of the other person.* So Sandy wants to help Alice, Cathy, and Dennis (and harm Bob), but does *not* automatically extend that to wanting to help any of their friends (or harm Bob's friends).

This is a little weird, but gives us a more intuitive outcome in the case of the five siblings: Primus will more often be voluntarily helpful to the other siblings, but the other siblings won't be prejudiced *against* the personal preferences of Primus when weighing between their various siblings.
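The five-sibling example can be checked numerically. This sketch treats each sibling's full utility as personal preferences plus a cofrence-weighted sum of the others' full utilities, and finds the fixed point by simple iteration. That is one way to compute it, assuming every agent's cofrences sum to less than 1. The function names and matrix layout are my own.

```python
def fixed_point_weights(cofrences, iterations=200):
    """Weights each agent ends up placing on everyone's *personal* preferences.

    Solves U = P + C U by iterating M <- I + C M, where cofrences[i][j] is the
    weight agent i places on agent j's full utility. Converges when every row of
    C sums to less than 1.
    """
    n = len(cofrences)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    weights = [row[:] for row in identity]
    for _ in range(iterations):
        weights = [
            [
                identity[i][j] + sum(cofrences[i][k] * weights[k][j] for k in range(n))
                for j in range(n)
            ]
            for i in range(n)
        ]
    return weights


def personal_only_weights(cofrences):
    """Alternative reading: cofrences attach only to others' personal preferences."""
    n = len(cofrences)
    return [[1.0 if i == j else cofrences[i][j] for j in range(n)] for i in range(n)]


# Index 0 is Primus (values each sibling at .2); the other four value each at .1.
siblings = [
    [0.0 if i == j else (0.2 if i == 0 else 0.1) for j in range(5)] for i in range(5)
]

fixed = fixed_point_weights(siblings)
direct = personal_only_weights(siblings)

# Secundus (index 1): weight on Primus' vs Tertius' personal preferences.
secundus_on_primus, secundus_on_tertius = fixed[1][0], fixed[1][2]
```

Under the fixed-point reading, Secundus ends up weighting Primus' personal preferences at about 0.161 and Tertius' at about 0.176. Under the personal-preferences-only reading both weights are 0.1, so Primus is no longer penalized for caring more.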

I realize altruism isn't *exactly* supposed to be like a bargain struck between selfish agents. But if I think of utilitarianism like a coalition of all agents, then I don't want it to punish the (selfish component of) the most altruistic members. It seems like utilitarianism should have better incentives than that?

(Try to take this section as more of a problem statement and less of a solution. Note that the concept of *cofrence* can include, more generally, preferences such as "I want to be better off than other people" or "I don't want my utility to be too different from other people's in either direction".)

# Utility monsters.

Returning to some of the points I raised in the "non-obvious consequences" section -- now we can see how "utility monsters" are/aren't a concern.

On my analysis, a utility monster is just an agent who, according to your metric for comparing utility functions, has a very large influence on the social choice function.

This might be a bug, in which case you should reconsider how you are comparing utilities. But, since you've hopefully chosen your approach carefully, it could also not be a bug. In that case, you'd want to bite the bullet fully, defending the claim that such an agent should receive "disproportionate" consideration. Presumably this claim could be backed up, on the strength of your argument for the utility-comparison approach.

# Average utilitarianism vs total utilitarianism.

Now that we have given some options for utility comparison, can we use them to make sense of the distinction between average utilitarianism and total utilitarianism?

No. Utility comparison doesn't really help us there.

The average vs total debate is a debate about population ethics. Harsanyi's utilitarianism theorem and related approaches let us think about altruistic policies for a fixed set of agents. They don't tell us how to think about a set which changes over time, as new agents come into existence.

Allowing the set to vary over time like this feels similar to allowing a single agent to change its utility function. There is no rule against this. An agent can prefer to have different preferences than it does. A collective of agents can prefer to extend its altruism to new agents who come into existence.

However, I see no reason why population ethics needs to be *simple*. We can have relatively complex preferences here. So, I don't find paradoxes such as the Repugnant Conclusion to be especially concerning. To me there's just this complicated question about what everyone collectively wants for the future.

"Average vs total?" shouldn't be one of the basic questions about utilitarianism. To me, this is a type error. It seems to me that the more basic questions for a (preference) utilitarian are:

- How do you combine individual preferences into a collective utility function?
    - How do you compare utilities between people (and animals, etc)?
        - Do you care about an "objective" solution to this, or do you see it as a subjective aspect of altruistic preferences, which can be set in an unprincipled way?
        - Do you range-normalize?
        - Do you variance-normalize?
        - Do you care about strategy-proofness?
        - How do you evaluate the bargaining framing? Is it relevant, or irrelevant?
        - Do you care about Nash's axioms?
        - Do you care about monotonicity?
        - What distinguishes humans from animals and plants, and how do you use it in utility comparison? Intelligence? Agenticness? Power? Bargaining position?
    - How do you handle cofrences?

\*: Agents need not have a concept of outcome, in which case they don't really have a utility function (because utility functions are functions *of outcomes*). However, this does not significantly impact any of the points made in this post.