# 12

Frontpage

A dozen years ago, Eliezer Yudkowsky asked us which was less (wrong?) bad:

• 3^^^3 people each getting a dust speck in their eyes,

or

• 1 person getting horribly tortured continually for 50 years.

He cheekily ended the post with "I think the answer is obvious.  How about you?"

To this day, I do not have a clue which answer he thinks is obviously true, but I strongly believe that the dust specks are preferable. Namely, I don't think there's any number we can increase 3^^^3 to that would change the answer, because I think the disutility of dust specks is fundamentally incomparable with that of torture. If you disagree with me, I don't care--that's not actually the point of this post. Below, I'll introduce a generalized type of utility function which allows for making this kind of statement with mathematical rigor.

## Linearity and Archimedes

A utility function is called linear if it respects scaling--5 people getting dust specked is 5 times as bad as a single dust specker. This is a natural and common assumption for what utility functions should look like.

An ordered algebraic structure (number system) is called Archimedean if for any two positive values $x$, $y$, there exist natural numbers $n$, $m$ such that $nx > y$ and $my > x$. In other words, adding either value to itself a finite number of times will eventually outweigh the other. Any pair $x$, $y$ for which this property holds of $|x|$ and $|y|$ is called comparable. Note that we're taking the absolute value because we care about magnitude and not sign here: of course any positive number of people enjoying a sunset is better than any positive number of people being tortured, but the question remains whether there is some number of sunset-views which would outweigh a year of torture (in the sense that a world with that many additional sunset watchers and 1 additional year of torture would be preferable to a world with no more of either).

Another way to think about comparability is that both values are relevant to an analysis of world state preference: a duster does not need to know how many dust specks are in world 1 or world 2 if they know that world 2 has more torture than world 1, because the preference is already decided by the torture differential.

Convicted dusters now face a dilemma: a linear, Archimedean utility function necessarily yields some number of dust specks to which torture is preferable. One resolution is to reject linearity: very smart people have discussed this at length. There is another option though which, as far as the author can tell, hardly ever gets proper attention: rejecting Archimedean utilities. This is what I'll explore in the post.

## Comparability

Comparability defines an equivalence relation on world states (and their utilities):

• Any world state is comparable to itself (reflexivity),
• if state 1 is comparable to state 2, then state 2 is comparable to state 1 (symmetry), and
• if state 1 is comparable to state 2 and state 2 is comparable to state 3, then state 1 is comparable to state 3 (transitivity).

These properties follow immediately from the definition of comparability provided above. Now we have a partition of world states into comparability classes. Note that the comparability classes do not form a vector space: 1 torture + 1 dust speck is comparable to one torture, but their difference, 1 dust speck, is not comparable to either.

As I said before, I don't care if you disagree about dust specks. If you think there is any pair of utilities which are in some way qualitatively distinct or otherwise incomparable (take your pick from: universe destruction, death, having corn stuck between one's teeth, the simple joy of seeing a friend, etc--normative claims about which utilities are comparable are outside the scope of this post), then there is more than one comparability class and the utility function needs a non-Archimedean co-domain to reflect this.

If you staunchly believe that every pair of utilities is comparable, that's fine--there will be only one comparability class containing all world states, and the construction below will boringly return your existing Archimedean utility function.

We can assign a total ordering (severity) to our collection $I$ of comparability classes: $i > j$ means that, for any world states $x_i$, $x_j$ in classes $i$, $j$ respectively, $|U(x_i)| > n\,|U(x_j)|$ for all $n \in \mathbb{N}$. It is clear from the construction of comparability classes that this does not depend on which representatives are chosen. Note that severity does not distinguish between severe positive and severe negative utilities: the classes are defined by magnitude and not sign. Thus, the framework is agnostic on such matters as negative utilitarianism: if one believes that reducing disutility of various forms takes priority over expanding positive utility, then that will be reflected in the severity levels of these comparability classes--perhaps a large portion of the most severe classes will be disutility-reduction based, and only after that do we encounter classes prioritizing positive utility world states.

Then we can think of utilities as being vectors, where each component records the amount of utility of a given severity level. We'll write the components with most severe first (since those are the most significant ones for comparing world states). Thus, if an agent believes that there are exactly two comparability classes:

• a severe class which contains e.g. 50 years of torture, and
• a milder class containing e.g. being hit with a dust speck,

then world A with 1 person being tortured and 100 dust-speckers would have utility $(-1, -100)$. Choosing between this and world B with no torture and 3^^^3 dust speckers is now easy: world B has utility $(0, -3\uparrow\uparrow\uparrow 3)$, and since the first component is greater in world B, it is preferable no matter what the second component is. If this seems unreasonable to the reader, it is not a fault of the mathematics but the normative assumption that dust specks are incomparable to torture--if comparability classes are constructed correctly, then this sort of lexicographic ordering on utilities necessarily follows.
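As a minimal sketch of this comparison, Python tuples already compare lexicographically, so they can stand in for two-component utility vectors. The names `world_a`, `world_b`, and `HUGE` are my own illustrative choices; in particular `HUGE` is only a stand-in for 3^^^3, which is far too large to represent.

```python
# Utility vectors as (torture component, dust-speck component),
# most severe component first. Both harms count as negative utility.
HUGE = 10**100  # assumption: a stand-in for 3^^^3, which cannot be written out

world_a = (-1, -100)   # 1 person tortured, 100 dust-speckers
world_b = (0, -HUGE)   # no torture, HUGE dust-speckers

# Tuple comparison is lexicographic: the first differing component decides,
# so the size of the dust-speck component never matters here.
b_preferred = world_b > world_a
print(b_preferred)  # True
```

Making `HUGE` larger changes nothing, since the comparison is settled at the first component.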

This is what we refer to as a pseudograded vector space: the utility space consists of $|I|$-dimensional vectors, which are $I$-pseudograded by severity, meaning that for any vector we can identify its severity as the comparability class of the first non-zero component. Equivalently, utility values are functions from $I$ to $\mathbb{R}$, and thus utility functions have type $U : W \to (I \to \mathbb{R})$ (where $W$ is the world state space) (this interpretation becomes important if we would like to consider the possibility of infinitely many comparability classes).

Formally, the utility vector space admits a grading function $\mathrm{gr}$ which gives the index of the first (most severe) non-zero coordinate of the input vector. In our example, $\mathrm{gr}(U(A))$ is the torture class and $\mathrm{gr}(U(B))$ is the dust-speck class. In general, if we're comparing two world states with different severity gradings, we need only check which world has greater severity, say $A$, and then consult the sign of $U(A)(\mathrm{gr}(U(A)))$ (that's a function evaluation, not a product!) to determine if it's preferable to the alternative (the sign will be non-zero by definition of $\mathrm{gr}$). More generally, we compare worlds $A$, $B$ by checking the sign of $(U(A) - U(B))(\mathrm{gr}(U(A) - U(B)))$. Note that two utility vectors $u$, $v$ satisfy $u > v$ exactly when $u - v > 0$, so the lexicographic ordering is determined fully by which utilities are preferable to 0, the utility of an empty world. We will refer to such utility vectors as positive, keeping in mind that this only means that the most severe non-zero coefficient is positive and not that the whole vector is.
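Here is a rough sketch of the grading function and the sign test for the finite-dimensional case. The function names `grade` and `prefers` are hypothetical, and I am assuming vectors are stored as sequences with the most severe component at position 0.

```python
def grade(u):
    """Position of the first (most severe) non-zero coordinate of u.

    Returns None for the zero vector, which has no grading.
    Assumes position 0 holds the most severe component.
    """
    for i, c in enumerate(u):
        if c != 0:
            return i
    return None


def prefers(u, v):
    """True if utility vector u is strictly preferable to v.

    Implements the sign test: look at u - v and check the sign of its
    entry at the grading of u - v.
    """
    diff = [a - b for a, b in zip(u, v)]
    g = grade(diff)
    return g is not None and diff[g] > 0


u_a = (-1, -100)       # world A: 1 torture, 100 dust specks
u_b = (0, -10**100)    # world B: no torture, a huge stand-in number of specks

print(grade(u_a), grade(u_b))  # 0 1
print(prefers(u_b, u_a))       # True
```

Equal vectors give `prefers(u, u) == False`, matching the fact that the zero vector is neither positive nor negative.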

Expected utility computations work exactly as in Archimedean utility functions: averages are performed component-wise. Thus, existing work with utilities and decision theory should import easily to this generalization.
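To make the component-wise averaging concrete, here is a small sketch using exact fractions. The helper `expected_utility` and both lotteries are hypothetical examples of mine, not taken from any existing decision-theory library.

```python
from fractions import Fraction


def expected_utility(lottery):
    """Component-wise expectation of a lottery.

    lottery: list of (probability, utility_vector) pairs whose
    probabilities sum to 1. Vectors list the most severe component first.
    """
    dim = len(lottery[0][1])
    return tuple(sum(p * u[i] for p, u in lottery) for i in range(dim))


# Hypothetical gamble: a one-in-a-million chance of one torture, else nothing.
risky = [(Fraction(1, 10**6), (-1, 0)),
         (Fraction(999_999, 10**6), (0, 0))]
# Hypothetical sure thing: a billion dust specks and no torture.
safe = [(Fraction(1), (0, -10**9))]

eu_risky = expected_utility(risky)
eu_safe = expected_utility(safe)

# Tuples compare lexicographically, so the torture component decides.
print(eu_safe > eu_risky)  # True
```

Note what this example shows: under this ordering, any non-zero probability of a more severe outcome outweighs every amount of a less severe one.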

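Observe that the grading function $\mathrm{gr}$ satisfies ultrametric-like conditions (recalling those found in $p$-adic norms), where gradings are compared in the severity ordering:

• For any non-zero $c \in \mathbb{R}$ and utility vector $u$, $\mathrm{gr}(cu) = \mathrm{gr}(u)$.
• For any pair of utility vectors $u$, $v$, we have $\mathrm{gr}(u + v) \le \max(\mathrm{gr}(u), \mathrm{gr}(v))$, with equality if $\mathrm{gr}(u) \ne \mathrm{gr}(v)$.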
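These two conditions can be checked by brute force on a small finite-dimensional example. In the sketch below, `severity` is my own encoding of the grading as an integer rank (higher means more severe, with 0 reserved for the zero vector); it is an assumption made so that the ordinary `max` agrees with the severity ordering.

```python
import itertools


def severity(u):
    """Severity rank of u: len(u) if the first coordinate is non-zero,
    down to 1 if only the last coordinate is non-zero, and 0 for the
    zero vector. Higher rank means more severe."""
    for i, c in enumerate(u):
        if c != 0:
            return len(u) - i
    return 0


def add(u, v):
    return tuple(a + b for a, b in zip(u, v))


def scale(c, u):
    return tuple(c * a for a in u)


# All 3-component vectors with entries drawn from a small set.
vectors = list(itertools.product([-2, 0, 1], repeat=3))

# Scaling by a non-zero constant never changes the severity.
scaling_ok = all(severity(scale(c, u)) == severity(u)
                 for c in (-3, 2) for u in vectors)

# A sum is never more severe than the more severe of its summands...
ultrametric_ok = all(severity(add(u, v)) <= max(severity(u), severity(v))
                     for u in vectors for v in vectors)

# ...and is exactly that severe whenever the two severities differ.
equality_ok = all(severity(add(u, v)) == max(severity(u), severity(v))
                  for u in vectors for v in vectors
                  if severity(u) != severity(v))

print(scaling_ok, ultrametric_ok, equality_ok)  # True True True
```

The strict inequality case arises only when two vectors of equal severity cancel in their leading component, e.g. `(1, 0, 0)` and `(-1, 1, 0)`.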

One may reasonably be concerned about the well-definedness of the grading function $\mathrm{gr}$ if they believe $|I| = \infty$: isn't it possible that $I$ is not well-ordered under severity, meaning that some vectors may not have a first non-zero element? [Technically we are considering the well-orderedness of the reverse ordering on $I$, since we want to find the most severe non-zero utility and not the least severe].

Consider for instance the function $f(x) = \sin(1/x)$ defined on the positive reals: for all its smoothness, it has no least input with a non-zero output, and what's worse the output oscillates signs infinitely many times approaching $0$ such that we aren't prepared to weigh a world state with such a utility against an empty world with utility 0. Luckily, our setting is actually a special case of the Hahn Embedding Theorem for Abelian Ordered Groups, which guarantees that any utility vector can only have non-zero entries on a well-ordered set of indices (thanks to Sam Eisenstat for pointing this out to me), and thus the grading function $\mathrm{gr}$ is necessarily well-defined. We'll refer to this vector space as $V$: the space of all functions $I \to \mathbb{R}$ which each only have non-zero values on a well-ordered subset of $I$, and thus in particular all have a first non-zero value.

## Basis Independence

Working with vector spaces, we often think of the vectors as lists of (or functions from the index set to) numbers, but this is a convenient lie--in doing so, we are fixing a basis with which the vector space was not intrinsically equipped. Abstractly, a vector space is coordinate-free, and we merely pick bases for ease of visualization and computation. However, it is important for our purposes here that the lexicographic ordering on utility functions is "basis-independent", for a suitable notion of basis (otherwise, our world state preference would depend on arbitrary choices made in representations of the utility space). In fact, classical graded vector spaces have distinguished homogeneous elements of grading $i$ (e.g. polynomials which only have degree $i$ monomials, rather than mixed polynomials with max degree $i$, in the graded vector space of polynomials), but this is too much structure and would be unnatural to apply to our setting, hence pseudograded vector spaces.

The standard definition of basis (a linearly independent set of vectors which spans the space) isn't sufficient here, since we have extra structure we'd like to encode in our bases--namely, the severity grading. Instead, we'll define a graded basis to be a choice of one positive utility vector for each grading (i.e., a map $b$ from the index set $I$ to the utility space such that $\mathrm{gr}(b(i)) = i$ for every $i \in I$, where $\mathrm{gr}$ is the grading function--in categorical terms, $b$ is a section of $\mathrm{gr}$).

Then suppose Alice and Bob agree that there are two severity levels, and even agree on which world states fall into each, but have picked different graded bases. Letting X=50 years of torture, and Y=1 dust speck in someone's eye, perhaps Alice's basis has and , but for some reason Bob thinks the sensible choice of representatives is and --he agrees with Alice that , but factors the world state differently (while Bob may seem slightly ridiculous here, there are certainly world states with multiple natural factorizations).

Then Alice's utility for world A in her basis is and her , while Bob writes (since ) and . Note that they will both prefer world B, since the change in graded basis did not affect the lexicographic ordering on utilities. Specifically, the transformation from Alice's coordinates to Bob's is induced by left multiplication by the matrix

We observe two key facts about this change-of-coordinates matrix, which we'll call $M$:

• $M$ is lower triangular (all entries above the main diagonal are 0), and
• $M$ has positive diagonal entries.

These will be true of any transformations between graded bases of the same utility space: lower triangularity follows from the ultrametric property of the grading function $\mathrm{gr}$, and positive diagonals follow from our insistence that the graded basis vectors all have positive utilities. In general (when $|I| = \infty$), it won't make sense to view the transformation $M$ as a matrix, but we will have a linear transformation operator with the same properties.

Recalling that the ordering on utilities is determined by the positive cone of utilities (since $u > v$ iff $u - v > 0$), we simply observe that lower triangular linear transformations with positive diagonal entries will preserve the positive cone of the utility space $V$: if the first non-zero entry $u_i$ of $u$ is positive, then its image $Mu$ under such a transformation $M$ will also have 0 for all entries before index $i$ (since lower triangularity means changes only propagate downstream) and will scale $u_i$ by the positive diagonal element $M_{ii}$, returning a positive utility. Thus, our ordering is independent of graded basis and depends only upon the severity classes and comparisons within them.
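The invariance claim can be spot-checked numerically in two dimensions. The matrix `M` below is a hypothetical change of graded basis that I picked for illustration (it is not claimed to be the one from Alice and Bob's example); the only properties that matter are that it is lower triangular with positive diagonal entries.

```python
from fractions import Fraction
import itertools

# Assumption: an arbitrary lower triangular matrix with positive diagonal.
M = [[Fraction(1), Fraction(0)],
     [Fraction(-1, 2), Fraction(1, 2)]]


def apply(matrix, u):
    """Left-multiply the column vector u by matrix."""
    return tuple(sum(row[j] * u[j] for j in range(len(u))) for row in matrix)


HUGE = 10**100  # stand-in for 3^^^3

first_a = (-1, -100)   # world A in the first coordinate system
first_b = (0, -HUGE)   # world B in the first coordinate system
second_a = apply(M, first_a)
second_b = apply(M, first_b)

# World B is preferred in both coordinate systems.
print(first_b > first_a, second_b > second_a)  # True True

# More broadly, the lexicographic order of every sample pair is unchanged.
samples = list(itertools.product([-3, 0, 2], repeat=2))
order_preserved = all((u > v) == (apply(M, u) > apply(M, v))
                      for u in samples for v in samples)
print(order_preserved)  # True
```

Replacing a diagonal entry with a negative number, or putting a non-zero entry above the diagonal, is enough to break `order_preserved`, which is why both properties are needed.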

## What's the point?

I think many LW folk (and almost all non-philosophers) are firm dusters and have felt dismayed at their inability to justify this within Archimedean utility theory, or have thrown up their hands and given up linearity believing it to be the only option. While we will never encounter a world with 3^^^3 people in it to be dusted, it's important to understand the subtleties of how our utility functions actually work, especially if we would like to use them to align AI who will have vastly more ability than us to consider many small utilities and aggregate them in ways that we don't necessarily believe they should.

Moreover, the framework is very practical in terms of computational efficiency. It means that choosing the best course of action or comparing world states requires one to only have strong information on the most severe utility levels at stake, rather than losing their mind (or overclocking their CPU) trying to consider all the tiny utilities which might add up. Indeed, this is how most people operate in daily life; while some may counter that this is a failure of rationality, perhaps it is in fact because people understand that there are some tiny impacts which can never accumulate to more importance than the big stuff, and it can't all be chalked up to scope neglect.

Finally, completely aside from all ethical questions, I think there's some interesting math going on and I wanted people to look at it. Thanks for looking at it!

This is my first blog post and I look forward to feedback and discussion (I'm sure there are many issues here, and I hope that I will not be alone in trying to solve them). Thanks to everyone at MSFP with whom I talked about this--even if you don't think you gave me any ideas, being organic rubber ducks was very productive for my ability to formulate these thoughts in semi-coherent ways.


Apologies if this is not the discussion you wanted, but it's hard to engage with comparability classes without a framework for how their boundaries are even minimally plausible.

Would you say that all types of discomfort are comparable with higher quantities of themselves? Is there always a marginally worse type of discomfort for any given negative experience? So long as both of these are true (and I struggle to deny them) then transitivity seems to connect the entire spectrum of negative experience. Do you think there is a way to remove the transitivity of comparability and still have a coherent system? This, to me, would be the core requirement for making dust specks and torture incomparable.

I agree that delineating the precise boundaries of comparability classes is a uniquely challenging task. Nonetheless, it does not mean they don't exist--to me your claim feels along the same lines as classical induction "paradoxes" involving classifying sand heaps. While it's difficult to define exactly what a sand heap is, we can look at many objects and say with certainty whether or not they are sand heaps, and that's what matters for living in the world and making empirical claims (or building sandcastles anyway).

I suspect it's quite likely that experiences you may be referring to as "higher quantities of themselves" within a single person are in fact qualitatively different and no longer comparable utilities in many cases. Consider the dust specks: they are assumed to be minimally annoying and almost indetectable to the bespeckèd. However, if we even slightly upgrade them so as to cause a noticeable sting in their targeted eye, they appear to reach a whole different level. I'd rather spend my life plagued by barely noticeable specks (assuming they have no interactions) than have one slightly burn my eyeball.

Theron Pummer has written about this precise thing in his paper on Spectrum Arguments, where he touches on this argument for "transitivity=>comparability" (here notably used as an argument against transitivity rather than an argument for comparability) and its relation to 'Sorites arguments' such as the one about sand heaps.

Personally I think the spectrum arguments are fairly convincing for making me believe in comparability, but I think there's a wide range of possible positions here and it's not entirely obvious which are actually inconsistent. Pummer even seemed to think rejecting transitivity and comparability could be a plausible position and that the math could work out in nice ways still.

• The thing you called "pseudograding" is normally called "filtration".

• In practice, because of the complexity of the world, and especially because of the presence of probabilistic uncertainty, an agent following a non-Archimedean utility function will always consider only the component corresponding to the absolute maximum of $I$, since there will never be a choice between A and B such that these components just happen to be exactly equal. So it will be equivalent to an Archimedean agent whose utility is this worst component. (You can have an $I$ without an absolute maximum but I don't think it's possible to define any reasonable utility function like that, where by "reasonable" I roughly mean that it's possible to build some good theory of reinforcement learning out of it.)

> The thing you called "pseudograding" is normally called "filtration".

Ah, thanks! I knew there had to be something for that, just couldn't remember what it was. I was embarrassed posting with a made-up word, but I really did look (and ask around) and couldn't find what I needed.

...Although, reading the definition, I'm not sure it's exactly the same...the severity classes aren't nested, and I think this is probably an important distinction to the conceptual framing, even if the math is equivalent. If I start with a filtration proper, I need to extract the severity classes in a way that seems slightly more convoluted than what I did.

> In practice, because of the complexity of the world, and especially because of the presence of probabilistic uncertainty, an agent following a non-Archimedean utility function will always consider only the component corresponding to the absolute maximum of $I$, since there will never be a choice between A and B such that these components just happen to be exactly equal. So it will be equivalent to an Archimedean agent whose utility is this worst component.

See my response to Dacyn.

If I understand what you do correctly, the severity classes are just the set differences $F_i \setminus \bigcup_{j < i} F_j$, where $\{F_i\}_{i \in I}$ is the filtration. I think that you also assume that the quotient $F_i / \bigcup_{j < i} F_j$ is one-dimensional and equipped with a choice of "positive" direction.

Yes! This is all true. I thought set differences of infinite unions and quotients would only make the post less accessible for non-mathematicians though. I also don't see a natural way to define the filtration without already having defined the severity classes.

I've tried intuitive approaches to thinking along these lines which failed so it's really nice to see a serious approach. I see this as key anti-moloch tech and want to use it to think about rivalrous and non-rivalrous goods.

Firstly I will focus on the most wrong part: the claim that non-Archimedean utilities are more efficient. In the real world there aren't 3^^^3 little impacts to add up. If the number of little impacts is a few hundred, and they are a trillion times smaller, then the little impacts make up less than a billionth of your utility. Usually you should be using less than a billionth of your compute to deal with them. For agents without vast amounts of compute, this means forgetting them altogether. This can be understood as an approximation strategy to maximize a normal Archimedean utility.

There is also the question of different severity classes. If we can construct a sliding scale between specks and torture then we find the need for a weird cut off point, like a broken arm being in a different severity class than a broken toe.

Intuitively speaking broken arm and broken toe are comparable. Broken arm is worse, broken toe is still bad. I'd rather get a broken arm than torture for 50 years, or even torture for 1 day.

For sliding scale of severities: there's a very difficult to compute but intuitively satisfying emphasis that can be imposed so the scale can't slide. It's the idea of "bouncing back". If you can't bounce back from an action that imparts negative utility, it forms a distinct class of utilities. Compare broken arm with torn-off toe. Compare both of those to 50 years of torture.

P.S: If you're familiar with Taleb's idea of "antifragility", that's the notion I'm basing these on.

The idea is that we can take a finite list of items like this

Torture for 50 years

Torture for 40 years

...

Torture for 1 day

...

Broken arm

Broken toe

...

Papercut

Sneeze

Dust Speck

Presented with such a list you must insist that two items on this list are incomparable. In fact you must claim that some item is incomparably worse than the next item. I don't think that any number of broken toes is better than a broken arm. A million broken toes is clearly worse. Follow this chain of reasoning for each pair of items on the list. Claiming incomparability is a claim that no matter how much I try to subdivide my list, one item will still be infinitely worse than the next.

The idea of bouncing back is also not useful. Firstly it isn't a sharp boundary, you can mostly recover but still be somewhat scarred. Secondly you can replace an injury with something that takes twice as long to bounce back from, and they still seem comparable. Something that takes most of a lifetime to bounce back from is comparable to something that you don't bounce back from. This breaks if you assume immortality, or that bouncing back 5 seconds before you drop dead is of morally overwhelming significance, such that doing so is incomparable to not doing so.

Broken arms vs toes: I agree that any number of broken toes wouldn't be better than a broken arm. But that's the point, these are _comparable_.

Incomparable breaks occur where you put the ellipses in your list. Torture for 40-50 years vs torture for 1 day is qualitatively distinct. I imagine a human being can bounce back from torture for 1 day, have scars but manage to prosper. That would be hellishly more difficult with torture for 40 years. We could count torture by day, 1-(365*40), and there would be a point of no return there: a duration of torture a person can't bounce back from. It would depend on the person, what happens during and after etc, which is why it's not possible to compute that day. That doesn't mean we should ignore how humans work.

Here's the main beef I have with Dust Specks vs Torture: Statements like "1 million broken toes" or "3^^^3 dust specks" disregard human experience. That many dust specks on one person is torture. One on each is _practically nothing_. I'm simulating people experiencing these, and the result I arrive at is this; choose best outcome from (0 utils * 3^^^3) vs (-3^^^3 utils). This is easy to answer.

You may say "but 1 dust speck on a person isn't 0 utils, it's a very small negative utility" and yes, technically you're correct. But before doing the sum over people, take a look at the people. *Distribution matters.*

Humans don't work like linear sensory devices. Utility can't work linearly either.

What if I make each time period in the "..." one nanosecond shorter than the previous.

You must believe that there is some length of time, t>most of a day, such that everyone in the world being tortured for t-1 nanosecond is better than one person being tortured for t.

Suppose there was a strong clustering effect in human psychology, such that less than a week of torture left people's minds in one state, and more than a week left them broken. I would still expect the possibility of some intermediate cases on the borderlines. With things as messy as human psychology, I would expect there to not be a perfectly sharp black and white cutoff. If we zoom in enough, we find that the space of possible quantum wavefunctions is continuous.

There is a sense in which specks and torture feel incomparable, but I don't think this is your sense of incomparability; to me it feels like moral uncertainty about which huge number of specks to pick. I would also say that "Don't torture anyone" and "don't commit atrocities based on convoluted arguments" are good ethical injunctions. If you think that your own reasoning processes are not very reliable, and you think philosophical thought experiments rarely happen in real life, then implementing the general rule "If I think I should torture someone, go to nearest psych ward" is a good idea. However I would want a perfectly rational AI which never made mistakes to choose torture.

> we find the need for a weird cut off point, like a broken arm

For the cut-off point on a broken arm, I recommend the elbow [not a doctor].

> Suppose there was a strong clustering effect in human psychology, such that less than a week of torture left people's minds in one state, and more than a week left them broken. I would still expect the possibility of some intermediate cases on the borderlines. Things as messy as human psychology, I would expect there to not be a perfectly sharp black and white cutoff. If we zoom in enough, we find that the space of possible quantum wavefunctions is continuous.

I agree! You've made my point for me: it is precisely this messiness which grants us continuity on average. Some people will take longer than others to have qualitatively incomparably damaging effects from torture, and as such the expected impact of any significant torture will have a component on the severity level of 50 years torture. Hence, comparable (on expectation).