When thinking about optimisation processes it is seductive to think in information-theoretic terms.

Is there some useful measure^{[1]} of 'optimisation' we can derive from utility functions or preference orderings, just as Shannon derived 'information' from probability distributions? Could there be a 'mathematical theory of optimisation' that is analogous to Shannon's theory of information? In this post we exhibit negative evidence that this point of view is a fertile direction of inquiry.

In the last post we reviewed proposals in that direction, most notably Yudkowsky's original idea using preference orderings, and suggested some informal desiderata. In this post we state our desiderata formally, and show that they can't all be satisfied at once. We exhibit a new proposal from Scott Garrabrant which relaxes one desideratum, and revisit the previous proposals to see which desiderata they satisfy.

Setup

Recall our setup: we're choosing an action from a set A to achieve an outcome in a set Ω. For simplicity, we assume that Ω is finite. Denote the set of probability distributions on Ω by ΔΩ. We have a default distribution p∈ΔΩ, which describes the state of affairs before we optimise, or in a counterfactual world where we don't optimise, and action distributions pa∈ΔΩ for each a∈A, which describe the state of affairs if we do. Our preferences are described by a utility function u:Ω→R. Let U denote the set of utility functions.

In the previous post we considered random variables OP(p,u)(x), which measure the optimisation entailed by achieving some outcome x, given a utility function u and base distribution p. We then took an expectation over pa to measure the optimisation entailed by achieving some distribution over outcomes, i.e. we defined OP(p,pa,u)=Epa OP(p,u)(x).

In this post we state our desiderata directly over OP(p,pa,u) instead. For more on this point see the discussion of the convex-linearity desideratum below.

Desiderata

Here are the desiderata we originally came up with for OP:ΔΩ×ΔΩ×U→R∪{−∞,∞}. They should hold for all p,pa,pb∈ΔΩ and for all u∈U. Explanations below.

(Continuity) OP is continuous^{[2]} in all its arguments.

(Invariance under positive scaling). OP(p,pa,αu)=OP(p,pa,u) for all α∈R>0.

(Invariance under translation). OP(p,pa,u)=OP(p,pa,u+β) for all β∈R.

(Zero for unchanged expected utility). OP(p,pa,u)=0 whenever Ep(u)=Epa(u).

(Strict monotonicity). OP(p,pa,u)>OP(p,pb,u) whenever Epa(u)>Epb(u).

(Convex-linearity). OP(p,λpa+(1−λ)pb,u)=λOP(p,pa,u)+(1−λ)OP(p,pb,u) for all λ∈[0,1].

(Interesting and not weird). See below.

Continuity just seems reasonable.

We want invariance under positive scaling and invariance under translation because a von Neumann-Morgenstern utility function is only defined up to an equivalence class [u]=[αu+β] for α∈R>0 and β∈R (we denote this equivalence relation by ∼ in the remainder of the post). One of our motivations for this whole endeavour is to be able to talk about how much a utility function is being optimised without having to choose a specific α and β.

The combination of zero for unchanged expected utility and strict monotonicity means that OP(p,pa,u) follows the sign of Epa(u)−Ep(u). Increases in expected utility count as positive optimisation, decreases count as negative optimisation, and when expected utility is unchanged no optimisation has taken place.

Convex-linearity holds if and only if OP(p,pa,u) can be rewritten as the expectation under pa of an underlying measure of the optimisation of outcomes x rather than distributions pa. This is intuitively desirable since we're pursuing an analogy with information theory, where OP(p,pa,u) corresponds to the entropy of a random variable, and OP(p,u)(x) corresponds to the information content of its outcomes. This is the desideratum that Scott's proposal violates in order to satisfy the rest.

Interesting and not weird is an informal catch-all desideratum.

Not weird is inspired by the problem we ran across in the previous post when trying to redefine Yudkowsky's proposed optimisation measure for utility functions instead of preference orderings: you could make OP arbitrarily low by adding some x to Ω with zero probability under both p and pa and large negative utility. This kind of brittleness counts against a proposed OP.

Interesting is the most important desideratum and the hardest to formalise. OP should be a derivative of utility functions that might plausibly lead us to new insights that are much harder to access by thinking about utility functions directly. If we compare to derivatives of probability, it should be more like information than like odds ratios. OP definitely shouldn't just recapitulate expected utility.

Impossibility

It turns out our first 6 desiderata sort of just recapitulate expected utility.

In particular, they're equivalent to saying that OP(p,pa,u) just picks out some representative v of each equivalence class of utility functions and reports its expectation under pa. The zero point of the representative must be set so that Ep(v)=0 (and so we'll get different representatives for different p), but other than that, any representative defined continuously in terms of p and u will do.

Proposition 1: Desiderata 1-6 are satisfied if and only if OP(p,pa,u)=Epa rep(p,[u]), for some continuous function rep:ΔΩ×U/∼→U with rep(p,[u])∼u and Ep rep(p,[u])=0 for all p∈ΔΩ and u∈U.

Proof: See Appendix.

For example, we can translate u by −Ep(u) and rescale by the utility difference between any two points, i.e. set rep(p,[u])(x) = (u(x)−Ep(u)) / (u(x1)−u(x2)) for some fixed x1,x2∈Ω^{[3]}. The translation gets us the right zero point, and the scaling ensures the function is well defined, i.e. it sends all equivalent u to the same representative.

OP(p,pa,u) = Epa((u(x)−Ep(u)) / (u(x1)−u(x2))) = (Epa(u)−Ep(u)) / (u(x1)−u(x2)) satisfies desiderata 1-6. But it's not very interesting! It's just the expectation of a utility function.
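To make this concrete, here is a minimal sketch of that representative-based OP in Python. The outcome set, distributions, and utilities below are hypothetical, chosen just to exercise the definition:

```python
# A sketch of the representative-based OP from Proposition 1, using the
# example representative rep(p,[u])(x) = (u(x) - E_p(u)) / (u(x1) - u(x2)).

def op_rep(p, pa, u, i1=0, i2=1):
    """OP(p, pa, u) = (E_pa(u) - E_p(u)) / (u[i1] - u[i2]).

    For strict monotonicity to hold, pick i1, i2 with u[i1] > u[i2].
    """
    e_p = sum(pi * ui for pi, ui in zip(p, u))
    e_pa = sum(pi * ui for pi, ui in zip(pa, u))
    return (e_pa - e_p) / (u[i1] - u[i2])

p = [0.5, 0.25, 0.25]   # default distribution (hypothetical)
pa = [0.2, 0.3, 0.5]    # action distribution (hypothetical)
u = [2.0, 1.0, 0.0]     # utility function, with u[0] > u[1]

base = op_rep(p, pa, u)
# Invariance under positive scaling and translation: 3u + 7 gives the same OP.
same = op_rep(p, pa, [3 * x + 7 for x in u])
```

As the post says, this is just a rescaled expected-utility difference; the code makes the invariance properties easy to check numerically.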

Not all possibilities for rep(p,[u]) are of this simple form, and as stated it remains possible that a well-chosen rep(p,[u]), with a scaling factor that depends more subtly on p and u, could lead to an interesting and well-behaved optimisation measure. But proposition 1 constrains the search space, and can be seen as moderately strong evidence that any optimisation measure satisfying desiderata 1-6 must violate desideratum 7.

If we take this perspective, perhaps we should think of dropping one of the desiderata. Which one should it be?

New Proposal: Garrabrant

Convex-linearity, according to the following idea suggested by Scott Garrabrant.

Intuitively, we start with some notion of 'disturbance' - a goal-neutral measure of how much an action affects the world. Then we say that the amount of optimisation that has taken place is the least amount of disturbance necessary to achieve a result at least this good.

We'll use the KL-divergence as our notion of disturbance, but there are other options. Let ⪰u order probability distributions by their expected utility, so p′⪰u pa if and only if Ep′(u)≥Epa(u). Then we define OP+SG(p,pa,u)=minp′⪰pa DKL(p∣∣p′).

So if it didn't require much disturbance to transform the default distribution p into pa, we won't get much credit for our optimisation. If it required lots of disturbance then we might get more credit - but only if there wasn't some p′ much closer to p which would've been just as good. In that case most of our disturbance was wasted motion.

Notice that Scott's idea only measures positive optimisation: if Epa(u)<Ep(u) then we just get OP+SG(p,pa,u)=0.

To get something which does well on our desiderata, we need to combine it with some similar way of measuring negative optimisation. One idea is to say that when expected utility goes down as a result of our action, the amount of de-optimisation that has taken place is the least amount of disturbance needed to get back to somewhere as good as you started^{[4]}. Using the KL-divergence as our notion of disturbance we get OP−SG(p,pa,u)=minp′⪰p DKL(pa∣∣p′).

Then we can say OPSG=OP+SG−OP−SG=minp′⪰paDKL(p∣∣p′)−minp′⪰pDKL(pa∣∣p′).

Note that one or both of the two terms will always be zero, depending on whether expected utility goes up or down.
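For intuition, OPSG can be computed exactly in the two-outcome case: with Ω={x1,x2} and u(x1)>u(x2), the constraint p′⪰q just pins down a minimum probability for x1, and each one-dimensional minimisation has the closed-form solution q1=max(p(x1),pa(x1)). A sketch under those assumptions (the distributions are illustrative):

```python
from math import log

def dkl(p, q):
    """KL-divergence D_KL(p || q) over the same finite outcome set."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def op_sg_2(p, pa):
    """OP_SG for Omega = {x1, x2} with u(x1) > u(x2).

    Here p' ⪰ q reduces to p'(x1) >= q(x1), so both constrained
    minimisations are attained at q1 = max(p(x1), pa(x1)).
    """
    q1 = max(p[0], pa[0])
    q = [q1, 1 - q1]
    op_plus = dkl(p, q)    # min_{p' ⪰ pa} D_KL(p || p')
    op_minus = dkl(pa, q)  # min_{p' ⪰ p}  D_KL(pa || p')
    return op_plus - op_minus

p = [0.5, 0.5]
pa = [0.9, 0.1]   # higher expected utility than p
pb = [0.7, 0.3]

# Convex-linearity fails: OP_SG of the mixture differs from the mixture of OP_SGs.
mix = [0.5 * a + 0.5 * b for a, b in zip(pa, pb)]
lhs = op_sg_2(p, mix)
rhs = 0.5 * op_sg_2(p, pa) + 0.5 * op_sg_2(p, pb)
```

Since DKL(p∣∣·) is strictly convex rather than linear, lhs and rhs come apart, which is the convex-linearity violation Proposition 2 records.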

Proposition 2: OPSG satisfies continuity, invariance under positive scaling, invariance under translation, zero for unchanged expected utility, and strict monotonicity; and does not satisfy convex-linearity.

Proof: See Appendix.

OPSG seems interesting, and not obviously weird, so whether or not you like it might come down to the importance you assign to convex-linearity. Remember that in the information theory analogy, the lack of linearity makes OPSG like a notion of entropy without a notion of information. If you didn't like that analogy anyway, this is probably fine.

Previous Proposals

Let's quickly run through how some previously proposed definitions of OP score according to our desiderata. We won't give much explanation of these proposals - see our previous post for that.

Yudkowsky

In our current setup and notation we can write Yudkowsky's original proposal^{[5]} as OPEY(p,pa,u)=−Ex∼pa(log∑u(y)≥u(x)p(y)).
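This formula translates directly into code. A sketch, using the same three-outcome distributions as the strict-monotonicity counterexample in the Appendix:

```python
from math import log

def op_ey(p, pa, u):
    """OP_EY(p, pa, u) = -E_{x~pa} log sum_{u(y) >= u(x)} p(y)."""
    total = 0.0
    for x in range(len(pa)):
        if pa[x] == 0:
            continue  # outcomes with zero probability contribute nothing
        tail = sum(p[y] for y in range(len(p)) if u[y] >= u[x])
        total += pa[x] * -log(tail)
    return total

# Counterexample distributions from the Appendix (Proposition 3):
p = [1/3, 1/3, 1/3]
pa = [0, 1, 0]
pb = [1/2, 0, 1/2]
u = [0, 3, 4]

e_pa = sum(q * v for q, v in zip(pa, u))  # expected utility under pa
e_pb = sum(q * v for q, v in zip(pb, u))  # expected utility under pb
```

Running this confirms the failure of strict monotonicity: pa has the higher expected utility, but pb scores higher on OPEY.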

Since this proposal is about preference orderings rather than utility functions, it violates a few of our utility function-centric desiderata.

Proposition 3: OPEY satisfies invariance under positive scaling, invariance under translation, and convex-linearity; it does not satisfy continuity, zero for unchanged expected utility, or strict monotonicity.

Proof: See Appendix.

OPEY was interesting enough to kick off this whole investigation. How weird it is is open to interpretation.

Yudtility

In the previous post we tried to augment Yudkowsky's proposal to a version sensitive to the size of utility differences between different outcomes, rather than just their order. We can write our idea as OPUY(p,pa,u)=−Ex∼pa(log (∑u(y)≥u(x) p(y)(u(y)−u(xw)) / ∑y p(y)(u(y)−u(xw)))) where xw=argminx∈Ω u(x).
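In the same style, a sketch of OPUY, checked against the zero for unchanged expected utility counterexample from the Appendix (Proposition 4):

```python
from math import log

def op_uy(p, pa, u):
    """OP_UY: Yudkowsky's measure with tails weighted by utility gap above the worst outcome."""
    uw = min(u)  # u(x_w), the utility of the worst outcome
    denom = sum(py * (uy - uw) for py, uy in zip(p, u))
    total = 0.0
    for x in range(len(pa)):
        if pa[x] == 0:
            continue
        num = sum(py * (uy - uw) for py, uy in zip(p, u) if uy >= u[x])
        total += pa[x] * -log(num / denom)
    return total

# Appendix counterexample: equal expected utilities, but OP_UY is nonzero.
u = [0, 4, 8]
p = [1/2, 1/4, 1/4]
pa = [3/8, 1/2, 1/8]

e_p = sum(q * v for q, v in zip(p, u))
e_pa = sum(q * v for q, v in zip(pa, u))
```

Here e_p and e_pa are both 3, yet op_uy(p, pa, u) comes out strictly positive, violating zero for unchanged expected utility.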

Proposition 4: OPUY satisfies invariance under positive scaling, invariance under translation, and convex-linearity; it does not satisfy continuity, zero for unchanged expected utility, or strict monotonicity.

Proof: See Appendix.

OPUY also intuitively fails not weird, since as we noted in the previous post, it can be made arbitrarily small by making u(xw) sufficiently low - the existence of a very bad outcome can kill your optimisation power, even if it has zero probability under either the default or achieved distribution.

Altair

In a post from 2012, Alex Altair identifies some desiderata for a measure of optimisation. His setup is slightly different to ours: instead of comparing two distinct distributions p and pa, he imagines one distribution pt varying continuously over time, and aims to define instantaneous OP in terms of the rate of change of Ept(u). He gives the following desiderata:

1. The sign of OP should be the sign of (d/dt)Ept(u).

2. Constant positive OP should imply exponentially increasing Ept(u).

The analogue of 1 in our setup is the requirement that the sign of OP should be the sign of Epa(u)−Ep(u). As we mentioned, this is a consequence of zero for unchanged expected utility and strict monotonicity. But importantly, strict monotonicity also rules out the degenerate solution of setting OP to 1 if Epa(u)−Ep(u) is positive and −1 if it's negative.

2 seems like the sort of condition that should fall out of desiderata along the lines of 'OP should be additive across a sequence of independent optimisation events'. We haven't thought much about the sequential case, but we think it's the most interesting direction to consider in the future.

Altair tentatively suggests defining OP as (d/dt)Ept(u) / ∣Ept(u)∣. An analogue in our setup would be OPAA(p,pa,u) = (Epa(u)−Ep(u)) / ∣Ep(u)∣.
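Translated into code, the failure of invariance under translation is immediate (the numbers below are illustrative):

```python
def op_aa(p, pa, u):
    """OP_AA(p, pa, u) = (E_pa(u) - E_p(u)) / |E_p(u)|."""
    e_p = sum(pi * ui for pi, ui in zip(p, u))
    e_pa = sum(pi * ui for pi, ui in zip(pa, u))
    return (e_pa - e_p) / abs(e_p)

p = [0.5, 0.5]
pa = [0.2, 0.8]
u = [1.0, 2.0]

a = op_aa(p, pa, u)
b = op_aa(p, pa, [x + 10 for x in u])      # translating u changes the answer
c = op_aa(p, pa, [3 * x for x in u])       # positive scaling does not
```

Dividing by ∣Ep(u)∣ cancels a positive scale factor but not a translation, which is exactly the split Proposition 5 records.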

Proposition 5: OPAA satisfies continuity, invariance under positive scaling, zero for unchanged expected utility, strict monotonicity, and convex-linearity. It does not satisfy invariance under translation.

Proof: See Appendix.

But OPAA is not very interesting - it's similar to the OP we got out of Proposition 1.

Future Directions

Comments suggesting new desiderata or proposals, rejecting existing ones, or pointing out mistakes are encouraged. We don't plan to work on this any more for now, so anyone who wants to take up the baton is welcome to.

That said, there are a few reasons why the basic premise of this project, which is something like, "Let's look for a true name for optimisation defined in terms of utility functions, inspired by information theory," might be misguided:

Utility functions might already be the true name - after all, they do directly measure optimisation, while probability doesn't directly measure information.

The true name might have nothing to do with utility functions - Alex Altair has made the case that it should be defined in terms of preference orderings instead.

Following the information theory analogy might put form over substance - perhaps quests for the true name of a quantity need to be guided by clear cut use cases instead, so you get the meaningful feedback loops necessary to iterate to something interesting.

If you're not put off by these concerns, and want something concrete and mathematical to work on, then we think there's some low hanging fruit in formulating desiderata over a sequence of optimisation events, instead of just one. One of Shannon's desiderata for a measure of information was that the information content of two independent events should be the sum of the information content of the individual events. It seems natural to say something similar for optimisation, but we haven't thought much about how to formulate it.^{[6]}^{[7]}

Appendix: Proofs

Proposition 1

Proposition 1: Desiderata 1-6 are satisfied if and only if OP(p,pa,u)=Epa rep(p,[u]), for some continuous function rep:ΔΩ×U/∼→U with rep(p,[u])∼u and Ep rep(p,[u])=0 for all p∈ΔΩ and u∈U.

Proof: Forwards direction:

By convex-linearity OP(p,pa,u)=Epa f(p,x,u) for some f:ΔΩ×Ω×U→R. By the two invariance conditions we have that Epa f(p,x,u1)=Epa f(p,x,u2) for any pa,p∈ΔΩ and u1,u2∈U with u1∼u2. A consequence of this is that f(p,x,u1)=f(p,x,u2) for any x∈Ω, p∈ΔΩ and u1,u2∈U with u1∼u2 (to see this for a given x, just set pa=δx, where δx is the distribution which is 1 at x and 0 elsewhere). That means we can well-define a function g:ΔΩ×Ω×U/∼→R by g(p,x,[u])=f(p,x,u). Since U is the set of functions from Ω to R, we can reformulate g as a function (whose name is not yet justified) rep:ΔΩ×U/∼→U, where rep(p,[u]) is defined by rep(p,[u])(x)=g(p,x,[u])=f(p,x,u). Then we get that OP(p,pa,u)=Epa rep(p,[u]), as in the statement of the proposition.

To show that rep is continuous, note that OP(p,δx∗,u)=rep(p,[u])(x∗) for every x∗∈Ω. So for any sequence (p,[u])→(p∗,[u∗]), the continuity of OP gives lim(p,[u])→(p∗,[u∗]) rep(p,[u])(x∗) = lim(p,[u])→(p∗,[u∗]) OP(p,δx∗,u) = OP(p∗,δx∗,u∗) = rep(p∗,[u∗])(x∗), and the continuity of rep follows.

To see that rep(p,[u])∼u, note that by strict monotonicity rep(p,[u]) and u induce the same ordering of probability distributions by expected utility, so by the von Neumann-Morgenstern utility theorem there exists a positive affine transformation between them. That Ep rep(p,[u])=0 follows directly from zero for unchanged expected utility.

Backwards direction:

Continuity follows from continuity of rep. Invariance follows from the fact that rep is defined on U/∼ rather than U. That rep(p,[u])∼u gives us that if Epa(u)<Epb(u) then Eparep(p,[u])<Epbrep(p,[u]), i.e. strict monotonicity, and also that if Epa(u)=Ep(u) then Eparep(p,[u])=Eprep(p,[u]), which since the latter is zero by assumption gives zero for unchanged expected utility. Linearity follows from linearity of expectation.

Proposition 2

OPSG satisfies continuity, invariance under positive scaling, invariance under translation, zero for unchanged expected utility, and strict monotonicity; and does not satisfy convex-linearity.

Proof: We have continuity since OPSG is the composition of continuous functions. Since we only use the ordering a utility function induces over distributions we get invariance for free. When Epa(u)=Ep(u) both KL-divergences can be minimised to zero by p′=p, so we get zero for unchanged expected utility.

For strict monotonicity we can assume that Epa(u)>Epb(u)≥Ep(u) and show that OP(p,pa,u)>OP(p,pb,u) (the case Ep(u)≥Epa(u)>Epb(u) is similar, and the case Epa(u)>Ep(u)>Epb(u) is easy). In particular we need to show that minp′⪰pa DKL(p∣∣p′)>minp′⪰pb DKL(p∣∣p′), which we can do by taking some qa∈argminp′⪰pa DKL(p∣∣p′) and constructing from it some qb with DKL(p∣∣qb)<DKL(p∣∣qa) and qb≻pb. Since Eqa(u)≥Epa(u)>Ep(u) we have qa≠p, so we can take some y,z∈Ω with qa(y)>p(y) and qa(z)<p(z), and set qb(y)=qa(y)−ε and qb(z)=qa(z)+ε, with qb(x)=qa(x) for all x∈Ω∖{y,z}. For small ε>0, we get decreased KL-divergence without taking the expected utility below that of pb.

To see that OPSG violates convex-linearity, consider the case where Ω={x1,x2} with u(x1)>u(x2), and Ep(u)<Epa(u). In this case OPSG(p,pa,u)=minp′⪰pa DKL(p∣∣p′)=DKL(p∣∣pa). Since KL-divergence is not linear, it's easy to find a counterexample.

Proposition 3

OPEY satisfies invariance under positive scaling, invariance under translation, and convex-linearity; it does not satisfy continuity, zero for unchanged expected utility, or strict monotonicity.

Proof: We get invariance since OPEY only uses the ordering over outcomes implied by a utility function, and convex-linearity since OPEY is an expectation.

OPEY is not continuous in u: look at OP(p,δx,u)=−log∑u(y)≥u(x)p(y) for some x, and consider what happens when we take some z with p(z)>0 and u(z)<u(x), and increase u(z) all the way to u(x). Zero for unchanged expected utility fails since OPEY is only zero when pa=δxw.

For a counterexample to strict monotonicity, let Ω={x1,x2,x3}, p(x1)=p(x2)=p(x3)=1/3 (which we will notate as p=[1/3, 1/3, 1/3]), pa=[0, 1, 0], pb=[1/2, 0, 1/2], and u=[0, 3, 4]. Then pa wins on expected utility but loses on OPEY.

Proposition 4

OPUY satisfies invariance under positive scaling, invariance under translation, and convex-linearity; it does not satisfy continuity, zero for unchanged expected utility, or strict monotonicity.

Proof: We get invariance by construction, and convex-linearity since OPUY is an expectation.

Continuity fails for the same reason as before. We can reuse the same counterexample as before for strict monotonicity. For a counterexample to zero for unchanged expected utility let Ω={x1,x2,x3}, u=[0, 4, 8], p=[1/2, 1/4, 1/4] and pa=[3/8, 1/2, 1/8].

Proposition 5: OPAA satisfies continuity, invariance under positive scaling, zero for unchanged expected utility, strict monotonicity, and convex-linearity. It does not satisfy invariance under translation.

Proof: Left as an exercise to the reader!

^{^}

By measure we mean a standard unit used to express the size, amount, or degree of something, not a probability measure. Alexander voted for yardstick to avoid confusion; Matt vetoed.

^{^}

Since we assume Ω is finite, there is only one reasonable topology on ΔΩ and U, namely the Euclidean topology.

^{^}

When u(x1)=u(x2) we have to interpret OP as negative infinity, zero, or positive infinity, depending on the sign of the numerator.

^{^}

Here we distinguish de-optimisation, by which we mean something like accidental or collateral damage, from disoptimisation - deliberate pessimisation of a utility function. If we are instead interested in interpreting expected utility decreases as disoptimisation, it would be natural to define OP−SG=minp′⪯pDKL(p∣∣p′) i.e. the amount of disoptimisation that has taken place is the least amount of disturbance needed to do even worse.

^{^}

Yudkowsky defined OP as a function of default distribution, outcome, and preference ordering; we've made it a function of default distribution, achieved distribution, and utility function by taking an expectation under the achieved distribution and using the induced preference ordering of the utility function.

^{^}

Related ideas are to consider a sequence of distributions p1,p2,p3 and require something like OP(p1,p3,u)=OP(p1,p2,u)+OP(p2,p3,u), or to get into more exotic operadic compositionality-style axioms like the one in Theorem 5.3 here.

^{^}

Another avenue is to replace convex-linearity with convexity, in which case OP(p,pa,u) might arise as an infra-expectation of OP(p,u)(x) if not an expectation.

Previously:Towards Measures of OptimisationWhen thinking about optimisation processes it is seductive to think in information-theoretic terms.

Is there some useful measure

^{[1]}of 'optimisation' we can derive from utility functions or preference orderings, just as Shannon derived 'information' from probability distributions? Could there be a 'mathematicaltheory of optimisation' that isanalogoustoShannon's theory of information? In this post we exhibitnegative evidencethat this point of view is a fertile direction of inquiry.In the last post we reviewed proposals in that direction, most notably Yudkowsky's original idea using preference orderings, and suggested some informal desiderata. In this post we state our desiderata formally, and show that they can't all be satisfied at once. We exhibit a new proposal from Scott Garrabrant which relaxes one desideratum, and revisit the previous proposals to see which desiderata they satisfy.

## Setup

Recall our setup: we're choosing an action from a set A to achieve an outcome in a set Ω. For simplicity, we assume that Ω is finite. Denote the set of probability distributions on Ωby ΔΩ. We have a default distribution p∈ΔΩ, which describes the state of affairs before we optimise, or in a counterfactual world where we

don'toptimise, and action distributions pa∈ΔΩ for each a∈A, which describe the state of affairs if we do. Our preferences are described by a utility function u:Ω→R. Let U denote the set of utility functions.In the previous post we considered random variables OP(p,u)(x), which measure the optimisation entailed by achieving some

outcomex, given a utility function u and base distribution p. We then took an expectation over pa to measure the optimisation entailed by achieving somedistributionover outcomes,i.e. we defined OP(p,pa,u)=EpaOP(p,u)(x).In this post we state our desiderata directly over OP(p,pa,u) instead. For more on this point see the discussion of the

convex-linearitydesideratum below.## Desiderata

Here are the desiderata we originally came up with for OP:ΔΩ×ΔΩ×U→R∪{−∞,∞}. They should hold for all p,pa,pb∈ΔΩ and for all u∈U. Explanations below.

(Continuity)OP is continuous

^{[2]}in all its arguments.(Invariance under positive scaling).OP(p,pa,αu)=OP(p,pa,u) for all α∈R>0.

(Invariance under translation).OP(p,pa,u)=OP(p,pa,u+β) for all β∈R.

(Zero for unchanged expected utility).OP(p,pa,u)=0 whenever Ep(u)=Epa(u).

(Strict monotonicity).OP(p,pa,u)>OP(p,pb,u) whenever Epa(u)>Epb(u).

Convex-linearity).OP(p,λpa+(1−λ)pb,u)=λOP(p,pa,u)+(1−λ)OP(p,pb,u) for all λ∈[0,1].

(Interesting and not weird).See below.

Continuityjust seems reasonable.We want

invariance under positive scalingandinvariance under translationbecause a von Neumann-Morgernstern utility function is only defined up to an equivalence class [u]=[αu+β] for α∈R>0 and β∈R (we denote this equivalence relation by ∼ in the remainder of the post). One of our motivations for this whole endeavour is to be able to talk about how much a utility function is being optimisedwithouthaving to choose a specific α and β.The combination of

zero for unchanged expected utilityandstrict mononicitymeans that OP(p,pa,u) follows the sign of Epa(u)−Ep(u). Increases in expected utility count as positive optimisation, decreases count as negative optimisation, and when expected utility is unchanged no optimisation has taken place.Convex-linearityholds if and only if OP(p,pa,u) can be rewritten as the expectation under pa of an underyling measure of the optimisation ofoutcomesx rather thandistributionspa. This is intuitively desirable since we're pursuing an analogy with information theory, where OP(p,pa,u) corresponds to the entropy of a random variable, and OP(p,u)(x) corresponds to the information content of its outcomes. This is the desideratum that Scott's proposal violates in order to satisfy the rest.Interesting and not weirdis an informal catch-all desideratum.Not weirdis inspired by the problem we ran across in the previous post when trying to redefine Yudkowsky's proposed optimisation measure for utility functions instead of preference orderings: you could make OP arbitrarily low by adding some x to Ω with zero probability under both p and pa and large negative utility. This kind of brittleness counts against a proposed OP.Interestingis the most important desideratum and the hardest to formalise. OP should be a derivative of utility functions thatImpossibility

It turns out our first 6 desiderata sort of just recapitulate expected utility.

In particular, they're equivalent to saying that OP(p,pa,u) just picks out some representative v of each equivalence class of utility functions and reports its expectation under pa. The zero point of the representative must be set so that Ep(v)=0 (and so we'll get different representatives for different p), but other than that, any representative defined continuously in terms of p and u will do.

Proposition 1:DesiderataOP(p,pa,u)=Eparep(p,[u]),1-6are satisfied if and only iffor some continuous functionrep:ΔΩ×U/∼→Uwithrep(p,[u])∼uandEprep(p,[u])=0for allp∈ΔΩandu∈U.Proof:See Appendix.For example, we can translate u by −Ep(u) and rescale by the utility difference between any two points, i.e. set rep(p,[u])=u(x)−Ep(u)u(x1)−u(x2) for some fixed x1,x2∈Ω

^{[3]}. The translation gets us the right zero point, and the scaling ensures the function is well defined, i.e. it sends all equivalent u to the same representative.OP(p,pa,u)=Epa(u(x)−Ep(u)u(x1)−u(x2))=Epa(u)−Ep(u)u(x1)−u(x2) satisfies desiderata

1-6. But it's not veryinteresting! It's just the expectation of a utility function.Not all possibilities for rep(p,[u]) are of this simple form, and as stated it remains possible that a well-chosen rep(p,[u]), with a scaling factor that depends more subtly on p and u, could lead to an interesting and well-behaved optimisation measure. But

proposition 1contrains the search space, and can be seen as moderately strong evidence that any optimisation measure satisfying desiderata1-6must violate desideratum7.If we take this perspective, perhaps we should think of dropping one of the desiderata. Which one should it be?

## New Proposal: Garrabrant

Convex-linearity,according to the following idea, suggested by Scott Garrabrant.Intuitively, we start with some notion of 'disturbance' - a goal-neutral measure of how much an action affects the world. Then we say that the amount of optimisation that has taken place is the least amount of disturbance necessary to achieve a result at least this good.

We'll use the KL-divergence as our notion of disturbance, but there are other options. Let ⪰u order probability distributions by their expected utility, so p′⪰pa if and only if Ep′u(x)≥Epau(x). Then we define OP+SG(p,pa,u)=minp′⪰paDKL(p∣∣p′).

So if it didn't require much disturbance to transform the default distribution p into pa, we won't get much credit for our optimisation. If it required lots of disturbance then we might get more credit - but only if there wasn't some p′ much closer to p which would've been just as good. In that case most of our disturbance was wasted motion.

Notice that Scott's idea only measures

positiveoptimisation: if Epa(u)<Ep(u) then we just get OP+SG(p,pa,u)=0.To get something which does well on our desiderata, we need to combine it with some simliar way of measuring

negativeoptimisation. One idea is to say that when expected utility goes down as a result of our action, the amount of de-optimisation that has taken place is the least amount of disturbance needed to get back to somewhere as good as you started^{[4]}. Using the KL-divergence as our notion of disturbance we get OP−SG(p,pa,u)=minp′⪰pDKL(pa∣∣p′).Then we can say OPSG=OP+SG−OP−SG=minp′⪰paDKL(p∣∣p′)−minp′⪰pDKL(pa∣∣p′).

Note that one or both of the two terms will always be zero, depending on whether expected utility goes up or down.

Proposition 2:OPSGsatisfiesandcontinuity,invariance under positive scaling,invariance under translation,zero for unchanged expected utility,strict monotonicity; and does not satisfyconvex-linearity.Proof:See Appendix.OPSG seems

interesting, and not obviouslyweird, so whether or not you like it might come down the importance you assign toconvex-linearity. Remember that in the information theory analogy, the lack of linearity makes OPSG like a notion of entropy without a notion of information. If you didn't like that analogy anyway, this is probably fine.## Previous Proposals

Let's quickly run through how some previously proposed definitions of OP score according to our desiderata. We won't give much explanation of these proposals - see our previous post for that.

## Yudkowsky

In our current setup and notation we can write Yudkowsky's original proposal

^{[5]}as OPEY(p,pa,u)=−Ex∼pa(log∑u(y)≥u(x)p(y)).Since this proposal is about preference orderings rather than utility functions, it violates a few of our utility function-centric desiderata.

Proposition 3:OPEYsatisfiesinvariance under positive scaling, invariance under translation,andconvex-linearity; it does not satisfycontinuity,zero for unchanged expected utility, orstrict monotonicity.Proof:See Appendix.OPEY was

interestingenough to kick off this whole investigation. Howweirdit is is open to interpretation.## Yudtility

In the previous post we tried to augment Yudkowsky's proposal to a version sensitive to the size of utility differences betweeen different outcomes, rather than just their order. We can write our idea as OPUY(p,pa,u)=−Ex∼pa(log∑u(y)≥u(x)p(y)(u(y)−u(xw))∑p(y)(u(y)−u(xw))) where xw=argminx∈Ωu(x).

Proposition 4:OPUYsatisfiesinvariance under positive scaling, invariance under translation,andconvex-linearity; it does not satisfycontinuity,zero for unchanged expected utility,orstrict monotonicity.Proof:See Appendix.OPUY also intuitively fails

not weird,since as we noted in the previous post, it can be made arbitrarily small by making u(xw) sufficiently low - the existence of a very bad outcome can kill your optimisation power, even if it has zero probability under either the default or achieved distribution.## Altair

In a post from 2012, Alex Altair identifies some desiderata for a measure of optimisation. His setup is slightly different to ours: instead of comparing two distinct distributions p and pa, he imagines one distribution pt varying continously over time, and aims to define instantaneuous OP in terms of the rate of change of Ept(u). He gives the following desiderata:

The analogue of

1in our setup is the requirement that the sign of OP should be the sign of Epa(u)−Ep(u). As we mentioned, this is a consequence ofzero for unchanged expected utilityandstrict mononicity.But importantly,strict monotonicityalso rules out the degenerate solution of setting OP to 1 if Epa(u)−Ep(u) is positive and −1 if it's negative.2seems like the sort of condition should fall out of desiderata along the lines of 'OP should be additive across a sequence of independent optimisation events'. We haven't thought much about the sequential case, but we think it's the most interesting direction to consider in the future.Altair tentatively suggests defining OP as ddtEpt(u).∣Ept(u)∣. An analogue in our setup would be OPAA(p,pa,u)=Epa(u)−Ep(u)∣Ep(u)∣.

Proposition 5:OPAAsatifiescontinuity, invariance under positive scaling, zero for unchanged expected utility, strict monotonicity,andconvex-linearity.It does not satisfyinvariance under translation.Proof:See Appendix.But OPAA is not very

interesting- it's similar to the OP we got out ofProposition 5.## Future Directions

Comments suggesting new desiderata or proposals, rejecting existing ones, or pointing out mistakes are encouraged. We don't plan to work on this any more for now, so anyone who wants to to take up the baton is welcome to.

That said, there are a few reasons why the basic premise of this project, which is something like, "Let's look for a true name for optimisation defined in terms of utility functions, inspired by information theory," might be misguided:

- **Utility functions might already be the true name** - after all, they do directly measure optimisation, while probability doesn't directly measure information.
- **The true name might have nothing to do with utility functions** - Alex Altair has made the case that it should be defined in terms of preference orderings instead.
- **Following the information theory analogy might put form over substance** - perhaps quests for the true name of a quantity need to be guided by clear-cut use cases instead, so you get the meaningful feedback loops necessary to iterate to something interesting.

If you're not put off by these concerns, and want something concrete and mathematical to work on, then we think there's some low-hanging fruit in formulating desiderata over a sequence of optimisation events, instead of just one.^{[6]} One of Shannon's desiderata for a measure of information was that the information content of two independent events should be the sum of the information content of the individual events. It seems natural to say something similar for optimisation, but we haven't thought much about how to formulate it.^{[7]}

## Appendix: Proofs

**Proposition 1:** Desiderata 1-6 are satisfied if and only if OP(p,pa,u)=Epa rep(p,[u]) for some continuous function rep:ΔΩ×U/∼→U with rep(p,[u])∼u and Ep rep(p,[u])=0 for all p∈ΔΩ and u∈U.

**Proof:** *Forwards direction:* By *convex-linearity*, OP(p,pa,u)=Epa f(p,x,u) for some f:ΔΩ×Ω×U→R. By the two *invariance* conditions we have that Epa f(p,x,u1)=Epa f(p,x,u2) for any pa,p∈ΔΩ and u1,u2∈U with u1∼u2. A consequence of this is that f(p,x,u1)=f(p,x,u2) for any x∈Ω, p∈ΔΩ and u1,u2∈U with u1∼u2 (to see this for a given x, just set pa=δx, where δx is the distribution which is 1 at x and 0 elsewhere). That means we can well-define a function g:ΔΩ×Ω×U/∼→R by g(p,x,[u])=f(p,x,u). Since U is the set of functions from Ω to R, we can reformulate g as a function (whose name is not yet justified) rep:ΔΩ×U/∼→U, where rep(p,[u]) is defined by rep(p,[u])(x)=g(p,x,[u])=f(p,x,u). Then we get that OP(p,pa,u)=Epa rep(p,[u]), as in the statement of the proposition.

To show that rep is continuous, take any sequence (p,[u])→(p∗,[u∗]). Since OP(p,δx∗,u)=rep(p,[u])(x∗), the *continuity* of OP gives lim(p,[u])→(p∗,[u∗]) rep(p,[u])(x∗) = lim(p,[u])→(p∗,[u∗]) OP(p,δx∗,u) = OP(p∗,δx∗,u∗) = rep(p∗,[u∗])(x∗), and the continuity of rep follows.

To see that rep(p,[u])∼u, note that by *strict monotonicity*, rep(p,[u]) and u induce the same ordering of probability distributions by expected utility, so by the von Neumann-Morgenstern utility theorem there exists a positive affine transformation between them. That Ep rep(p,[u])=0 follows directly from *zero for unchanged expected utility*.

*Backwards direction:* *Continuity* follows from continuity of rep. *Invariance* follows from the fact that rep is defined on U/∼ rather than U. That rep(p,[u])∼u gives us that if Epa(u)<Epb(u) then Epa rep(p,[u])<Epb rep(p,[u]), i.e. *strict monotonicity*, and also that if Epa(u)=Ep(u) then Epa rep(p,[u])=Ep rep(p,[u]), which since the latter is zero by assumption gives *zero for unchanged expected utility*. *Convex-linearity* follows from linearity of expectation.

**Proposition 2:** OPSG satisfies *continuity*, *invariance under positive scaling*, *invariance under translation*, *zero for unchanged expected utility*, and *strict monotonicity*; it does not satisfy *convex-linearity*.

**Proof:** We have *continuity* since OPSG is the composition of continuous functions. Since we only use the ordering a utility function induces over distributions, we get *invariance* for free. When Epa(u)=Ep(u) both KL-divergences can be minimised to zero by p′=p, so we get *zero for unchanged expected utility*.

For *strict monotonicity* we can assume that Epa(u)>Epb(u)≥Ep(u) and show that OP(p,pa,u)>OP(p,pb,u) (the case where both are Ep(u)≥Epa(u)>Epb(u) is similar, and the case where Epa(u)>Ep(u)>Epb(u) is easy). In particular we need to show that min_{p′⪰pa} DKL(p∣∣p′) > min_{p′⪰pb} DKL(p∣∣p′), which we can do by taking some qa∈argmin_{p′⪰pa} DKL(p∣∣p′) and constructing from it some qb with DKL(p∣∣qb)<DKL(p∣∣qa) and qb≻pb. Take any y,z∈Ω with qa(y)>p(y) and qa(z)<p(z), and set qb(y)=qa(y)−ε and qb(z)=qa(z)+ε, with qb(x)=qa(x) for all x∈Ω∖{y,z}. For small ε>0, we get decreased KL-divergence without taking the expected utility below that of pb.

To see that OPSG violates *convex-linearity*, consider the case where Ω={x1,x2} with u(x1)>u(x2), and Ep(u)<Epa(u). In this case OPSG(p,pa,u)=min_{p′⪰pa} DKL(p∣∣p′)=DKL(p∣∣pa). Since KL-divergence is not linear, it's easy to find a counterexample.

**Proposition 3:** OPEY satisfies *invariance under positive scaling*, *invariance under translation*, and *convex-linearity*; it does not satisfy *continuity*, *zero for unchanged expected utility*, or *strict monotonicity*.

**Proof:** We get *invariance* since OPEY only uses the ordering over outcomes implied by a utility function, and *convex-linearity* since OPEY is an expectation.

OPEY is not *continuous* in u: look at OP(p,δx,u)=−log ∑_{u(y)≥u(x)} p(y) for some x, and consider what happens when we take some z with p(z)>0 and u(z)<u(x), and increase u(z) all the way to u(x). *Zero for unchanged expected utility* fails since OPEY is only zero when pa=δxw.

For a counterexample to *strict monotonicity*, let Ω={x1,x2,x3}, p(x1)=p(x2)=p(x3)=1/3 (which we will notate as p=[1/3, 1/3, 1/3]), pa=[0, 1, 0], pb=[1/2, 0, 1/2], and u=[0, 3, 4]. Then pa wins on expected utility but loses on OPEY.

**Proposition 4:** OPUY satisfies *invariance under positive scaling*, *invariance under translation*, and *convex-linearity*; it does not satisfy *continuity*, *zero for unchanged expected utility*, or *strict monotonicity*.

**Proof:** We get *invariance* by construction, and *convex-linearity* since OPUY is an expectation. *Continuity* fails for the same reason as before, and we can reuse the same counterexample as before for *strict monotonicity*. For a counterexample to *zero for unchanged expected utility* let Ω={x1,x2,x3}, u=[0, 4, 8], p=[1/2, 1/4, 1/4] and pa=[3/8, 1/2, 1/8].

**Proposition 5:** OPAA satisfies *continuity*, *invariance under positive scaling*, *zero for unchanged expected utility*, *strict monotonicity*, and *convex-linearity*. It does not satisfy *invariance under translation*.

**Proof:** Left as an exercise to the reader!

^{^}By *measure* we mean a standard unit used to express the size, amount, or degree of something, not a probability measure. Alexander voted for *yardstick* to avoid confusion; Matt vetoed.

^{^}Since we assume Ω is finite, there is only one reasonable topology on ΔΩ and U, namely the Euclidean topology.

^{^}When u(x1)=u(x2) we have to interpret OP as negative infinity, zero, or positive infinity, depending on the sign of the numerator.

^{^}Here we distinguish *de-optimisation*, by which we mean something like accidental or collateral damage, from *disoptimisation* - deliberate pessimisation of a utility function. If we are instead interested in interpreting expected utility decreases as *disoptimisation*, it would be natural to define OP−SG(p,pa,u)=min_{p′⪯pa} DKL(p∣∣p′), i.e. the amount of disoptimisation that has taken place is the least amount of disturbance needed to do even worse.

^{^}Yudkowsky defined OP as a function of default distribution, outcome, and preference ordering; we've made it a function of default distribution, achieved distribution, and utility function by taking an expectation under the achieved distribution and using the induced preference ordering of the utility function.

^{^}Related ideas are to consider a sequence of distributions p1,p2,p3 and require something like OP(p1,p3,u)=OP(p1,p2,u)+OP(p2,p3,u), or to get into more exotic operadic compositionality-style axioms like the one in Theorem 5.3 here.

^{^}Another avenue is to replace *convex-linearity* with *convexity*, in which case OP(p,pa,u) might arise as an infra-expectation of OP(p,x,u), if not an expectation.