We would like a mathematical theory which characterises the intuitive notion of ‘optimisation’.

Before Shannon introduced his mathematical theory of communication, the concept of ‘information’ was vague and informal. Then Shannon devised several standardised measures of it, with useful properties and clear operational interpretations. It turned out the concept of information is universal, in that it can be quantified on a consistent scale (bits) across various contexts.

Could something similar be true for ‘optimisation’?

In this post we review a few proposed ways to measure optimisation power and play around with them a bit.

Our general setup will be that we’re choosing actions from a set to achieve an outcome in a set . We have some beliefs about how our actions will affect outcomes in the form of a probability distribution for each . We also have some preferences, either in the form of a preference ordering over outcomes, or a utility function over outcomes. We will also assume there is some default distribution , which we can interpret either as our beliefs about if we don’t act, or if we take some default action[1].


Yudkowsky’s proposed definition just makes use of a preference ordering over . To measure the optimisation power of some outcome , we count up all the outcomes which are at least as good as , and divide that number by the total number of possible outcomes. It’s nice to take a negative log to turn this fraction into a number of bits: If I achieve the second best outcome out of eight, that's bits of optimisation power.

If the outcome space is infinite, then we can't count the number of outcomes at least as good as the one we got, so we need a measure to integrate over. If we make use of our default probability distribution here, the resulting quantity has a nice interpretation. is just : the default probability of doing as well as we did. Since we're always assuming we've got a default distribution, we might as well define OP like this even in the finite-domain case. Again we’ll take a log to get Now 2 bits of optimisation power means the default probability of doing this well was .

So far we’ve just been thinking about the optimisation power of achieving a specific outcome. We can define the optimisation power of an action as the expected optimisation power under the distribution it induces over outcomes: The above definitions just make use of a preference ordering. If we do have a utility function then we’d like our definitions to make use of that too. Intuitively, achieving the second best outcome out of three should constitute more optimisation power in a case where it’s almost as good as the first and much better than the third, compared to a case where it’s only slightly better than the third and much less good than the first[2].

Analogously to how we previously asked ‘what fraction of the default probability mass is on outcomes at least as good as this one?’ we could try to ask ‘what fraction of the default expected utility comes from outcomes at least as good as this one?’.

But making use of utility functions in the above definition is tricky. Recall that utility functions originate from the Von Neumann-Morgenstern theorem, which says that if an agent choosing between probabilistic mixtures of options satisfies some weak rationality criteria then it acts as if it maximises expected utility according to a utility function . The utility function produced by the VNM-theorem is only defined up to positive affine transformations, meaning that the utility function , for any and , equally well represents the tradeoffs between outcomes the agent is willing to take. Another way of looking at this is that only ratios of differences in utility are real. Therefore any definition of OP which changes when you multiply your utilities by a positive scalar or add a constant is very questionable. Alex Altair ran into this problem when trying to define OP in terms of the rate of change of utility.

One way we considered to get around this is to measure the utility of every outcome relative to the worst one. Let and define

In words, this asks 'what fraction of the default extra expected utility on top of the minimum comes from outcomes at least this good?'.

This is invariant to translating and rescaling utility. Unfortunately it has a problem of its own - it’s sensitive to our choice of . By adding some made up element to with large negative utility and zero probability of occurring, we can make OP arbitrarily low. In that case basically all of the default relative expected utility comes from avoiding the worst outcome, which is guaranteed, so you don’t get any credit for optimising.


An alternative approach to measuring optimisation in bits comes from John’s Utility Maximisation = Description Length Minimisation. In the language of our setup, John shows that you can take any utility function [3], and map it to a probability distribution , such that maximising expected utility with respect to is equivalent to minimising cross-entropy with respect to .

At first we weren’t sure of the precise sense of equivalence John meant, but one way to state it is that the interval scale over actions which you obtain by considering is precisely the same as the interval scale you get by considering . This will become very explicit with a bit of rejigging.

John defines by taking a softmax: . But we’ll keep working in base 2 and write instead[4].

So we get that

The term on the right doesn't depend on or , so what this means is that for some constant . In other words, this switch from maximising expected utility to minimising cross-entropy is just like translating your utility function by a constant and carrying on as you were.

Since utility functions are only defined up to translation anyway, this means the sense of equivalence is as strong as can be, but it does dispel the impression that this is a useful way to measure optimisation power in bits. The number of extra 'bits' of optimisation power that some action has compared to is precisely the number of extra expected utilons - and that means that scaling your utility function scales your optimisation power by the same amount.


We would like a measure of the amount of optimisation taking place which makes use of utility functions, which is invariant under postive affine transformations of those utility functions, and which isn't too sensitive to strange things like the utility of the worst outcome. The search continues.

  1. Or even before we act. ↩︎

  2. See this Stuart Armstrong post for related discussion. ↩︎

  3. Where is finite. ↩︎

  4. As an aside: notice that multiplying the utility function by a positive constant will change the resulting softmax distributions to be more sharply peaked at the maxima. ↩︎

New Comment
4 comments, sorted by Click to highlight new comments since:

You might be interested in some of my open drafts about optimization;

One distinction that I pretty strongly hold as carving nature at its joint is (what I call) optimization vs agents. Optimization has no concept of a utility function, and it just about the state going up an ordering. Agents are the thing that has a utility function, which they need for picking actions with probabilistic outcomes.

Nice, I'd read the first but didn't realise there were more. I'll digest later.

I think agents vs optimisation is definitely reality-carving, but not sure I see the point about utility functions and preference orderings. I assume the idea is that an optimisation process just moves the world towards states, but an agent tries to move the world towards certain states i.e. chooses actions based on how much they move the world towards certain states, so it make sense to quantify how much of a weighting each state gets in its decision-making. But it's not obvious to me that there's not a meaningful way to assign weightings to states for an optimisation process too - for example if a ball rolling down a hill gets stuck in the large hole twice as often as it gets stuck in the medium hole and ten times as often as the small hole, maybe it makes sense to quantify this with something like a utility function. Although defining a utility function based on the typical behaviour of the system and then trying to measure its optimisation power against it gets a bit circular.

Anyway, the dynamical systems approach seems good. Have you stopped working on it?

What about optimisation power of  as a measure of outcome that have utility greater than the utility of 

Let  be the set of outcome with utility greater than  according to utility function :

The set    is invariant under translation and non-zero rescaling of the utility function  and we define the optimisation power of the outcome ' according to utility function  as:

This does not suffer from comparing w.r.t a worst case and seem to satisfies the same intuition as the original OP definition while referring to some utility function.

This is in fact the same measure as the original optimisation power measure with an order given by the utility function

I feel like the "obvious" thing to do is to ask how rare (in bits) the post-opitimization EV is according to the pre-optimization distribution. Like, suppose that pre-optimization my probability distribution over utilities I'd get is normally distributed, and after optimizing my EV is +1 standard deviation. Probability of doing that well or better is 0.158, which in bits is 2.65 bits.

Seems indifferent to affine transformation of the utility function, adding irrelevant states, splitting/merging states, etc. What are some bad things about this method?