Why Not Subagents?

David Lorell

55 Why Not Subagents?

22nd Jun 2023

17 min read

55

Alternative title for economists: Complete Markets Have Complete Preferences

The justification for modeling real-world systems as “agents” - i.e. choosing actions to maximize some utility function - usually rests on various coherence theorems. They say things like “either the system’s behavior maximizes some utility function, or it is throwing away resources” or “either the system’s behavior maximizes some utility function, or it can be exploited” or things like that. [...]
Now imagine an agent which prefers anchovy over mushroom pizza when it has anchovy, but mushroom over anchovy when it has mushroom; it’s simply never willing to trade in either direction. There’s nothing inherently “wrong” with this; the agent is not necessarily executing a dominated strategy, cannot necessarily be exploited, or any of the other bad things we associate with inconsistent preferences. But the preferences can’t be described by a utility function over pizza toppings.

… that’s what I (John) wrote four years ago, as the opening of Why Subagents?. Nate convinced me that it’s wrong: incomplete preferences do imply a dominated strategy. A system with incomplete preferences, which can’t be described by a utility function, can contract/precommit/self-modify to a more-complete set of preferences which perform strictly better even according to the original preferences.

This post will document that argument.

Epistemic Status: This post is intended to present the core idea in an intuitive and simple way. It is not intended to present a fully rigorous or maximally general argument; our hope is that someone else will come along and more properly rigorize and generalize things. In particular, we’re unsure of the best setting to use for the problem setup; we’ll emphasize some degrees of freedom in the High-Level Potential Problems (And Potential Solutions) section.

Context

The question which originally motivated my interest was: what’s the utility function of the world’s financial markets? Just based on a loose qualitative understanding of coherence arguments, one might think that the inexploitability (i.e. efficiency) of markets implies that they maximize a utility function. In which case, figuring out what that utility function is seems pretty central to understanding world markets.

In Why Subagents?, I argued that a market of utility-maximizing traders is inexploitable, but not itself a utility maximizer. The relevant loophole is that the market has incomplete implied preferences (due to path dependence). Then, I argued that any inexploitable system with incomplete preferences could be viewed as a market/committee of utility-maximizing subagents, making utility-maximizing subagents a natural alternative to outright utility maximizers for modeling agentic systems.

More recently, Nate counterargued that a market of utility maximizers will become a utility maximizer. (My interpretation of) his argument is that the subagents will contract with each other in a way which completes the market’s implied preferences. The model I was previously using got it wrong because it didn’t account for contracts, just direct trade of goods.

More generally, Nate’s counterargument implies that agents with incomplete preferences will tend to precommit/self-modify in ways which complete their preferences.

… but that discussion was over a year ago. Neither of us developed it further, because it just didn’t seem like a core priority. Then the AI Alignment Awards contest came along, and an excellent entry by Elliot Thornley proposed that incomplete preferences could be used as a loophole to make the shutdown problem tractable. Suddenly I was a lot more interested in fleshing out Nate’s argument.

To that end, this post will argue that systems with incomplete preferences will tend to contract/precommit in ways which complete their preferences.

The Pizza Example

As a concrete example of a system of two subagents with incomplete preference, suppose that John and David have different preferences for pizza toppings, and need to choose one together. We both agree that cheese (C) is the least-preferred default option, and sausage (S) is the best. But in between, John prefers pepperoni > mushroom > anchovy (because John lacks taste), while David prefers anchovy > pepperoni > mushroom (because David is a heathen). Specifically, our utilities are:

	David's Utility	John's Utility
Cheese	0	0
Mushroom	1	2
Pepperoni	2	3
Anchovy	3	1
Sausage	4	4

Or, visually,

The arrows highlight pareto improvements aka preferences; if there is a path from one option to another (e.g. mushroom -> sausage), then that’s a preference *of the whole system*. If there’s no path from one option to another (e.g. anchovy -> pepperoni), then that means either John or David (or both!) would veto the trade offer *in either direction.* In that case there’s no preferred direction between them; that’s where preferences are incomplete.

Mechanically, for this weird toy model, we’ll imagine that John and David will be offered some number of opportunities (let’s say 3) to trade their current pizza for another, randomly chosen, pizza. If the offered topping is preferred by both of them, then they take the trade. Otherwise, one of them vetos, so they don’t take the trade. Why these particular mechanics? Well, with those mechanics, the preferences can be interpreted in a pretty straightforward way which plays well with other coherence-style arguments - in particular, it’s easy to argue against circular preferences.

(Note that we're not saying trades have the form e.g. "(mushroom -> pepperoni)?" as we would probably usually imagine trades; they instead have the form "(whatever you have now -> pepperoni)". The section Value vs Utility talks about moving our core claim/argument to a more standard notion of trade.)

No Utility Function

One reasonable-seeming way to handle incomplete preferences using a utility function is to just say that two options with “no preference” between them have the same utility - i.e. “no preference” = “indifference”. What goes wrong with that?

Well, in the pizza example, there’s no preference between mushroom and anchovy, so they would have to have the same utility. And there’s no preference between anchovy and pepperoni, so they would have to have the same utility. But that means mushroom and pepperoni have the same utility, which conflicts with the preference for pepperoni over mushroom. So, we can’t represent these preferences via a utility function.

Generalizing: whenever we have state B preferred over state A, and some third state C which has no preference relative to either A or B, we cannot represent the preferences using a utility function. Later, we’ll call that condition “strong incompleteness”, and show that non-strongly incomplete preferences can be represented using a utility function.

Next, let’s see what kind of tricks we can use when preferences are strongly incomplete.

A Contract

Now for the core idea. With these preferences, John and David will turn down trades from mushroom to anchovy (because John vetos it), and turn down trades from anchovy to pepperoni (because David vetos it), even though both prefer pepperoni over mushroom. In principle, both might do better in expectation if John could give up some anchovy for pepperoni, and David could give up some mushroom for anchovy, so that the net shift is from mushroom to pepperoni (a shift which they both prefer).

Before any trade offers come along, John and David sign a contract. John agrees to not veto mushroom -> anchovy trades. In exchange, David agrees to not veto anchovy -> pepperoni trades. Now the two together have completed their preferences: sausage > pepperoni > anchovy > mushroom > cheese.

… but that won’t always work; it depends on the numbers. For instance, what if there were a lot more opportunities to trade anchovy -> pepperoni than mushroom -> anchovy? Then an agreement to not veto anchovy -> pepperoni would be pretty bad for David, and wouldn’t be fully balanced out by the extra mushroom -> anchovy trades. We need some way to make the anchovy -> pepperoni trade happen less often (in expectation), to balance things out. If the two trades happen the same amount (in expectation), then there is no expected change in anchovy, just a shift of probability mass from mushroom to pepperoni. Then David and John both do better.

So how do we make the two trades happen the same amount (in expectation)?

One More Trick: Randomization

Solution: randomize the contract. As soon as the contract is signed, some random numbers will be generated. With some probability, John will agree to never veto mushroom -> anchovy trades. With some other probability, David will agree to never veto anchovy -> pepperoni trades. Then, we choose the two probabilities so that the net expected anchovy states is unchanged by the contract: increasing John’s probability continuously increases expected anchovy, increasing David’s probability continuously decreases expected anchovy, so with the right choice of the two probabilities we can achieve zero expected change in anchovy. Then, the net effect of the contract is to shift some expected mushroom into expected pepperoni; it’s basically a pure win.

That’s the general trick: randomize the preference-completion in such a way that expected anchovy stays the same, while expected mushroom is turned into expected pepperoni.

Claim

Suppose a system has incomplete preferences over a set of states. (For simplicity, we’ll assume the set of states is finite.) Mechanically, this means that there is a “current state” at any given time, and over time the system is offered opportunities to “trade”, i.e. transition to another state; the system accepts a trade A -> B if-and-only-if it prefers B over A.

Claim: the system’s preferences can be made complete (via a potentially-randomized procedure) in such a way that the new distribution of states can be viewed as the old distribution with some probability mass shifted from less-preferred to more-preferred states.

Stronger Subclaim

For the argument, we’ll want to split out the case where a strict improvement can be achieved by completing the preferences.

Suppose there exists three states A, B, C such that:

The system prefers B over A
The system has no preference between either A and C or B and C

We’ll call this case “strongly incomplete preferences”. The pizza example involves strongly incomplete preferences: take A to be mushroom, B to be pepperoni, and C to be anchovy.

Claims:

Strongly incomplete preferences can be randomly completed in such a way that the new distribution of states can be viewed as the old distribution with some strictly positive probability mass shifted from less-preferred to more-preferred states.
Non-strongly incomplete preferences (either complete or “weakly incomplete”) encode a utility maximizer.

In other words: strongly incomplete preferences imply that a strict improvement can be achieved by (possibly randomly) completing the preferences, while non-strongly incomplete preferences imply that the system is a utility maximizer.

In the case of weakly incomplete preferences (i.e. incomplete but not strongly incomplete), we also claim that the preferences can be randomly completed in such a way that the system is indifferent between its original preferences and the (expected) randomly-completed preferences, via a similar trick to the rest of the argument. But that’s not particularly practically relevant, so we won’t talk about it further.

Argument

First Step: Strong Incompleteness

In the case of strong incompleteness, we can directly re-use our argument from the pizza example. We have three states A, B, C such that B is preferred over A, and there is no preference between either A and C or B and C. Then, we randomly add a preference for C over A with probability , and we randomly add a preference for B over C with probability $p_{C \to B}$ .

Frequency of state C increases continuously with $p_{A \to C}$ , decreases continuously with $p_{C \to B}$ , and is equal to its original value when both probabilities are 0. So:

Check whether frequency of state C is higher or lower than original when both probabilities are set to 1.
If higher, then set $p_{C \to B}$ = 1. With $p_{A \to C}$ = 0 the frequency of state C must then be lower than original (since frequency of C is decreasing in $p_{C \to B}$ ), with $p_{A \to C}$ = 1 it’s higher than original by assumption, so by the intermediate value theorem there must be some $p_{A \to C}$ value such that the frequency of C stays the same. Pick that value.
If lower, swap $p_{C \to B}$ with $p_{A \to C}$ and “higher” with “lower” in the previous bullet.

The resulting randomized transformation of the preferences keeps the frequencies of each state the same, except it shifts some probability mass from a less-preferred state (A) to a more-preferred state (B).

(Potential issue with the argument: shifting probability mass from A to B may also shift around probability mass among states downstream of those two states. However, it should generally only shift things in net “good” ways, once we account for the terminal vs instrumental value issue discussed under Value vs Utility. In other words, if we’re using the instrumental value functions, then shifting probability mass from an option valued less by all subagents to one valued more by all subagents should be an expected improvement for all subagents, after accounting for downstream shifts.)

Third Step: Equilibrium Conditions

The second step will argue that non-strongly incomplete preferences encode a utility maximizer. But it’s useful to see how that result will be used before spelling it out, so we’ll do the third step first. To that end, assume that non-strongly incomplete preferences encode a utility maximizer.

Then we have:

If the preferences are strongly incomplete, then there exists some contract/precommitment which “strictly improves” expected outcome states (under the original preferences)
If the preferences are not strongly incomplete, then the system is a utility maximizer.

The last step is to invoke stable equilibrium: strongly incomplete preferences are “unstable” in the sense that the system is incentivized to update from them to more complete preferences, via contract or precommitment. The only preferences which are stable under contracts/precommitments are non-strongly-incomplete preferences, which encode utility maximizers.

Now, we haven't established which distribution of preferences the system will end up sampling from when randomly completing its preferences, in more complex cases where preferences are strongly incomplete in many places at once. But so long as it ends up at some non-dominated choice, it must end up with non-strongly-incomplete preferences with probability 1 (otherwise it could modify the contract for a strict improvement in cases where it ends up with non-strongly-incomplete preferences). And, so long as the space of possibilities is compact and arbitrary contracts are allowed, all we have left is a bargaining problem. The only way the system would end up with dominated preference-distribution is if there's some kind of bargaining breakdown.

Point is: non-dominated strategy implies utility maximization.

Second Step: No Strong Incompleteness

Assume the preferences have no strong incompleteness. We’re going to construct a utility function for them. The strategy will be:

Construct “indifference sets” - i.e. sets of states between which the utility function will be indifferent
Show that there is a complete ordering between the “indifference sets”, so we can order them and assign each a utility based on the ordering

Indifference set construction: put each state in its own set. Then, pick two sets such that there is no preference between any states in either set, and merge the two. Iterate to convergence. At this point, the states are partitioned into sets such that:

there are no preferences between any two states in the same set, and
there is at least one preference between at least one pair of states in any two different sets.

Those will turn out to be our indifference sets.

In order to order the indifference sets, we need to show that:

for any pair of states in two different sets, there is a preference between them
the ordering is consistent - i.e. if one state in set S is preferred to one state in set T, then any state in S is preferred to any state in T.

(Also we need acyclicity, but that follows trivially from acyclicity of the preferences once we have these two properties.)

To show those, first recall that there is at least one preference between at least one pair of states in any two different sets. Visually:

Each dot is a state; the two circles indicate two sets within which there are no preferences. The arrow indicates the one preference which we assume is present between the two sets; all the other cross-set preferences may or may not exist, and could go in either direction, as far as we’ve established so far.

In order to have no strong incompleteness, all these preferences must also be present (though we don’t yet know their direction):

Those preferences must be present because, otherwise, we could establish strong incompleteness like this:

We can also establish the direction of each of the preferences by noting that, by assumption, there is no preference between any two states in the same set:

Left diagram would imply a preference between two states in the upper set, right diagram would imply a preference between two states in the lower set, so neither of these can occur.

So:

Finally, notice that we’ve now established some more preferences between states in the two sets, so we can repeat the argument with another edge to show that even more preferences are present:

… and once we’ve iterated the argument to convergence, we’ll have the key result: if one state in one set is preferred to another state in another set, then any state in the first set is preferred to any state in the second.

And now we can assign a utility function: order the sets, enumerate them in order, then the number of each set is the utility assigned to each state in that set. A state with a higher utility is always preferred over a state with a lower utility, and there is indifference/no preference between two states with the same utility.

(Note that this is a little different from standard definitions - for mathematical convenience, people typically define utility maximizers to take trades in both directions when indifferent, whereas here our utility maximizer might take trades in both directions between an indifferent pair, or take trades in neither direction between an indifferent pair. For practical purposes, the distinction does not matter; just assume that the agent maintains some small “bid/ask spread”, so a nonzero incentive is needed to induce trade, and the two models become equivalent.)

High-Level Potential Problems With This Argument (And Potential Solutions)

Value vs Utility

Suppose that, in the pizza example, instead of offers to trade a new pizza for whatever pizza David and John currently have, there are offers to trade a specific type of pizza for another specific type - e.g. a mushroom <-> anchovy trade, rather than a mushroom <-> (whatever we have) trade.

In that setup, we might sometimes want to trade “down” to a less-preferred option, in hopes of trading it for a more-preferred option in the future. For instance, if there are lots of vegetarians around offering to trade their sausage pizza for mushroom, then David and John would have high instrumental value for mushroom pizza (because we can probably trade it for sausage), even though neither of us terminally values mushroom. Instrumental and terminal value diverge.

Then, the right way to make the argument would be to calculate the (instrumental) value functions of each subagent (in the dynamic programming sense of the phrase), and use that in place of the (terminal) utilities of each subagent. The argument should then mostly carry over, but there will be one major change: the value function is potentially time-dependent. It’s not “mushroom pizza” which has a value assigned to it, but rather “mushroom pizza at time t”. That, in turn, gets into issues of updating and myopia.

Inconsistent Myopia

A myopic agent in this context would be one which just always trades to more (terminally) preferred options, and never to less (terminally) preferred options, without e.g. strategically trading for a less-preferred mushroom pizza in hopes of later trading the mushroom for more-preferred sausage.

As currently written, our setup implicitly assumed that kind of myopia, which means the subagents are implicitly not thinking about the future when making their choices. … which makes it really weird that the subagents would make contracts/precommitments/self-modications entirely for the sake of future performance. They’re implicitly inconsistently myopic: myopic during trading, but nonmyopic beforehand when choosing to contract/precommit/self-modify.

That said, that kind of inconsistent myopia does make sense for plenty of realistic situations. For instance, maybe the preferences will be myopic during trading, but a designer optimizes those preferences beforehand. Or instead of a designer, maybe evolution/SGD optimizes the preferences.

Alternatively, if the argument is modified to use instrumental values rather than terminal utilities (as the previous section suggested), then the inconsistent myopia issue would be resolved; subagents would simply be non-myopic.

Updating

Once we use instrumental values rather than terminal utilities on states, it’s possible that those values will change over time. They could change purely due to time - for instance, if David and John are hoping to trade a mushroom pizza for sausage, then as the time left to trade winds down, we’ll become increasingly desperate to get rid of that mushroom pizza; its instrumental value falls.

Instrumental value could also change due to information. For instance, if David and John learn that there aren’t as many vegetarians as we expected looking to trade away sausage for mushroom, then that also updates our instrumental value for mushroom pizza.

In order for the argument to work in such situations, the contract/precommitment/self-modification will probably also need to allow for updating over time - e.g. commit to a policy rather than a fixed set of preferences.

Different Beliefs

The argument implicitly assumes that David and John have the same beliefs about what distribution of trade offers we’ll see. If we have different beliefs, then there might not be completion-probabilities which we both find attractive.

On the other hand, if our beliefs differ, then that opens up a whole different set of possible contract-types - e.g. bets and insurance. So there may be some way to use bets/insurance to make the argument work again.

Implications For…

AI Alignment: So much for that idea…

Either we can’t leverage incomplete preferences for safety properties (e.g. shutdownability), or we need to somehow circumvent the above argument.

Economists: If there’s no representative agent, then why ain’t you rich yet?

In economic jargon, completion of the preferences means there exists a representative agent - i.e. the system’s preferences can be summarized by a single utility maximizer. These days most economists assert that there is no representative agent in most real-world markets, so: if there’s no representative agent, then why aren’t you rich yet? And if there is, then what’s its utility function?

Insofar as we view the original incomplete preferences in this model as stemming from multiple subagents with veto power (as in the pizza example), there’s an expected positive sum gain from the contract which completes the preferences. Which means that some third party could, in principle, get paid a share of those gains in exchange for arranging the contract. In practice, most of the work would probably be in designing the contracts in such a way that the benefits are obvious to laypeople, and marketing them. Classic financial engineering business.

So this is the very best sort of economic theorem, where either a useful model holds in the real world, or there’s money to be made.

Conclusions

Main claim, stated two ways:

A group of utility-maximizing subagents have an incentive to form contracts under which they converge to a single utility maximizer
A system with incomplete preferences has an incentive to precommit/self-modify in such a way that the preferences are completed

In general, they do this using randomization over preference-completions. The only expected change each contract/precommitment/self-modification induces is a shift of probability mass from some states to same-or-more-preferred states for all of the subagents; thus each contract is positive-sum.

There is lots more work to be done here, as outlined in the potential problems section. The argument should probably be reframed in terms of value functions (over time) rather than static utility functions in order to more clearly handle instrumentally, though not terminally, valuable actions. The commitments that the subagents make may be better cast as policies rather than fixed preferences. Also, the subagents may have different beliefs about the future, which the argument in this post did not handle.

If the argument holds then this is bad news for alignment hopes that leverage robust incomplete preferences / non E[Utility] maximizers, and also raises some questions about the empirical consensus that modern real-world markets are not expected utility maximizers.

Utility FunctionsEconomicsProbabilistic ReasoningAIWorld ModelingRationality