I'm going to formalize some ideas related to my previous post about pursuing convergent instrumental goals without good priors and prove theorems about how much power a coalition can guarantee. The upshot is that, while non-majority coalitions can't guarantee controlling a non-negligible fraction of the expected power, majority coalitions can guarantee controlling a large fraction of the expected power.


In a unit-sum game:

  • there is some unknown environmental variable $X \in \mathcal{X}$. (In my previous posts, this would be the color of the sun)
  • each of $n$ players submits an action $A_i \in \mathcal{A}$ (we could consider different action sets for each player but it doesn't matter)
  • player $i$ gets $s_i(X, A_1, \ldots, A_n)$ shares, where $s$ satisfies:
    1. Non-negativity: $s_i(x, a_1, \ldots, a_n) \geq 0$
    2. Unit sum: $\sum_{i=1}^{n} s_i(x, a_1, \ldots, a_n) = 1$

A unit-sum game is symmetric if, for any permutation $p : \{1, \ldots, n\} \to \{1, \ldots, n\}$, we have $s_i(x, a_{p(1)}, \ldots, a_{p(n)}) = s_{p(i)}(x, a_1, \ldots, a_n)$.

A coalition in a unit-sum game is a set of players. If $c$ is a coalition, then a policy for that coalition $\pi \in \Delta(\mathcal{A}^{c})$ is a distribution assigning an action to each player in that coalition. We will assume that there are $m$ coalitions $c_1, \ldots, c_m$ such that each player appears in exactly one coalition.

We will consider the expected amount of shares a coalition will get, based on the coalitions' policies. Specifically, define $r_j(x, \pi_1, \ldots, \pi_m) = \mathbb{E}\left[\sum_{i \in c_j} s_i(x, A_1, \ldots, A_n)\right]$, where each $\pi_j$ specifies the actions for the players in the coalition $c_j$. In general, my goal in proving theorems will be to guarantee a high $r_j$ value for a coalition regardless of $x$.
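As a sanity check on these definitions, here is a minimal Python sketch that brute-forces the three conditions (non-negativity, unit sum, symmetry) for a small game. The function names and the example "plurality" game are illustrative assumptions, not part of the formalism above; the check is exponential in the number of players and only meant for toy cases.

```python
from fractions import Fraction
from itertools import permutations, product


def plurality_shares(x, actions):
    """Example game (an assumption for illustration): split one unit evenly
    among the players whose action was chosen most often, with ties going
    to the lowest action. The environment value `x` is ignored."""
    counts = {}
    for a in actions:
        counts[a] = counts.get(a, 0) + 1
    best = max(counts.values())
    winner = min(a for a, c in counts.items() if c == best)
    return [Fraction(1, counts[winner]) if a == winner else Fraction(0)
            for a in actions]


def is_symmetric_unit_sum(shares, xs, action_set, n):
    """Brute-force check of non-negativity, unit sum, and symmetry."""
    for x in xs:
        for a in product(action_set, repeat=n):
            s = shares(x, a)
            if any(v < 0 for v in s) or sum(s) != 1:
                return False
            for p in permutations(range(n)):
                # Player i now takes the action that player p[i] took in `a`,
                # so player i should receive what player p[i] received.
                b = tuple(a[p[i]] for i in range(n))
                sb = shares(x, b)
                if any(sb[i] != s[p[i]] for i in range(n)):
                    return False
    return True


def dictator_shares(x, actions):
    """Unit-sum but not symmetric: player 0 always gets everything."""
    return [Fraction(1)] + [Fraction(0)] * (len(actions) - 1)
```

For example, `is_symmetric_unit_sum(plurality_shares, [None], [1, 2], 3)` should hold, while the same call with `dictator_shares` should fail the symmetry check.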

A coalition containing a majority (>50%) of the players can, in some cases, gain an arbitrarily high fraction of the shares:

Theorem 1: For any $\epsilon > 0$ and $n \geq 1$, there exists a symmetric unit-sum game with $n$ players in which any coalition controlling a majority of the players can get at least $1 - \epsilon$ expected shares.

Proof: Fix $\epsilon > 0$, $n \geq 1$. Let $k$ be a positive integer such that $1/k \leq \epsilon$. Define the set of actions $\mathcal{A} = \{1, \ldots, k\}$. Define $s$ to split the shares evenly among the players who give the action chosen by the most players (with ties being resolved towards lower actions). The variable $X$ is unused. Clearly, this unit-sum game is symmetric.

Let $c$ be a majority coalition. Consider the following policy for the coalition: select an action uniformly at random from $\mathcal{A}$ and have everyone take that action. Since more than $n/2$ players take this action, it will always be the majority action.

The coalition's action is uniform over the $k$ actions and independent of what the other players do, so any player outside the coalition has a $1/k$ chance of choosing the majority action. Upon choosing the majority action, a player outside the coalition gets at most $2/n$ shares, since the shares are split among more than $n/2$ players. Since there are at most $n/2$ players outside the majority coalition, in expectation they get at most $\frac{n}{2} \cdot \frac{1}{k} \cdot \frac{2}{n} = \frac{1}{k}$ shares in total. So the majority coalition itself gets at least $1 - \frac{1}{k} \geq 1 - \epsilon$ shares in expectation.
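The argument can be checked exactly on a small instance. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the game is the "split among whoever matched the most popular action" game from the proof, and the coalition plays one action drawn uniformly from a set of `k` actions. It minimizes over deterministic outsider action profiles, which suffices because the coalition's expected shares are linear in the outsiders' (possibly randomized) joint policy.

```python
from fractions import Fraction
from itertools import product


def plurality_shares(actions):
    """Split one unit evenly among players who chose the most popular
    action, with ties resolved towards the lowest action."""
    counts = {}
    for a in actions:
        counts[a] = counts.get(a, 0) + 1
    best = max(counts.values())
    winner = min(a for a, c in counts.items() if c == best)
    return [Fraction(1, counts[winner]) if a == winner else Fraction(0)
            for a in actions]


def coalition_expected_shares(k, coalition_size, outsider_actions):
    """Exact expected shares of a coalition whose members all play a single
    action drawn uniformly from 1..k, against fixed outsider actions."""
    total = Fraction(0)
    for a in range(1, k + 1):
        actions = [a] * coalition_size + list(outsider_actions)
        shares = plurality_shares(actions)
        total += Fraction(1, k) * sum(shares[:coalition_size])
    return total


def worst_case(k, coalition_size, num_outsiders):
    """Minimum over all deterministic outsider action profiles."""
    return min(
        coalition_expected_shares(k, coalition_size, outsiders)
        for outsiders in product(range(1, k + 1), repeat=num_outsiders)
    )
```

With 3 coalition members, 2 outsiders, and `k = 4`, `worst_case(4, 3, 2)` comes out to `Fraction(7, 8)`, which is above the bound of 3/4 that the proof guarantees for 4 actions.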

As a result of Theorem 1, we won't be able to design good general strategies for non-majority coalitions. Instead we will focus on good general strategies for majority coalitions.

Theorem 2: In a symmetric unit-sum game, if a coalition $c_j$ has at least a $\frac{k-1}{k}$ fraction of the players (for integer $k > 1$), then given the policies for the other coalitions $\pi_{-j}$, coalition $c_j$ has a policy $\pi_j$ resulting in getting at least $\frac{k-1}{k}$ expected shares regardless of $X$, i.e. $\forall x \in \mathcal{X} : r_j(x, \pi_1, \ldots, \pi_m) \geq \frac{k-1}{k}$.

Proof: Without loss of generality, assume there are only 2 coalitions, $j = 1$, and the other coalition has index 2. Since $c_2$ contains at most a $\frac{1}{k}$ fraction of the players, we have $|c_1| \geq (k-1)|c_2|$. To define the majority's policy $\pi_1$, divide the coalition $c_1$ into $k-1$ sub-coalitions of $|c_2|$ players each, plus leftover players (who take some arbitrary action). Each sub-coalition will independently select actions for its members according to the distribution $\pi_2$. Note that each sub-coalition is "equivalent" to $c_2$, so by symmetry of the unit-sum game, each sub-coalition and $c_2$ get the same expected number of shares (regardless of $X$). These $k$ groups together get at most 1 share in total, so the coalition $c_2$ gets at most a $\frac{1}{k}$ expected fraction of the shares. Conversely, $c_1$ gets at least a $\frac{k-1}{k}$ expected fraction of the shares.
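To make the copying construction concrete, here is a small exact computation in Python. Everything specific in it is an assumption chosen for illustration: the "guess the environment" game (shares are split evenly among players whose action equals the environment value, or among everyone if nobody is right), the function names, and the example minority policies. The majority consists of `k - 1` sub-coalitions that each draw independently from the minority's policy, plus optional leftover players who take a fixed action.

```python
from fractions import Fraction
from itertools import product


def guess_shares(x, actions):
    """Symmetric unit-sum example game: players whose action equals x split
    one unit evenly; if nobody is right, all players split it evenly."""
    correct = [a == x for a in actions]
    winners = sum(correct)
    if winners == 0:
        return [Fraction(1, len(actions))] * len(actions)
    return [Fraction(1, winners) if c else Fraction(0) for c in correct]


def majority_shares_when_copying(x, minority_policy, k,
                                 leftovers=0, leftover_action=0):
    """Exact expected shares of the majority when each of its k - 1
    sub-coalitions samples independently from `minority_policy`, a dict
    mapping minority action tuples to probabilities (Fractions)."""
    m = len(next(iter(minority_policy)))  # size of the minority coalition
    majority_size = (k - 1) * m + leftovers
    total = Fraction(0)
    # Enumerate k independent draws: k - 1 sub-coalitions, then the minority.
    for draws in product(minority_policy.items(), repeat=k):
        prob = Fraction(1)
        for _, p in draws:
            prob *= p
        sub_actions = [a for acts, _ in draws[:-1] for a in acts]
        minority_actions = list(draws[-1][0])
        actions = (sub_actions + [leftover_action] * leftovers
                   + minority_actions)
        shares = guess_shares(x, actions)
        total += prob * sum(shares[:majority_size])
    return total
```

For instance, with a one-player minority that guesses 1 with probability 2/3, `k = 3`, and one leftover player, the majority's expected shares should be at least 2/3 for both `x = 0` and `x = 1`. With a two-player minority that always coordinates on the same guess and `k = 2`, the majority gets exactly 1/2, matching the symmetry argument in the proof.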


One issue with the formalism is that it seems easier for a small coalition to spy on a large one than for a large coalition to spy on a small one, which makes it implausible that a large coalition can have a shared source of randomness not available to small coalitions.

However, note that the policy defined in Theorem 2 does not rely on the majority coalition having more coordination than the opposing coalition. This is because the policy factors into independent sub-coalitions whose sizes are $|c_2|$, so shared sources of randomness are only needed within sub-coalitions of size $|c_2|$ (and this shared randomness is equivalent to the shared randomness within $c_2$ itself).


Theorem 2 is good news if we expect a large majority of powerful AI systems to be aligned with human values. It means that (under some assumptions) these AI systems can achieve a large expected fraction of the universe without having good priors about the random variable $X$.

To do this, it is necessary to know something about what the other coalitions' strategies are, such that these strategies can be copied. A major problem with this is that, in the real world, the action one should take to gain resources depends on relative facts (e.g. one's location), whereas the actions are not context-dependent in this way. Therefore, the actions should be interpreted as "ways of turning one's context into a resource-gathering strategy". It is not obvious how to interpret another agent's policy as a "way of turning their context into a resource-gathering strategy" such that it can be copied, and this seems like a useful topic for further thought.

3 comments

Interesting. But Theorem 2 may say less than it seems. If you subtract $1/n$ from every player's shares, you get a zero-sum game, and then Theorem 2 seems to reduce to saying that a majority coalition can always expect to not lose in a symmetric zero-sum game.

I agree that Theorem 2 only says that the majority coalition expects to get a fraction of the universe proportional to its size, and does not say they get more. This fact is unsurprising.

Actually, I'm wrong, it is possible for a majority coalition to take a loss in a zero-sum game: http://lesswrong.com/r/discussion/lw/oj4/a_majority_coalition_can_lose_a_symmetric_zerosum/

A consequence of that is that your Theorem 2 is sharp. You can't guarantee more than what you stated. In particular, there exist games with coalitions arbitrarily close to that can't get more than of the value.