In this post, we formalize and generalize the phenomenon described in the Eliezer Yudkowsky post Cooperating with agents with different ideas of fairness, while resisting exploitation. (Part of the series started here.)

Consider a game between players, . We denote as the finite set of actions available to player , and as the set of distributions on . Let denote , the set of possible outcomes of this game, and given an , let be the th component.

Each player will be equipped with a "fair set" of outcomes in which they feel that they are not being exploited, as well as a utility function, a continuous function . Note that the use of the word fair here is a little weird. We will think of agents as identifying an outcome as fair if they do not feel that they have been exploited. They may find it perfectly fair to exploit others.

cannot be an arbitrary set. There must be a set valued function with closed graph and nonempty and convex for all , such that is exactly the set of such that . i.e. should be the set of points for which a given Kakutani map fixes the coordinate.

This condition may seem arbitrary, but it is both exactly what we need for our bargaining procedure to work, and exactly what we would expect to get out of agents that have access to everyone's actions with something like a reflective oracle. This condition in part implies that for every collection of distributions on actions that all the other players take, there must be some distribution on actions that you can take in which you consider yourself nonexploited. The restriction is slightly stronger that that, requiring a type of continuity in the way that your nonexploited responses are a function of the other players' actions.

Our Bargaining procedure is now simple. We take the set of fair points to be the intersection over all of , and we choose an which is Pareto optimal in (i.e. there is no other with for all and for some ).

There may be more than one Pareto Optimum. If so, we choose one arbitrarily.

This procedure is not obviously well defined. In particular, it is not obvious that is nonempty, or that given that it is nonempty that it contains Pareto optimum.

Claim: is nonempty and contains a Pareto optimum (over ).

Proof: First, we show that is compact and nonempty. Since each is Kakutani, the product of them, is also Kakutani, and thus must have a nonempty compact set of fixed points. This set is exactly the intersection of the .

Now, since is compact, must achieve a maximum value on some nonempty compact set. Let be the subset of on which is maximized. Define similarly as the subset of on which is maximized. is nonempty, and every point in must be Pareto optimal in .

Now, lets go through an couple examples.

Example 1: Two players and are in a prisoner's dilemma meaning the probability that the player cooperates. The utility functions are and .

Both players consider themselves to be exploited if the other player gets more utility than they do. Thus we have and . We need to verify that these fair sets satisfy the condition of being the fixed points of a Kakutani map. Indeed, consider . is continuous, and thus Kakutani. and the set of for which is exactly . Similarly we can take .

Since and were both valid fair sets, they must have nonempty intersection. In this case, their intersection is the set of with , in which both player's cooperate with the same probability. This set has a unique Pareto optimum, in which both players cooperate with certainty.

Example 2: Two players are again in a prisoner's dilemma with the same utilities as before. However this time each player thinks that they should cooperate only 90% of the time while the other player cooperates all the time. Thus each player thinks they should get 1.1 utility, while the other only gets 0.8. They refuse to let the other player take advantage of them and get more than 0.8 utility. They also want to make sure that the unique best utility for the other player is to agree to the 1.1, 0.8 split. To do this, they say that for every unit of utility that they get less than 1.1, the opponent should get 0.1 less utility than 0.8.

In utility space, a pair of utilities is acceptable to are if . Translating this to action space, we say that if , and similarly if . Again, you can see these are valid sets by considering and . (Note that if we changed the numbers, these function might not have been valid, since they might not have mapped the unit interval into itself.)

The intersection of and is a kite shape, with a unique Pareto Optimum at

Example 3: Two players are again in a prisoner's dilemma with the same utilities as before. Player 1 has the same fair set as in Example 1 (). Player 2 however is cooperate bot and thinks that a solution is fair if he cooperates (, ).

Now, is a subset of , so the set of mutually fair points is just the set where cooperates. Every point in this set is a Pareto optimum, so the outcome is based on how we choose the Pareto optimum, which is under specified.

Oracle AI
Personal Blog
New Comment
6 comments, sorted by Click to highlight new comments since:

Does this generalise to giving gradations of fairness, rather than boolean fair/unfair set memberships?

I don't see how it would. The closest thing I can think of is letting agents choose randomly between different fair sets, but I don't see what that would buy for you.

Every point in this set is a Pareto optimum, so the outcome is based on how we choose the Pareto optimum, which is under specified.

Recurse, using the agent's concepts of fairness, using the new Pareto set as the set of possibilities? And continue until the set stabilises? And then the agents will be truly indifferent between the outcomes.

The problem is that our original set was a product of the actions available to the players, so they were able to cut things off using their own actions. When you restrict to the Pareto frontier, this is no longer the case.

This seems a proper version of what I was trying to do here: https://agentfoundations.org/item?id=513

Yeah. The original generator of these ideas was that we were trying to find (or prove impossible) an improvement on NicerBot: An agent with reflective oracles that cooperates with itself (regardless of what reflective oracle is chosen), but is never exploited in expectation (even by epsilon.)