Cooperative Oracles: Nonexploited Bargaining — AI Alignment Forum

x

Cooperative Oracles: Nonexploited Bargaining — AI Alignment Forum

In this post, we formalize and generalize the phenomenon described in the Eliezer Yudkowsky post Cooperating with agents with different ideas of fairness, while resisting exploitation. (Part of the series started here.)

Consider a game between $n$ players, $P_{1}, \dots, P_{n}$ . We denote $A_{i}$ as the finite set of actions available to player $P_{i}$ , and $Δ_{i}$ as the set of distributions on $A_{i}$ . Let $R$ denote $Δ_{1} \times \dots \times Δ_{n}$ , the set of possible outcomes of this game, and given an $r \in R$ , let $r_{i} \in Δ_{i}$ be the $i$ th component.

Each player $P_{i}$ will be equipped with a "fair set" $F_{i} \subseteq R$ of outcomes in which they feel that they are not being exploited, as well as a utility function, a continuous function $U_{i} : R \to [0, 1]$ . Note that the use of the word fair here is a little weird. We will think of agents as identifying an outcome as fair if they do not feel that they have been exploited. They may find it perfectly fair to exploit others.

$F_{i}$ cannot be an arbitrary set. There must be a set valued function $f_{i} : R \to P (Δ_{i})$ with closed graph and $f_{i} (r)$ nonempty and convex for all $f_{i} : R \to P (Δ_{i})$ , such that $F_{i}$ is exactly the set of $r$ such that $r_{i} \in f_{i} (r)$ . i.e. $F_{i}$ should be the set of points for which a given Kakutani map fixes the $Δ_{i}$ coordinate.

This condition may seem arbitrary, but it is both exactly what we need for our bargaining procedure to work, and exactly what we would expect to get out of agents that have access to everyone's actions with something like a reflective oracle. This condition in part implies that for every collection of distributions on actions that all the other players take, there must be some distribution on actions that you can take in which you consider yourself nonexploited. The restriction is slightly stronger that that, requiring a type of continuity in the way that your nonexploited responses are a function of the other players' actions.

Our Bargaining procedure is now simple. We take the set $F$ of fair points to be the intersection over all $i$ of $F_{i}$ , and we choose an $r \in F$ which is Pareto optimal in $F$ (i.e. there is no other $r^{'} \in F$ with $U_{i} (r^{'}) \geq U_{i} (r)$ for all $i$ and $U_{i} (r^{'}) > U_{i} (r)$ for some $i$ ).

There may be more than one Pareto Optimum. If so, we choose one arbitrarily.

This procedure is not obviously well defined. In particular, it is not obvious that $F$ is nonempty, or that given that it is nonempty that it contains Pareto optimum.

Claim: $F$ is nonempty and contains a Pareto optimum (over $F$ ).

Proof: First, we show that $F$ is compact and nonempty. Since each $f_{i}$ is Kakutani, the product of them, $r \mapsto f_{1} (r), \dots, f_{n} (r)$ is also Kakutani, and thus must have a nonempty compact set of fixed points. This set is exactly the intersection of the $N_{i}$ .

Now, since $F$ is compact, $U_{1}$ must achieve a maximum value on some nonempty compact set. Let $S_{1}$ be the subset of $F$ on which $U_{1}$ is maximized. Define $S_{i}$ similarly as the subset of $U_{1}$ on which $U_{i}$ is maximized. $U_{1}$ is nonempty, and every point in $S_{i}$ must be Pareto optimal in $F$ . $□$

Now, lets go through an couple examples.

Example 1: Two players $P_{1}$ and $P_{2}$ are in a prisoner's dilemma $Δ_{1} = Δ_{2} = [0, 1]$ meaning the probability that the player cooperates. The utility functions are $U_{1} (p, q) = 2 q - p$ and $U_{2} (p, q) = 2 p - q$ .

Both players consider themselves to be exploited if the other player gets more utility than they do. Thus we have $F_{1} = {(p, q) \in R | p \leq q}$ and $F_{2} = {(p, q) \in R | p \geq q}$ . We need to verify that these fair sets satisfy the condition of being the fixed points of a Kakutani map. Indeed, consider $f_{1} (p, q) = min (p, q)$ . $f_{1}$ is continuous, and thus Kakutani. and the set of $(p, q)$ for which $f_{1} (p, q) = p$ is exactly $F_{2}$ . Similarly we can take $f_{2} (p, q) = min (p, q)$ .

Since $F_{1}$ and $F_{2}$ were both valid fair sets, they must have nonempty intersection. In this case, their intersection is the set of $(p, q)$ with $p = q$ , in which both player's cooperate with the same probability. This set has a unique Pareto optimum, $(1, 1)$ in which both players cooperate with certainty.

Example 2: Two players are again in a prisoner's dilemma with the same utilities as before. However this time each player thinks that they should cooperate only 90% of the time while the other player cooperates all the time. Thus each player thinks they should get 1.1 utility, while the other only gets 0.8. They refuse to let the other player take advantage of them and get more than 0.8 utility. They also want to make sure that the unique best utility for the other player is to agree to the 1.1, 0.8 split. To do this, they say that for every unit of utility that they get less than 1.1, the opponent should get 0.1 less utility than 0.8.

In utility space, a pair of utilities is acceptable to $P_{1}$ are if $0.8 - U_{2} \geq .1 (1.1 - U_{1})$ . Translating this to action space, we say that $(p, q) \in F_{1}$ if $p \leq \frac{40 q + 23}{70}$ , and similarly $(q, p) \in F_{2}$ if $q \leq \frac{40 p + 23}{70}$ . Again, you can see these are valid sets by considering $f_{1} (p, q) = min (p, \frac{40 q + 23}{70})$ and $f_{2} (p, q) = min (q, \frac{40 p + 23}{70})$ . (Note that if we changed the numbers, these function might not have been valid, since they might not have mapped the unit interval into itself.)

The intersection of $F_{1}$ and $F_{2}$ is a kite shape, with a unique Pareto Optimum at $p = q = 23 / 30$

Example 3: Two players are again in a prisoner's dilemma with the same utilities as before. Player 1 has the same fair set as in Example 1 ( $F_{1} = {(p, q) \in R | p \leq q}$ ). Player 2 however is cooperate bot and thinks that a solution is fair if he cooperates ( $F_{2} = {(p, q) \in R | q = 1}$ , $f_{2} (p, q) = 1$ ).

Now, $F_{2}$ is a subset of $F_{1}$ , so the set of mutually fair points $f_{2} (p, q) = 1$ is just the set where $P_{2}$ cooperates. Every point in this set is a Pareto optimum, so the outcome is based on how we choose the Pareto optimum, which is under specified.