Threat-Resistant Bargaining Megapost: Introducing the ROSE Value
Post Status: Original research, mathy, high-context. Make sure you've read the last three posts in the sequence first, this is going to be a very rough one to start on.
As a recap of the last three posts, bargaining has an elegant notion of what a neutral point is: the Nash bargaining solution. And, games where the two players can pay each other money (transferrable-utility games) have an elegant notion of what a neutral point is: the CoCo value. And, as it turns out, games in general have an elegant notion of what a neutral point is that subsumes the previous two concepts as special cases, which can be motivated by either trying to generalize the Nash bargaining solution to games without a clear disagreement point (as John Harsanyi did), or by trying to generalize the CoCo value to games without transferrable utility (as I did).
Sadly, there's an issue with all of these where a player can get a lot more utility by inventing increasingly effective ways of destroying the utility of everyone else. Basically, someone can go "I have a torturizer, favor me or I'll use it on you", and the threat will work. In fact, such reasoning is the default for the CoCo value. That issue is detailed in the second and third posts in the sequence.
And so, there's a question of "is there a distinguished point in transferrable-utility games which is generated by a more threat-resistant line of reasoning?" As per the third post in the sequence, using a correlated equilibrium as the fallback point can serve that role, but they're not unique.
There is! (kinda)
I have a lot of uncertainty whether the equation I wrote down in this post is The Threat-Resistant Bargaining Solution, but if it isn't, it's at least pretty close to The Solution. My thinking about it is still in flux; here's a rundown of the current situation, which also parallels the structure of the rest of the post.
Following the example set by CoCo values in the first two posts, there's a highly general pathway to amplify a solution concept for 2-player transferrable-utility games into a solution concept for n-player transferrable-utility games, and from there into a notion of equilibrium for games in general. Thus, in the quest for a more threat-resistant solution concept, we can restrict our attention to the special case of 2-player transferrable-utility games. These are vastly simpler because the players can settle matters by paying each other, and their utility functions can be calibrated against a common scale, namely, how much they'd pay for an outcome.
To aid in the search for a "good" solution for transferrable-utility games, the first step is listing out a whole bunch of desired properties to check. Also, in that section, the "n-player transferrable-utility case → full generality" step is discussed. There's a theorem that pretty much says "X equilibria exist, where X=your preferred solution concept", but it was spectacularly hard to prove.
In the toy case of 2-player games where Bob can act freely and Alice only has the ability to pay Bob, I found The Solution, as bolstered by a slick visual proof that if you take a few obvious desiderata, one of which is immunity to threats, it's the only possible solution.
There's an obvious way to generalize from the previous toy case to the more general setting of 2-player transferrable-utility games, but I don't have a comparable proof that it's the only possible threat-resistant solution and I would be extremely interested in someone proving that. It still fulfills all the nice properties I was looking for. And if you squint hard enough at what the process is doing, it looks sorta like it's generated from Yudkowsky's "block the opponent from getting more than what you think is fair" procedure to negotiate with agents that have different ideas of fairness. Also, it introduces the key concept of an "initiative game".
By using Shapley values, there's an obvious way to generalize to n-player transferrable-utility games in general. The result is threat-resistant and fulfills a bunch of nice properties, but has the baffling and fatal drawback of not being guaranteed to give all players ≥ their maximin value, as demonstrated by a tidy little toy problem (this only arises for n≥3).
And, right before this post was slated to go out, I came up with an alternate n-player generalization of the 2-player case that's threat-resistant and gets pretty much all the nice properties at once, including giving players ≥ their maximin value!
Looking at what the solution does in the special case of bargaining problems, it induces a novel scale-and-shift-invariant bargaining solution that (kinda) doesn't depend on where the disagreement point is! And there's an easy heuristic for applying it in real life.
And finally there are some thought experiments that suggest that this isn't threat-proof enough, and also I don't have an argument that these sorts of solutions flow out of more primitive desiderata about agent behavior. So if someone wants to take this further, there's definitely remaining room to do so.
It's a little tricky to present, however, because whoo boy this is going to be a long post. Grab a snack and a whiteboard, you'll need it.
General Preliminaries:
Amplifying Solutions
One of the insights gained from looking at CoCo values, and how they generated Harsanyi equilibria (see posts 1 and 2 in the sequence), is that there's a general pathway to take an equilibrium notion for 2-player games to an equilibrium notion for n-player games with transferrable utility, and from there, to generate an equilibrium notion for n-player games in full generality.
Two to N with Shapley Values
A game consists of a set of players $N$, the actions of the players $\{A_i\}_{i\in N}$, and utility functions for the players, $\{U_i\}_{i\in N}$, of type $\prod_{i\in N}A_i\to\mathbb{R}$. Well, to be fully technical, the action spaces must be compact metric spaces and the utility functions must be continuous, but usually you can ignore stuff like that.
Transferrable-utility games are those where we're interpreting all the numbers as money. The payoffs are how much the various players value different outcomes, as denominated in money. And also the players are allowed to pay each other, if they want. These side payments don't count as an action in the game.
Now, let's say you've got a preferred way of taking a two-player transferrable-utility game and figuring out how much value the two players "should" walk away with (which might take a side payment to actually implement). Given any two-player game as input, you conclude that the special point they "should" be aiming for is the perfect, the elegant, [INSERT-NAME-HERE] value!
Sooo cool. Aliens would independently reinvent it, you're sure.
Let's denote "value assigned to Alice in game G" as vG(alice). In general, you've got some function v which maps 2-player games to their value for a player. What's the big deal with that?
Well, hypothetically, if two big coalitions $S$ and $\bar{S}$ were fighting each other in a grand n-player game, you could view it as just a two-player game! The coalitions are the players, their move spaces are the spaces $\prod_{i\in S}A_i$ and $\prod_{j\notin S}A_j$ (everything the teams can do if they coordinate), and their utility functions are the team utilities $\sum_{i\in S}U_i$ and $\sum_{j\notin S}U_j$. Then you could use your function $v$ to work out the value of a coalition, $v_G(S)$.
The reason this is important is that the natural notion of how to split a pile of value in an n-player game is the Shapley value. There isn't a comparable competitor. And in order to define Shapley values in the first place, you need some notion of "the value of a coalition, if they work together". So, your two-player solution notion can be amplified into an n-player solution notion by using it to figure out what payoff coalitions should get, and using that to compute Shapley values. In an n-player game, player i will get a value of
$$\phi_G(i)=\sum_{i\in S\subseteq N}\frac{(n-s)!\,(s-1)!}{n!}\left(v_G(S)-v_G(\bar{S})\right)$$
This isn't the standard presentation of Shapley values, but the standard presentation can be reshuffled into this equation. $\phi_G(i)$ will henceforth be used as an abbreviation for the Shapley value of player $i$ in game $G$. $\bar{S}$ is just $N\setminus S$, the coalition of everyone else. $n$ and $s$ are the cardinalities of the sets $N$ and $S$ respectively.
A way of intuitively stating what the Shapley value does, is that a coalition is formed in random order until it includes everyone, and everyone demands their marginal contribution to the value of the forming coalition. Average over all possible random orders, and that gives the Shapley values for everyone.
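To make the recipe concrete, here's a minimal sketch of the "coalition value function in, Shapley payoffs out" step, written directly from the formula above (the function name and the toy value function are mine, purely for illustration):

```python
from itertools import combinations
from math import factorial

def shapley_values(players, v):
    """Shapley payoffs from a coalition value function v, per the formula
    above: phi_G(i) = sum over coalitions S containing i of
    (n-s)!(s-1)!/n! * (v(S) - v(complement of S)).

    v maps a frozenset of players to a number, e.g. the value your
    2-player solution assigns to the game "coalition S vs. everyone
    else". v must also be defined on the empty coalition (usually 0).
    """
    N = frozenset(players)
    n = len(N)
    phi = {}
    for i in players:
        total = 0.0
        rest = [p for p in players if p != i]
        for r in range(n):  # pick the members of S other than i
            for extra in combinations(rest, r):
                S = frozenset(extra) | {i}
                s = len(S)
                weight = factorial(n - s) * factorial(s - 1) / factorial(n)
                total += weight * (v(S) - v(N - S))
        phi[i] = total
    return phi

# Tiny symmetric example with a made-up coalition value of |S|^2.
# Each of the 3 players gets 3.0, and the payoffs sum to v(N) = 9.
print(shapley_values(["alice", "bob", "carol"], lambda S: len(S) ** 2))
```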
In particular, the CoCo value follows this procedure. It's most naturally defined for two-player games, and the generalization of it to n-player games follows this pattern of "use the two-player case to define the value of a coalition, then give everyone their Shapley values".
N to Full Generality With Importance Weights
This is for games with transferrable utility. What about games in general where money might not be available? Well, points on the Pareto frontier giving everyone's utilities, $(x_1,x_2,x_3,\ldots)$, are associated with a tuple of importance weights $(a_1,a_2,a_3,\ldots)$, where $(x_1,x_2,x_3,\ldots)$ is the tuple of payoffs from the utility functions that maximizes $\sum_i a_iU_i$. Ie, every Pareto-optimal point can be post-hoc rationalized as the result that maximizes a weighted mix of utilities, for some way of weighting everyone (the importance weights). Given a spot $(x_1,x_2,x_3,\ldots)$, is there an easy way of reading off the "importance weights" that make that spot the best one? Yup! The "importance weights" are just the normal vector to the Pareto frontier at the given spot.
An interesting aspect is that the weights $a_i$ can also be interpreted as currency conversion factors, like curnits/utilon for player $i$ (curnit = currency unit). As an example, if Alice has $a_{\text{alice}}=5$ and Bob has $a_{\text{bob}}=1$, then this is saying that an Alice-utilon is worth 5 Bob-utilons. Or that spending 5 currency units for an Alice utilon is as good a deal as spending the 5 currency units on 5 Bob utilons, for someone trying to spend money to maximize the weighted sum of utilities $5U_{\text{alice}}+U_{\text{bob}}$.
So, given a Pareto-frontier point, you can use the importance weights at that point to make up an imaginary currency. Then you redenominate everyone's utilities in that, ask what the Shapley values are for everyone, convert from money back to utilons, and get... well, usually you'll get a point that's past the Pareto-frontier. It's too good for everyone and cannot be attained by any combination of actions. But there will be some points where, when you do this, you get back where you started. They correspond to ways of importance-weighting everyone that make "maximize the weighted sum of utilities" (utilitarianism??) and "give everyone their Shapley payoffs" (fairness??) line up.
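As a sketch of that fixed-point check (every name here is hypothetical: `shapley_fn` stands in for whichever amplified 2-player solution you're using, and the importance weights are assumed strictly positive):

```python
def is_general_equilibrium(players, utilities, weights, x, shapley_fn, tol=1e-9):
    """Check whether the utility tuple x is a general equilibrium.

    utilities[i]: player i's utility function over joint outcomes.
    weights[i]:   the importance weight a_i (the normal vector to the
                  Pareto frontier at x), assumed nonzero here.
    shapley_fn:   maps (players, money-denominated utilities) to Shapley
                  payoffs; this is where your chosen value v lives.

    Rescale everyone into the common imaginary currency (a_i curnits per
    utilon), take Shapley payoffs, convert back to utilons, and see
    whether we land back on x.
    """
    money_utils = {
        i: (lambda outcome, i=i: weights[i] * utilities[i](outcome))
        for i in players
    }
    phi = shapley_fn(players, money_utils)             # payoffs in curnits
    back = {i: phi[i] / weights[i] for i in players}   # curnits -> utilons
    return all(abs(back[i] - x[i]) <= tol for i in players)
```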
These will be called "general equilibria" when we're not talking about a specific notion of how to assess the value of a coalition, and for more specific notions of how to assess the value of a game, they're named correspondingly, like "CoCo equilibria"
Two to Full Generality: Transporting Desiderata
If the function mapping a 2-player game and a player to their value is nicely behaved, it will bestow nice properties on the "everyone gets their Shapley payoffs" function, which bestows nice properties on the corresponding general equilibria. As an example, let's take shift-invariance of the v function. If you start the game with 100 extra dollars and your foe starts with 40 extra dollars, you should get your old value plus 100 dollars, and your foe should get their old value plus 40 dollars. Shift-invariance of this form will imply that the "everyone gets Shapley payoffs" result is also shift-invariant. And this implies that the corresponding general equilibria will be scale-and-shift invariant, and not depend on how everyone reports their utility functions.
So, assuming "everyone gets their Shapley payoffs" is convincing to the reader, that leaves ONE point of intervention to get something which is not the CoCo value. Find a different way to define the value of a coalition. That's the only option available. And things get even more constrained if you accept that the value of a coalition should be definable as the value of the associated 2-player game between them and everyone else.
The Big Desiderata List
The following 10 desiderata (not all of which are attainable, especially because a strict subset of them, if fulfilled, implies that the CoCo value is the only possibility) come in several variants.
As an example, let's take "player i should get better than their maximin payoff". It could apply to the vG(i) games for 2-player games. It could apply to the vG(S) games for coalitions containing i, that if i is on a team, the team won't have player i lose too badly. It could apply to the Shapley values for 2-player games, or for general n-player games. Or it could apply to all the general equilibria.
Several of these will imply each other (like, doing better than maximin for the n-player Shapley values implies doing better than maximin in all general equilibria), but since they don't all imply each other, it's important to make fine distinctions between, for example, action monotonicity for the vG(i) games, and action monotonicity for the resulting Shapley values. They'll be stated informally at first,but there are formal versions of all of them.
Desideratum 1: 2-Player Reduction. The value of a coalition $S$ in a general game is found by treating the two coalitions $S$ and $N\setminus S$ as just two big players, and evaluating the value of the player corresponding to $S$ in the corresponding two-player game.
This desideratum, if fulfilled, means that we only need to define values for 2-player games, considerably simplifying things.
Desideratum 2a-b: Shift-Invariance for v (or ϕ). In a 2-player game, if Alice has a constant amount c added to all their payoffs, then the value of the new game for Alice is the value of the old game plus c. Adding a constant amount to Bob's payoffs has no effect on Alice's payoffs. Similar for n-player games and the Shapley payoffs.
Desideratum 2c: Scale-and-Shift Invariance for General Equilibria. If $x\in\mathbb{R}^n$ is a general equilibrium for the game $G$, then for all ways of scale-and-shifting everyone's utility functions, the scale-and-shift of $x$ is a general equilibrium for the scale-and-shift of the game $G$.
These desiderata, if fulfilled, mean that the solution to a game doesn't depend on irrelevant details of how everyone reports their utilities, or on whether someone has an extra 100 bucks in their drawer at home.
Desideratum 3a.a-c: Weak/Strong/Really Strong Pareto Optimality for v. For all 2-player games, the tuple of payoffs for the two players is on the weak Pareto frontier/strong Pareto frontier/maximizes the sum of utilities. If you haven't seen it before, the weak Pareto frontier is "there's no way to make everyone strictly better off", and the strong Pareto frontier is "there's no way to make some group strictly better off and everyone else the same".
Desideratum 3b: Pareto Optimality for ϕ. For all n-player games, the sum of the Shapley values for the players equals the maximum sum of utilities.
Desideratum 3c.a-b: Weak/Strong Pareto Optimality for General Equilibria. For all n-player games, every general equilibrium is on the weak Pareto frontier/strong Pareto frontier.
Pareto-optimality is an extremely obvious desideratum to have. A nonobvious aspect, though, is that 3a, Pareto optimality for $v$, is almost completely disconnected from the rest of them. It's entirely possible to have the Shapley values attain Pareto optimality without the values for 2-player games being Pareto-optimal, even in a weak sense. The "active ingredient" in making the Shapley values Pareto optimal seems to be that the value of the "everyone" coalition should be the maximum attainable value; other than that, it doesn't really matter what the value function $v$ is doing.
Desideratum 4a-b: Continuity for v (or ϕ). For a 2-player game, we can consider the family of games with the same action spaces. Within this family, we can abbreviate a game as just its tuple of utility functions, of type $A_1\times A_2\to\mathbb{R}^2$ (continuous functions from actions to results). The value function, of type $(A_1\times A_2\to\mathbb{R}^2)\to\mathbb{R}^2$, should be continuous when the function space is equipped with the topology of uniform convergence, for all families of games. For Shapley values, it should be the same, just using the Shapley value function of type $(\prod_{i\in N}A_i\to\mathbb{R}^n)\to\mathbb{R}^n$.
Desideratum 4ca: Existence of General Equilibria. The general equilibrium function, of type $(\prod_{i\in N}A_i\to\mathbb{R}^n)\to\mathcal{P}(\mathbb{R}^n)$, mapping a game to its set of general equilibria, should never return the empty set, regardless of the family of games.
Desideratum 4cb: Closed-Graph for General Equilibria. The general equilibrium function, of type $(\prod_{i\in N}A_i\to\mathbb{R}^n)\to\mathcal{P}(\mathbb{R}^n)$, mapping a game to its set of general equilibria, should have closed graph for all families of games.
Existence is obviously important; the earlier continuity desiderata less so. It may seem obvious that microscopic variations in everyone's utilities shouldn't have dramatic effects on what happens, but these desiderata actually pack a serious punch in constraining solutions. As an example, $U_2(\operatorname{argmax}_a U_1(a))$ (the utility you get when the foe argmaxes) doesn't vary continuously as $U_1$ varies, so all terms like this are forbidden from showing up in equations.
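Here's a throwaway two-action demonstration of that discontinuity: an arbitrarily small perturbation of $U_1$ flips the argmax, so the forbidden term jumps.

```python
def foe_payoff_at_my_argmax(U1, U2):
    """The forbidden term: U2 evaluated at the action that maximizes U1."""
    best_action = max(range(len(U1)), key=lambda a: U1[a])
    return U2[best_action]

U2 = [0.0, 100.0]
for eps in (-0.01, -0.001, 0.001, 0.01):
    U1 = [1.0, 1.0 + eps]
    print(f"eps={eps:+.3f}  U2(argmax U1) = {foe_payoff_at_my_argmax(U1, U2)}")

# U1 varies continuously in eps, but the printed value jumps from 0 to
# 100 as eps crosses zero: the argmax flips between the two actions.
```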
Desideratum 5a-c: Redundancy Invariance for v (or ϕ) (or General Equilibria). Giving players additional actions that are equivalent to just probabilistic mixtures of existing actions should have no effect on the values given by $v$, or the Shapley values, or the set of general equilibria.
This is another one that seems obvious but has unexpected power. I actually had a few variants of my bargaining solution, and this killed one of them. Admittedly, it naively seems like this desideratum fails when you have the computational power to read the foe's move, but it can be preserved even in that case if you set up the problem correctly.
Desideratum 6a-c: Threat-Resistance for v (or ϕ) (or General Equilibria). If all players get a kit of "destroy the utility of this other player" buttons of varying intensity that can be pressed with no cost and paired with any move, then regardless of how much utility the buttons can destroy, the values/Shapley values/set of general equilibria remain unchanged.
And this is a special one. It effectively says that if your foe has a "send you to hell" button that's costless for them to press, then the button would have NO effect on negotiations, lest you incentivize your foe to create such a button because you'd respond. It gets even more powerful when paired with continuity, because then it levels up to say something like "if the foe has a button that sends me to hell and they slightly value pressing that button, I'll give them just a little bit of extra value to have them put the button away". It's not quite as good as it seems, though, which is why it's named Threat-Resistance instead of Threat-Immunity. More about that near the end of this post. Really, if fulfilled, this just confers immunity to a specific class of threats. There are threat-like shenanigans that this desideratum doesn't rule out.
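For concreteness, here's a sketch of the button augmentation for finite 2-player payoff tables (the helper name is mine, and discretizing the button intensities into a `levels` grid is my simplification of the continuous-intensity buttons):

```python
def add_threat_buttons(U_alice, U_bob, levels):
    """Augment a finite 2-player game with Desideratum 6's buttons.

    Each player's new action is (old action index, amount of the foe's
    utility to burn). Pressing is costless for the presser and only
    hurts the target. Threat-Resistance demands that the solution of
    the game returned here equal the solution of the original game,
    no matter how large the burn levels get.
    """
    acts_a = [(a, burn) for a in range(len(U_alice)) for burn in levels]
    acts_b = [(b, burn) for b in range(len(U_alice[0])) for burn in levels]

    def U_alice_aug(act_a, act_b):
        (a, _), (b, burn_alice) = act_a, act_b
        return U_alice[a][b] - burn_alice   # Bob's button hurts Alice

    def U_bob_aug(act_a, act_b):
        (a, burn_bob), (b, _) = act_a, act_b
        return U_bob[a][b] - burn_bob       # Alice's button hurts Bob

    return acts_a, acts_b, U_alice_aug, U_bob_aug
```

Note that the maximin values of the augmented game can be drastically worse than in the original, which is exactly why demanding an unchanged solution has teeth.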
The following three desiderata (7,8,9) are constraints on what payoffs the various players can expect.
Desideratum 7a-c: Payoff Dominance for v (or ϕ) (or General Equilibria). If Alice always gets higher utility than Bob, regardless of what event happens, then the values/Shapley values/general equilibria will also say that Alice gets higher utility than Bob. Same for always getting lower utility.
Desideratum 8a-c: Maximin Dominance for v (or ϕ) (or General Equilibria). All the values/Shapley values/general equilibria will say that each player gets ≥ their maximin payoff.
Desideratum 9a-c: Sub-Max for v (or ϕ) (or General Equilibria). All the values/Shapley values/general equilibria will say that each player gets ≤ their maximum payoff.
Of these three desiderata, Maximin Dominance is the most likely, and Sub-Max is the most contentious. It's reasonable to state that a player who can help others massively, but who can't gain much utility themselves, should get compensated greatly for their trouble. The same considerations also make Payoff Dominance look unlikely in the general n-player case.
Desideratum 10: Action Monotonicity for v (or ϕ) (or General Equilibria). If a player gains access to more actions, their value/Shapley value should improve. For general equilibria, the worst-case equilibrium payoff for the player that gained access to more actions should improve.
This desideratum is the key piece that mandates the CoCo value, and so, it must fail.
General Equilibria Existence and Properties
Some of the implications are, if not trivial, extremely easy to prove. Others have simple proofs, but the simple proofs depend on details of exactly what the proposed bargaining solution is. However, there's one result that towers far over the rest in difficulty, the General Equilibrium Existence Theorem.
As we're inventing various notions of Shapley-like games, we might want a guarantee that they all have associated equilibria, regardless of which notion we're looking at. CoCo/Harsanyi equilibria were generated by going "take a point x on the Pareto frontier, read off the importance vector a from the point, use it to rescale everyone's utilities, compute the CoCo value in that game, convert from money back to utilons, and if you made x again by that process, it's a CoCo equilibrium". They always exist, regardless of the game. But, what if we invent some new value notion besides the CoCo value? Wouldn't it be nice to prove that all games have equilibria induced by that new notion? And what if we change our opinion about what sort of value to use? Wouldn't it be nice to not have to redo the equilibria existence proof? Really, you'd want some sort of equilibria existence proof that works under highly general assumptions. And so, that's what this does.
Now, the full theorem statement is actually very complicated, so what's stated here is a simplified statement of the theorem that doesn't miss anything intuitive.
Theorem 1: General Equilibria Existence Theorem: A game $G$ has a general equilibrium assuming the following five conditions on the value function $v$, and one condition on how the value function interacts with games.
Property 1: For all games G, if a new game G′ is defined where every player can voluntarily decrease their own utility down to some specified minimum value, then the values for all coalitions don't change. Put another way, if everyone can shoot themselves in the foot, it has no effect on negotiations because they just won't.
Property 2: For all games $G$, coalitions $S$, and players $i$, we have that $v_G(S\cup\{i\})-v_G(S)\ge\min_{m\in\prod_{i\in N}A_i}U_i(m)$. A player joining a coalition can always contribute at least their worst-case utility. This is an extremely weak form of "everyone should get more than their maximin value" that's more like "everyone should get more than their minimum value".
Property 3: For all games G, if a new game G′ is defined where everyone can pair their normal action with a second action that does nothing at all, the value of a coalition in the new game is the same as the value of a coalition in the old game. This is a weak form of Redundancy Invariance that's like "duplicate copies of actions don't matter"
Property 4: For all games G, the sum of the Shapley values for that game maximizes the sum of utilities. This is just Pareto-optimality.
Property 5: The value function v is continuous.
Property 6: The game $G$ is augmented to a game $G^*_\epsilon$, where everyone gets two extra moves they can pair with their ordinary move. One is "shooting themselves in the foot", ie, voluntarily decreasing their utility to whatever level they specify unless it's too low. The second move is "specify payoffs for everyone", ie, everyone can pick a vector in $B^n$ (the unit $L^2$ ball in $n$ dimensions), all the vector coordinates are multiplied by $\epsilon$, and those tiny payoffs or penalties are added to everyone's utilities. For all vectors $a\in\Delta N$ (importance weights on all the players), sufficiently low $\epsilon$, and players $i$: if player $i$ has an importance weight of exactly 0 in the game $G^*_\epsilon$, then their Shapley value will be positive.
Note: The rest of this section will be me drilling down into math details, and is skippable.
Begin the Infodump!
Intuitively, what's going on with that odd final condition is that the "everyone can specify a tiny payoff or penalty for everyone else" modification makes it so that the set of all the possible utility tuples doesn't have creases or corners; instead it'll be smooth, and every point on the Pareto frontier induces exactly one normal vector/vector of importance weights $a$. This "smoothing" tames some pathologies, and so you just need to find general equilibria for the "smoothed" games, and take the limit as $\epsilon\to 0$, to get equilibria for the original game. That's why we care about such games.
Separately, there are division-by-zero errors that may show up when a player has an importance weight of literally 0. This is because you have to divide by the importance weight to convert the payoff of a player from currency back to utilons. Now, if the Shapley value is positive for all players that have 0 importance, this isn't really an issue, because $f(0)>0$ means that $\lim_{\delta\to 0}\frac{f(\delta)}{\delta}=\infty$. To assess the behavior around where some players have an importance weight of 0, you can just go "as importance goes to 0, payoffs for those players blow up to infinity no matter how we take the limit, and so we can rule out a neighborhood around all the "someone has 0 importance" options as not containing equilibria". But if someone could have 0 Shapley value if they had 0 importance, then the function may do weird things, or depend on how you take the limit, and you're no longer justified in ignoring the region around where that player has 0 importance.
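Spelling out that limit claim (a standard continuity argument, with $f$ here being the player's Shapley value as a function of their importance weight $\delta$):

```latex
% f continuous with f(0) = c > 0 gives f(delta) >= c/2 for all
% sufficiently small delta, hence
\[
  \frac{f(\delta)}{\delta} \;\ge\; \frac{c}{2\delta} \;\to\; \infty
  \qquad \text{as } \delta \to 0^+,
\]
% no matter along which path delta approaches 0.
```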
Now, why expect this assumption of "if 0 importance, positive Shapley value" to be true in practice for the specified class of games (an original game $G$, but where everyone can also decrease their own utility and distribute small amounts of utility to everyone else)? Well, the Shapley value for player $i$ in $G^*_\epsilon$ can be written as a sum of terms like $v_{G^*_\epsilon}(S\cup\{i\})-v_{G^*_\epsilon}(S)$, and so even if a player has literally 0 importance and their utility doesn't matter, they could shift the beneficiaries of their "distribute a tiny amount of utility" action from "everyone else" to "team S", benefiting the coalition they're joining by a nonzero amount. And they can do this for any coalition, and attain nonzero Shapley value that way. But actually proving this requires you to look into the details of how the value function $v$ is defined instead of making general arguments that apply to any value function.
Inspired by this argument from two paragraphs up, we can prove
Proposition 1: If a game $G$ fulfills the property that $\forall a\in\Delta N, i\in N: a_i=0\rightarrow\phi_{G,a}(i)>0$, then all of its general equilibria will be on the strong Pareto frontier. $\phi_{G,a}$ is the Shapley values of the game $G$, but with everyone's utility functions rescaled from $U_i$ to $a_iU_i$.
Effectively, this is saying that the "everyone can benefit others a little bit" criterion of nonzero Shapley values at 0 importance is a sufficient criterion for all equilibria to be strictly Pareto optimal, and everyone will matter a little bit. The argument for why is that "payoff blowup" argument a few paragraphs back.
Proposition 2: If the value function $v$ is continuous, then the function mapping a game $G$ to its set of general equilibria has closed graph.
This is just a general result.
I would like to take a moment to say that proving that General Equilibria Existence Theorem was ABSOLUTELY HORRIBLE. "Just use a fixpoint theorem, bro". I tried, I really did. The issue is, the function you're trying to find a fixpoint of doesn't produce convex sets, so you can't apply any of the usual fixpoint theorems right out of the box. However, convexity isn't preserved under homeomorphisms, which is kinda suspicious. Eventually it turned out, from Horvath 1991, that it's possible to define a sort of "generalized convexity" that says something kinda like "A space X is generalized-convex if you can come up with a notion of generalized-convex-hull that takes a finite set of points and produces a contractible set, and subsets Y are generalized-convex if you can take any set of finitely many points from Y and the generalized-convex-hull is still a subset of Y". The paper had some fixpoint theorems that carried over to this setting, and after a whooole lot of work, I was able to coax them to give me the result I wanted.
Last-minute edit: If you need to salvage one of these results in a more general case, you can have analogous conditions apply to the "all n players get payoffs" function directly instead of the 2-player value function v.
ROSE Value, Step-by-Step
ROSE Value, Toy Case
ROSE is an acronym which will be defined later because it's a spoiler at this early stage.
Let's inspect a simple special case here, where Alice has only one possible action, and Bob has many, and Alice and Bob can pay each other money. If we cannot get a sensible equilibrium notion here in such a restricted toy case, we can't get one for games in general. Observe the picture.
What does the CoCo value say about this? Well, it'd pick
Ie, Bob goes "a default point will be set, for if we can't come to an agreement. We'll evenly split surplus over the default point. So I'll pick this point right here in the lower-left to maximize my take. Ie, threaten Alice to get money".
Drawing the Desiderata
If we're aiming for something nonthreatening, then we should probably look for a different mathematically distinguished point in this picture. The first issue to address is what the various desiderata say about this class of toy problems. 2-player reduction isn't relevant here, because it's already a 2-player game.
Shift Invariance (that adding constants to a player's utility means that the solution point should have the same constant added to it) carries the implication that our choice of distinguished point shouldn't depend on the background grid lines of how much money the players get; it should be selected in a background-invariant way. This seems like one of the highly reliable axioms to adopt, so let's delete the background grid.
Pareto Optimality, another essential axiom to have, says that our choice of distinguished point should be on this diagonal line which corresponds to the outcomes where Bob maximizes the sum of utilities instead, and one of the players pays the other.
Redundancy Invariance (that adding actions that can equally well be accomplished with probabilistic mixtures of existing actions does nothing) is a touch less essential but still worth adopting. This implies that our choice of distinguished point should just depend on the convex set, not which points in it are pure actions vs mixtures of other actions.
Maximin Dominance (that both players do better than their maximin payoff) is another essential axiom to have, and it says that our choice of distinguished point should be on this portion of the frontier. Any further down and Bob goes "wait, I can just argmax!" and does that, and any further to the left, Alice goes "wait, I can just refuse to pay Bob!".
And that leaves five more axioms, slightly more dubious: Sub-Max and Payoff Dominance, along with Continuity, Threat Resistance, and Action Monotonicity.
Sub-Max (both players get ≤ the maximum utility they could achieve alone without paying each other) mandates the single point where Alice goes "alright Bob, since you could just argmax, how about you maximize surplus instead, and I'll pay you the exact amount you'd need to be ok with doing that instead of argmaxing". But this is an incredibly strong condition and there's no obvious reason why we should desire it.
Payoff Dominance (that if Alice outscores Bob for all possible outcomes, then the net solution should also feature Alice outscoring Bob, and vice-versa) is an odd one when you try to translate what it's saying into geometrical terms. It effectively says that the solution must be in the following range.
Now, that geometrical meaning isn't obvious at all, so here's where it came from. Payoff Dominance, as stated, is an oddball, because adding and subtracting constants (by Shift-Invariance) shouldn't affect anything, and yet doing that can change whether Alice consistently outscores Bob, or whether the opposite happens. Actually, that's a feature, not a bug. The proper way to wield this axiom is to use Shift Invariance first to set the (0,0) point on one of these diagonal lines, and then invoke Payoff Dominance to go "Alice consistently outscores Bob in all outcomes (or vice-versa), so she/he should get a higher score". And that's where this geometric way of viewing it came from. When restated in geometric terms, the true role of Payoff Dominance seems to be that it's smuggling in the "ooh, diagonal lines are nice" conclusion. I mean, I agree with that conclusion, but it seems in poor form.
And then we get to the really interesting three.
Action Monotonicity, specialized to this particular case, says that if the convex shape is bigger, Bob gets more utility. More about this one later.
Continuity is a fascinating one because it rules out a solution which is otherwise extremely attractive. It's incredibly suspicious that, for the CoCo value, the threat point if negotiations break down is Bob maximizing BobUtility-AliceUtility. In general, if you don't want your foe making threats, you should disregard every threat point where your foe isn't maximizing their own utility. And so, obviously, the TRUE threat point is the one where Bob just maximizes his own utility, and we split surplus 50/50 from there.... right?? But, observe the following violation of continuity.
And so, that brings us to the most special desideratum of all, Threat Resistance. It says that if you extend the convex shape to the left and downwards (by adding in the "incinerate utility" buttons), it has no effect on the selected point. Only the Pareto frontier of Bob's actions affects what happens.
Continuity and Threat Resistance, when combined, are really interesting, because they actually make it quite difficult to even specify many of the points on the Pareto frontier in the first place. The obvious solution of "assume Bob will maximize his own utility, and then offer a 50/50 split from there" violates continuity, and most variations on the CoCo solution (where you vary the slope of the tangent line) are certainly continuous, but violate Threat Resistance.
Really, there are only two solutions I could come up with. One of them violated Action Monotonicity for Bob (which is a plausible desideratum, since Alice only has one action), and that leaves only one remaining option.
Toy-Case Visual Proof of the ROSE Value
The outcome where Alice goes "ok, Bob. I know you can get 41 dollars worth of value from picking your favorite move. May I interest you in the option where you play that Pareto-Frontier move to maximize the sum of our utilities instead? I commit to paying you if you do so, and paying you enough so that you walk away with 41 dollars and 1 cent worth of value." And then Bob takes it, so they hit the green dot in the below image. Bob is responsible for picking a spot in the green region that's on the Pareto-frontier/purple line, and Alice is responsible for paying him afterwards, moving diagonally up and back to the green spot, which is slightly better for Bob than Bob just directly argmaxing.
That pair of payoffs is the ROSE value of the game.
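Numerically, the toy-case ROSE point is simple to compute. Here's a sketch (the payoff lists are made up, and the one-cent sweetener is elided):

```python
def rose_value_toy(U_alice, U_bob):
    """ROSE payoffs for the toy case: Alice has a single action, Bob
    picks among finitely many actions, and side payments are allowed.

    Bob plays the surplus-maximizing move; Alice then pays him just
    enough that he nets what argmaxing alone would have gotten him
    (the extra cent that makes him strictly prefer this is elided).
    Returns (Alice's payoff, Bob's payoff)."""
    bob_solo = max(U_bob)                                  # Bob's argmax value
    surplus = max(a + b for a, b in zip(U_alice, U_bob))   # best joint total
    return surplus - bob_solo, bob_solo

# Bob's argmax nets him 41 (and Alice only 3), but a different move
# yields 60 of total value. Alice's payment slides the outcome along
# the sum-preserving diagonal to (19, 41).
print(rose_value_toy(U_alice=[3.0, 20.0], U_bob=[41.0, 40.0]))
```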
But is this really the only possible solution? Do we have a theorem that says it's the only solution that fulfills certain axioms? Yes. Does it cheat by using the axiom of ≤ maximum value? No it does not. In fact, I can actually give a slick visual proof of it!
But what axioms are needed? Only five. Maximin Dominance, Threat-Resistance, Redundancy-Invariance, Pareto-Optimality and Action Monotonicity (for Bob). And technically you only need the weak form of Redundancy-Invariance that says that adding duplicate actions does nothing. If you believe the following:
1: Players should get ≥ the amount of utility they can guarantee on their own. After all, if a notion of bargaining says a player should lose, and they can force a win, why should they assent to that bargaining notion?
2: If Alice and Bob are both indifferent between action A and B, then deleting one of them shouldn't affect anything because Bob can just do the other one.
3: If Alice and Bob are doing a suboptimal pair of actions, they should be able to coordinate on doing something which is Not That.
4: Bob is happy to come up with extra actions/unhappy to lose existing actions, since Alice can only act by paying him.
5: If Bob brings a gun to negotiations, and he doesn't intrinsically value shooting Alice, Alice shouldn't react to the gun, lest she incentivize Bob to bring a gun to negotiations.
Then the only possible solution is Alice paying Bob just enough to motivate him to play the Pareto-Frontier/surplus-maximizing move instead of what he'd do otherwise.
Let's get started with that visual proof! Let's say that Bob's only moves were as follows. Bob can press a button that gives both players the ROSE value, or he can take any action that gives Alice equal or greater utility than the ROSE value. Those are the only options. The only solution to this game that's maximin-compliant is Bob pressing the ROSE button.
Bob brings a gun which can be used to shoot himself in the foot or shoot Alice. Er, to be more precise, Bob's action space augments from $B$ to $B\times[-d,0]^2$, where $d$ is a sufficiently large number. This is how much utility to destroy from Bob and Alice, respectively. By Threat-Resistance, payoffs don't change from what they were, namely, the utilities associated with Bob pressing the ROSE button.
Bob gains access to his missing kit of actions (not paired with the "destroy utility" buttons this time). All of which could have been implemented by just pressing the ROSE button and decreasing utility appropriately, so by Redundancy-Invariance, payoffs don't change from what they were, namely, the utilities associated with Bob pressing the ROSE button.
Post Status: Original research, mathy, high-context. Make sure you've read the last three posts in the sequence first, this is going to be a very rough one to start on.
As a recap of the last three posts, bargaining has an elegant notion of what a neutral point is: the Nash bargaining solution. And, games where the two players can pay each other money (transferrable-utility games) have an elegant notion of what a neutral point is: the CoCo value. And, as it turns out, games in general have an elegant notion of what a neutral point is that subsumes the previous two concepts as special cases, which can be motivated by either trying to generalize the Nash bargaining solution to games without a clear disagreement point (as John Harsanyi did), or by trying to generalize the CoCo value to games without transferrable utility (as I did).
Sadly, there's an issue with all of these where a player can get a lot more utility by inventing increasingly effective ways of destroying the utility of everyone else. Basically, someone can go "I have a torturizer, favor me or I'll use it on you", and the threat will work. In fact, such reasoning is the default for the CoCo value. That issue is detailed in the second and third posts in the sequence.
And so, there's a question of "is there a distinguished point in transferrable-utility games which is generated by a more threat-resistant line of reasoning?" As per the third post in the sequence, using a correlated equilibrium as the fallback point can serve that role, but they're not unique.
There is! (kinda)
I have a lot of uncertainty whether the equation I wrote down in this post is The Threat-Resistant Bargaining Solution, but if it isn't, it's at least pretty close to The Solution. My thinking about it is still in flux; here's a rundown of the current situation, which also parallels the structure of the rest of the post.
Following the example set by CoCo values in the first two posts, there's a highly general pathway to amplify a solution concept for 2-player transferrable-utility games into a solution concept for n-player transferrable-utility games, and from there into a notion of equilibrium for games in general. Thus, in the quest for a more threat-resistant solution concept, we can restrict our attention to the special case of 2-player transferrable-utility games. These are vastly simpler because the players can settle matters by paying each other, and their utility functions can be calibrated against a common scale, namely, how much they'd pay for an outcome.
To aid in the search for a "good" solution for transferrable-utility games, the first step is listing out a whole bunch of desired properties to check. Also, in that section, the "n-player transferrable-utility case → full generality" step is discussed. There's a theorem that pretty much says "X equilibria exist, where X=your preferred solution concept", but it was spectacularly hard to prove.
In the toy case of 2-player games where Bob can act freely and Alice only has the ability to pay Bob, I found The Solution, as bolstered by a slick visual proof that if you take a few obvious desiderata, one of which is immunity to threats, it's the only possible solution.
There's an obvious way to generalize from the previous toy case to the more general setting of 2-player transferrable-utility games, but I don't have a comparable proof that it's the only possible threat-resistant solution and I would be extremely interested in someone proving that. It still fulfills all the nice properties I was looking for. And if you squint hard enough at what the process is doing, it looks sorta like it's generated from Yudkowsky's "block the opponent from getting more than what you think is fair" procedure to negotiate with agents that have different ideas of fairness. Also, it introduces the key concept of an "initiative game".
By using Shapley values, there's an obvious way to generalize to n-player transferrable-utility games in general. The result is threat-resistant and fulfills a bunch of nice properties, but has the baffling and fatal drawback of not being guaranteed to give all players ≥ their maximin value, as demonstrated by a tidy little toy problem (this only arises for n≥3)
And, right before this post was slated to go out, I came up with an alternate n-player generalization of the 2-player case that's threat-resistant and gets pretty much all the nice properties at once, including giving players ≥ their maximin value!
Looking at what the solution does in the special case of bargaining problems, it induces a novel scale-and-shift-invariant bargaining solution that (kinda) doesn't depend on where the disagreement point is! And there's an easy heuristic for applying it in real life.
And finally there's some thought experiments that suggest that this isn't threat-proof enough, and also I don't have an argument that these sorts of solutions flow out of more primitive desiderata about agent behavior. So if someone wants to take this further, there's definitely remaining room to do so.
It's a little tricky to present, however, because whoo boy this is going to be a long post. Grab a snack and a whiteboard, you'll need it.
General Preliminaries:
Amplifying Solutions
One of the insights gained from looking at CoCo values, and how they generated Harsanyi equilibria (see posts 1 and 2 in the sequence), is that there's a general pathway to take an equilibrium notion for 2-player games to an equilibrium notion for n-player games with transferrable utility, and from there, to generate an equilibrium notion for n-player games in full generality.
Two to N with Shapley Values
A game consists of a set of players N, the actions of the players {Ai}i∈N, and utility functions for the players, {Ui}i∈N, of type ∏i∈NAi→R. Well, to be fully technical, the action spaces must be compact metric spaces and the utility functions must be continuous, but usually you can ignore stuff like that.
Transferrable-utility games are those where we're interpreting all the numbers as money. The payoffs are how much the various players value different outcomes, as denominated in money. And also the players are allowed to pay each other, if they want. These side payments don't count as an action in the game.
Now, let's say you've got a preferred way of taking a two-player transferrable-utility game and figuring out how much value the two players "should" walk away with (which might take a side payment to actually implement). Given any two-player game as input, you conclude that the special point they "should" be aiming for is the perfect, the elegant, [INSERT-NAME-HERE] value!
Sooo cool. Aliens would independently reinvent it, you're sure.
Let's denote "value assigned to Alice in game G" as vG(alice). In general, you've got some function v which maps 2-player games to their value for a player. What's the big deal with that?
Well, hypothetically, if two big coalitions S and ¯¯¯¯S were fighting each other in a grand n-player game, you could view it as just a two-player game! The coalitions are the players, their move spaces are the spaces ∏i∈SAi and ∏j∉SAj, (everything the teams can do if they coordinate) and their utility functions are the team utilities ∑i∈SUi and ∑j∉SUj Then you could use your function v to work out the value of a coalition, vG(S).
The reason this is important is because, the natural notion of how to split a pile of value in an n-player game is the Shapley values. There isn't a comparable competitor. And in order to define Shapley values in the first place, you need some notion of "the value of a coalition, if they work together". So, your two-player solution notion can be amplified into an n-player solution notion by using it to figure out what payoff coalitions should get, and using that to compute Shapley values. In an n-player game, player i will get a value of
ϕG(i)=∑i∈S⊆N(n−s)!(s−1)!n!(vG(S)−vG(¯¯¯¯S))
This isn't the standard presentation of Shapley values, but the standard presentation can be reshuffled into this equation. ϕG(i) will henceforth be used as an abbreviation for the Shapley value of player i in game G. ¯¯¯¯S is just N/S, the coalition of everyone else. n and s are the cardinalities of the sets N and S respectively.
A way of intuitively stating what the Shapley value does, is that a coalition is formed in random order until it includes everyone, and everyone demands their marginal contribution to the value of the forming coalition. Average over all possible random orders, and that gives the Shapley values for everyone.
In particular, the CoCo value follows this procedure. It's most naturally defined for two-player games, and the generalization of it to n-player games follows this pattern of "use the two-player case to define the value of a coalition, then give everyone their Shapley values"
N to Full Generality With Importance Weights
This is for games with transferrable utility. What about games in general where money might not be available? Well, points on the Pareto frontier giving everyone's utilities, (x1,x2,x3...), are associated with a tuple of importance weights (a1,a2,a3...), where x1,x2,x3... is the tuple of payoffs from the utility functions that maximizes ∑iaiUi. Ie, every Pareto-optimal point can be post-hoc rationalized as the result that maximizes a weighted mix of utilities, for some way of weighting everyone (the importance weights). Given a spot (x1,x2,x3...) is there an easy way of reading off the "importance weights" that make that spot the best one? Yup! The "importance weights" are just the normal vector to the Pareto frontier at the given spot.
An interesting aspect is that the weights ai can also be interpreted as currency conversion factors, like curnits/utilon for player i. (curnit=currency unit). As an example, if Alice has aalice=5 and Bob has abob=1, then this is saying that an Alice-utilon is worth 5 Bob-utilons. Or that spending 5 currency units for an Alice utilon is as good a deal as spending the 5 currency units on 5 Bob utilons, for someone trying to spend money to maximize the weighted sum of utilities 5Ualice+Ubob.
So, given a Pareto-frontier point, you can use the importance weights at that point to make up an imaginary currency. Then you redenominate everyone's utilities in that, ask what the Shapley values are for everyone, convert from money back to utilons, and get... well, usually you'll get a point that's past the Pareto-frontier. It's too good for everyone and cannot be attained by any combination of actions. But there will be some points where, when you do this, you get back where you started. They correspond to ways of importance-weighting everyone that make "maximize the weighted sum of utilities" (utilitarianism??) and "give everyone their Shapley payoffs" (fairness??) line up.
These will be called "general equilibria" when we're not talking about a specific notion of how to assess the value of a coalition, and for more specific notions of how to assess the value of a game, they're named correspondingly, like "CoCo equilibria"
Two to Full Generality: Transporting Desiderata
If the function mapping a 2-player game and a player to their value is nicely behaved, it will bestow nice properties on the "everyone gets their Shapley payoffs" function, which bestows nice properties on the corresponding general equilibria. As an example, let's take shift-invariance of the v function. If you start the game with 100 extra dollars and your foe starts with 40 extra dollars, you should get your old value plus 100 dollars, and your foe should get their old value plus 40 dollars. Shift-invariance of this form will imply that the "everyone gets Shapley payoffs" result is also shift-invariant. And this implies that the corresponding general equilibria will be scale-and-shift invariant, and not depend on how everyone reports their utility functions.
So, assuming "everyone gets their Shapley payoffs" is convincing to the reader, that leaves ONE point of intervention to get something which is not the CoCo value. Find a different way to define the value of a coalition. That's the only option available. And things get even more constrained if you accept that the value of a coalition should be definable as the value of the associated 2-player game between them and everyone else.
The Big Desiderata List
For the following 10 desiderata (not all of which are attainable, especially because a strict subset of them, if fulfilled, imply that the CoCo value is the only possibility), they come in several variants.
As an example, let's take "player i should get better than their maximin payoff". It could apply to the vG(i) games for 2-player games. It could apply to the vG(S) games for coalitions containing i, that if i is on a team, the team won't have player i lose too badly. It could apply to the Shapley values for 2-player games, or for general n-player games. Or it could apply to all the general equilibria.
Several of these will imply each other (like, doing better than maximin for the n-player Shapley values implies doing better than maximin in all general equilibria), but since they don't all imply each other, it's important to make fine distinctions between, for example, action monotonicity for the vG(i) games, and action monotonicity for the resulting Shapley values. They'll be stated informally at first,but there are formal versions of all of them.
Desideratum 1: 2-Player Reduction The value of a coalition S in a general game is found by treating the two coalitions S and N/S as just two big players, and evaluating the value of the corresponding player s in the corresponding two-player game.
This desideratum, if fulfilled, means that we only need to define values for 2-player games, considerably simplifying things.
Desideratum 2a-b: Shift-Invariance for v (or ϕ). In a 2-player game, if Alice has a constant amount c added to all their payoffs, then the value of the new game for Alice is the value of the old game plus c. Adding a constant amount to Bob's payoffs has no effect on Alice's payoffs. Similar for n-player games and the Shapley payoffs.
Desideratum 2c: Scale-and-Shift Invariance for General Equilibria If x∈Rn is a general equilibrium for the game G, then for all ways of scale-and-shifting everyone's utility functions, the scale-and-shift of x is a general equilibrium for the scale-and-shift of the game G.
These desiderata, if fulfilled, means that the solution to a game doesn't depend on irrelevant details of how everyone reports their utilities, or if someone has an extra 100 bucks in their drawer at home.
Desideratum 3a.a-c: Weak/Strong/Really Strong Pareto Optimality for v For all 2-player games, the tuple of payoffs for the two players is on the weak Pareto frontier/strong Pareto frontier/maximizes the sum of utilities. If you haven't seen it before, the weak Pareto frontier is "there's no way to make everyone strictly better off", and the strong Pareto frontier" is "there's no way to make some group strictly better off and everyone else the same".
Desideratum 3b: Pareto Optimality for ϕ. For all n-player games, the sum of the Shapley values for the players equals the maximum sum of utilities.
Desideratum 3c.a-b: Weak/Strong Pareto Optimality for General Equilibria For all n-player games, every general equilibrium is on the weak Pareto Frontier/strong Pareto frontier.
Pareto-optimality is an extremely obvious desiderata to have. A nonobvious aspect, though, is that 3a, Pareto optimality for v, is almost completely disconnected from the rest of them. It's entirely possible to have the Shapley values attain Pareto optimality, without the values for 2-player games being Pareto-optimal, even in a weak sense. The "active ingredient" in making the Shapley values Pareto optimal seems to be that the value of the "everyone" coalition should be the maximum attainable value, and other than that, it doesn't really matter what the value function v is doing.
Desideratum 4a-b: Continuity for v (or ϕ) For a 2-player game, we can consider the family of games with the same action spaces. Within this family, we can abbreviate a game as just its tuple of utility functions, of type A1×A2→R2 (continuous functions from actions to results). The value function, of type (A1×A2→R2)→R2 should be continuous when the function space is equipped with the topology of uniform convergence, for all families of games. For Shapley values, it should be the same, just use the Shapley value function of type (∏i∈NAi→Rn)→Rn.
Desideratum 4ca: Existence of General Equilibria The general equilibrium function, of type (∏i∈NAi→Rn)→P(Rn) mapping a game to its set of general equilibria, should never return the empty set, regardless of the family of games.
Desideratum 4cb: Closed-Graph for General Equilibria The general equilibrium function, of type (∏i∈NAi→Rn)→P(Rn) mapping a game to its set of general equilibria, should have closed graph for all families of games.
Existence is obviously important, the early continuity desiderata less so. It may seem obvious that microscopic variations in everyone's utilities shouldn't have dramatic effects on what happens, but these desiderata actually pack a serious punch in constraining solutions. As an example, U2(argmaxaU1(a)) (the utility you get when the foe argmaxes) doesn't vary continuously as U1 varies, so all terms like this are forbidden from showing up in equations.
Desideratum 5a-c: Redundancy Invariance for v (or ϕ) (or General Equilibria) Giving players additional actions that are equivalent to just probabilistic mixtures of existing actions should have no effect on the values given by v, or the Shapley values, or the set of general equilibria.
This is another one that seems obvious but has unexpected power. I actually had a few variants of my bargaining solution, and this killed one of them. Admittedly, it naively seems like this desiderata fails when you have the computational power to read the foe's move, but this desiderata can be preserved even in that case if you set up the problem correctly.
Desideratum 6a-c: Threat-Resistance for v (or ϕ) (or General Equilibria) If all players get a kit of "destroy the utility of this other player" buttons of varying intensity that can be pressed with no cost and paired with any move, then regardless of how much utility the buttons can destroy, the values/Shapley values/set of general equilibria remain unchanged.
And this is a special one. It effectively says that if your foe has a "send you to hell" button that's costless for them to press, then the button would have NO effect on negotiations, lest you incentivize your foe to create such a button because you'd respond. It gets even more powerful when paired with continuity, because then it levels up to say something like "if the foe has a button that sends me to hell and they slightly value pressing that button, I'll give them just a little bit of extra value to have them put the button away". It's not quite as good as it seems, though, which is why it's named Threat-Resistance instead of Threat-Immunity. More about that near the end of this post. Really, if fulfilled, this just confers immunity to a specific class of threats. There are threat-like shenanigans that this desideratum doesn't rule out.
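For contrast, here's a minimal sketch of how the CoCo value fails this desideratum (all payoff numbers made up). The CoCo value splits the maximum total evenly, then corrects by the value of the zero-sum game on $(U_1 - U_2)/2$, so a costless destroy-Alice's-utility button shifts that zero-sum game in Bob's favor:

```python
import numpy as np

def coco(U1, U2):
    # Cooperative component: even split of the best attainable total.
    coop = np.max(U1 + U2) / 2
    # Competitive component: value of the zero-sum game on (U1 - U2)/2.
    # (Pure-strategy maximin is exact here, since Alice has one action.)
    adv = (U1 - U2) / 2
    compete = np.max(np.min(adv, axis=1))  # rows: Alice, columns: Bob
    return coop + compete, coop - compete  # (Alice, Bob)

# Made-up game: one action each, Alice gets 10, Bob gets 2.
print(coco(np.array([[10.0]]), np.array([[2.0]])))            # (10.0, 2.0)

# Give Bob a costless button that destroys 8 of Alice's utility.
print(coco(np.array([[10.0, 2.0]]), np.array([[2.0, 2.0]])))  # (6.0, 6.0)
# Bob's CoCo payoff rises from 2 to 6: the mere existence of the threat
# pays him, which is exactly what Threat-Resistance rules out.
```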
The following three desiderata (7,8,9) are constraints on what payoffs the various players can expect.
Desideratum 7a-c: Payoff Dominance for v (or ϕ) (or General Equilibria) If Alice always gets higher utility than Bob, regardless of what event happens, then the values/Shapley values/general equilibria will also say that Alice gets higher utility than Bob. Same for always getting lower utility.
Desideratum 8a-c: Maximin Dominance for v (or ϕ) (or General Equilibria) All the values/Shapley values/general equilibria will say that each player gets ≥ their maximin payoff.
Desideratum 9a-c: Sub-Max for v (or ϕ) (or General Equilibria) All the values/Shapley values/general equilibria will say that each player gets ≤ their maximum payoff.
Of these three desiderata, Maximin Dominance is the most likely to hold, and Sub-Max is the most contentious. It's reasonable to state that a player who can help others massively, but who can't gain much utility themselves, should get compensated greatly for their trouble. The same considerations also make Payoff Dominance look unlikely in the general n-player case.
Desideratum 10: Action Monotonicity for v (or ϕ) (or General Equilibria). If a player gains access to more actions, their value/Shapley value should improve. For general equilibria, the worst-case equilibrium payoff for the player that gained access to more actions should improve.
This desideratum is the key piece that mandates the CoCo value, and so, it must fail.
General Equilibria Existence and Properties
Some of the implications are, if not trivial, extremely easy to prove. Others have simple proofs, but the simple proofs depend on details of exactly what the proposed bargaining solution is. However, there's one result that towers far over the rest in difficulty, the General Equilibrium Existence Theorem.
As we're inventing various notions of Shapley-like values, we might want a guarantee that all games have associated equilibria, regardless of which notion we're looking at. CoCo/Harsanyi equilibria were generated by going "take a point x on the Pareto frontier, read off the importance vector a from the point, use it to rescale everyone's utilities, compute the CoCo value in that game, convert from money back to utilons, and if you made x again by that process, it's a CoCo equilibrium". They always exist, regardless of the game. But, what if we invent some new value notion besides the CoCo value? Wouldn't it be nice to prove that all games have equilibria induced by that new notion? And what if we change our opinion about what sort of value to use? Wouldn't it be nice to not have to redo the equilibria existence proof? Really, you'd want some sort of equilibria existence proof that works under highly general assumptions. And so, that's what this does.
Now, the full theorem statement is actually very complicated, so what's stated here is a simplified statement of the theorem that doesn't miss anything intuitive.
Theorem 1: General Equilibria Existence Theorem: A game G has a general equilibrium assuming the following five conditions on the value function v, and one condition on how the value function interacts with games.
Property 1: For all games G, if a new game G′ is defined where every player can voluntarily decrease their own utility down to some specified minimum value, then the values for all coalitions don't change. Put another way, if everyone can shoot themselves in the foot, it has no effect on negotiations because they just won't.
Property 2: For all games G, coalitions S, and players i, we have that $v_G(S \cup \{i\}) - v_G(S) \ge \min_{m \in \prod_{j \in N} A_j} U_i(m)$. A player joining a coalition can always contribute at least their worst-case utility. This is an extremely weak form of "everyone should get more than their minimax value" that's more like "everyone should get more than their minimum value".
Property 3: For all games G, if a new game G′ is defined where everyone can pair their normal action with a second action that does nothing at all, the value of a coalition in the new game is the same as the value of a coalition in the old game. This is a weak form of Redundancy Invariance that's like "duplicate copies of actions don't matter".
Property 4: For all games G, the sum of the Shapley values for that game maximizes the sum of utilities. This is just Pareto-optimality.
Property 5: The value function v is continuous.
Property 6: The game G is augmented to a game $G^*_\epsilon$, where everyone gets two extra moves they can pair with their ordinary move. One is "shooting themselves in the foot", ie, voluntarily decreasing their utility to whatever level they specify, unless it's too low. The second move is "specify payoffs for everyone", ie, everyone can pick a vector in $B^n$ (the n-dimensional $L^2$ unit ball), all the vector coordinates are multiplied by $\epsilon$, and those tiny payoffs or penalties are added to everyone's utilities. For all vectors $a \in \Delta_N$ (importance weights on all the players), sufficiently low $\epsilon$, and players i: if player i has an importance weight of exactly 0 in the game $G^*_\epsilon$, then their Shapley value will be positive.
Note: The rest of this section will be me drilling down into math details, and is skippable.
Begin the Infodump!
Intuitively, what's going on with that odd final condition is that the "everyone can specify a tiny payoff or penalty for everyone else" modification makes it so that the set of all the possible utility tuples doesn't have creases or corners; instead, it'll be smooth, and every point on the Pareto frontier induces exactly one normal vector/vector of importance weights a. This "smoothing" tames some pathologies, and so you just need to find general equilibria for the "smoothed" games, and take the limit as $\epsilon \to 0$, to get equilibria for the original game. That's why we care about such games.
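In other words (my gloss; I'm assuming the augmentation works out to a Minkowski sum, which the simplified theorem statement doesn't spell out): writing $\mathcal{F}(G)$ for the achievable set of utility tuples (my notation),

```latex
% Each of the n players adds an independent vector from \epsilon B^n,
% so the achievable set gets thickened by a ball:
\mathcal{F}(G^*_\epsilon) \;=\; \mathcal{F}(G) \oplus n\epsilon B^n ,
```

and $\epsilon$-neighborhoods of convex sets have no corners: each boundary point has a unique outward normal, namely the direction of displacement from its nearest point of $\mathcal{F}(G)$.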
Separately, there are division-by-zero errors that may show up when a player has an importance weight of literally 0, because you have to divide by the importance weight to convert a player's payoff from currency back to utilons. Now, if the Shapley value is positive for all players that have 0 importance, this isn't really an issue, because $f(0) > 0$ means that $\lim_{\delta \to 0} \frac{f(\delta)}{\delta} = \infty$. To assess the behavior around where some players have an importance weight of 0, you can just go "as importance goes to 0, payoffs for those players blow up to infinity no matter how we take the limit, so we can rule out a neighborhood around all the 'someone has 0 importance' options as not containing equilibria". But if someone could have 0 Shapley value at 0 importance, then the function may do weird things or depend on how you take the limit, and you're no longer justified in ignoring the region around where that player has 0 importance.
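Spelled out (with $f$ being a player's Shapley value as a function of their importance weight $\delta$, continuous by Property 5), the limit step is:

```latex
f(0) > 0 \text{ and } f \text{ continuous at } 0
\;\Longrightarrow\; f(\delta) > \tfrac{f(0)}{2} \text{ for all small } \delta > 0
\;\Longrightarrow\; \lim_{\delta \to 0^+} \frac{f(\delta)}{\delta} = +\infty .
```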
Now, why expect this assumption of "if 0 importance, positive Shapley value" to be true in practice for the specified class of games (an original game G, but where everyone can also decrease their own utility and distribute small amounts of utility to everyone else)? Well, the Shapley value for player i in $G^*_\epsilon$ can be written as a sum of terms like $v_{G^*_\epsilon}(S \cup \{i\}) - v_{G^*_\epsilon}(S)$, and so even if a player has literally 0 importance and their utility doesn't matter, they could shift the beneficiaries of their "distribute a tiny amount of utility" action from "everyone else" to "team S", benefiting the coalition they're joining by a nonzero amount. And they can do this for any coalition, and attain nonzero Shapley value that way. But actually proving this requires you to look into the details of how the value function v is defined instead of making general arguments that apply to any value function.
Inspired by the "payoff blowup" argument above, we can prove
Proposition 1: If a game G fulfills the property that $\forall a \in \Delta_N, i \in N: a_i = 0 \rightarrow \phi_{G,a}(i) > 0$, then all of its general equilibria will be on the strict Pareto frontier. $\phi_{G,a}$ is the vector of Shapley values for the game G with everyone's utility functions rescaled from $U_i$ to $a_i U_i$.
Effectively, this is saying that the "everyone can benefit others a little bit" criterion of nonzero Shapley values at 0 importance is a sufficient criterion for all equilibria to be strictly Pareto optimal, and everyone will matter a little bit. The argument for why is that "payoff blowup" argument a few paragraphs back.
Proposition 2: If the value function v is continuous, then the function mapping a game G to its set of general equilibria has closed graph.
This is just a general result.
I would like to take a moment to say that proving the General Equilibria Existence Theorem was ABSOLUTELY HORRIBLE. "Just use a fixpoint theorem, bro". I tried, I really did. The issue is, the function you're trying to find a fixpoint of doesn't produce convex sets, so you can't apply any of the usual fixpoint theorems right out of the box. However, convexity isn't preserved under homeomorphisms, which is kinda suspicious; it hints that fixpoint theorems shouldn't really depend on convexity per se. Eventually, it turned out from Horvath 1991 that it's possible to define a sort of "generalized convexity", which says something like "a space X is generalized-convex if you can come up with a notion of generalized-convex-hull that takes a finite set of points and produces a contractible set, and a subset Y is generalized-convex if the generalized-convex-hull of any finite set of points from Y is still a subset of Y". The paper had some fixpoint theorems that carried over to this setting, and after a whooole lot of work, I was able to coax them into giving me the result I wanted.
Last-minute edit: If you need to salvage one of these results in a more general case, you can have analogous conditions apply to the "all n players get payoffs" function directly instead of the 2-player value function v.
ROSE Value, Step-by-Step
ROSE Value, Toy Case
ROSE is an acronym which will be defined later because it's a spoiler at this early stage.
Let's inspect a simple special case here, where Alice has only one possible action, and Bob has many, and Alice and Bob can pay each other money. If we cannot get a sensible equilibrium notion here in such a restricted toy case, we can't get one for games in general. Observe the picture.
What does the CoCo value say about this? Well, it'd pick
Ie, Bob goes "a default point will be set for if we can't come to an agreement, and we'll evenly split surplus over the default point. So I'll pick this point right here in the lower-left to maximize my take. Ie, threaten Alice to get money".
Drawing the Desiderata
If we're aiming for something nonthreatening, then we should probably look for a different mathematically distinguished point in this picture. The first issue to address is what the various desiderata say about this class of toy problems. 2-player reduction isn't relevant here, because it's already a 2-player game.
Shift Invariance (that adding constants to a player's utility means that the solution point should have the same constant added to it) carries the implication that our choice of distinguished point shouldn't depend on the background grid lines of how much money the players get; it should be selected in a background-invariant way. This seems like one of the highly reliable axioms to adopt, so let's delete the background grid.
Pareto Optimality, another essential axiom to have, says that our choice of distinguished point should be on this diagonal line, which corresponds to the outcomes where Bob maximizes the sum of utilities (instead of just his own), and one of the players pays the other.
Redundancy Invariance (that adding actions that can equally well be accomplished with probabilistic mixtures of existing actions does nothing) is a touch less essential but still worth adopting. This implies that our choice of distinguished point should just depend on the convex set, not which points in it are pure actions vs mixtures of other actions.
Maximin Dominance (that both players do at least as well as their maximin payoff) is another essential axiom to have, and it says that our choice of distinguished point should be on this portion of the frontier. Any further down, and Bob goes "wait, I can just argmax!" and does that; any further to the left, and Alice goes "wait, I can just refuse to pay Bob!".
And that leaves five more axioms remaining, slightly more dubious. Sub-Max and Payoff Dominance, along with Continuity, Threat Resistance, and Action Monotonicity.
Sub-Max (both players get ≤ the maximum utility they could achieve alone without paying each other) mandates the single point where Alice goes "alright Bob, since you could just argmax, how about you maximize surplus instead, and I'll pay you the exact amount you'd need to be ok with doing that instead of argmaxing". But this is an incredibly strong condition and there's no obvious reason why we should desire it.
Payoff Dominance (that if Alice outscores Bob for all possible outcomes, then the net solution should also feature Alice outscoring Bob, and vice-versa) is an odd one when you try to translate what it's saying into geometrical terms. It effectively says that the solution must be in the following range.
Now, that geometrical meaning isn't obvious at all, so here's where it came from. Payoff Dominance, as stated, is an oddball, because adding and subtracting constants (by Shift-Invariance) shouldn't affect anything, and yet doing that can change whether Alice consistently outscores Bob, or whether the opposite happens. Actually, that's a feature, not a bug. The proper way to wield this axiom is to use Shift Invariance first to set the (0,0) point on one of these diagonal lines, and then invoke Payoff Dominance to go "Alice consistently outscores Bob in all outcomes (or vice-versa), so she/he should get a higher score". And that's where this geometric way of viewing it came from. When restated in geometric terms, the true role of Payoff Dominance seems to be that it's smuggling in the "ooh, diagonal lines are nice" conclusion. I mean, I agree with that conclusion, but it seems in poor form.
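To spell out that maneuver as a worked inequality (my restatement of the paragraph above): for any constants $c_1, c_2$,

```latex
% Shift-Invariance + Payoff Dominance, applied to the game shifted by (c_1, c_2):
\big(\forall a:\; U_1(a) - c_1 \,\ge\, U_2(a) - c_2\big)
\;\Longrightarrow\;
s_1 - c_1 \,\ge\, s_2 - c_2 ,
```

where $(s_1, s_2)$ is the solution point. Sweeping over all $(c_1, c_2)$ for which the hypothesis holds carves out exactly the diagonal band in the picture.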
And then we get to the really interesting three.
Action Monotonicity, specialized to this particular case, says that if the convex shape is bigger, Bob gets more utility. More about this one later.
Continuity is a fascinating one because it rules out a solution which is otherwise extremely attractive. It's incredibly suspicious that, for the CoCo value, the threat point if negotiations break down is Bob maximizing BobUtility-AliceUtility. In general, if you don't want your foe making threats, you should disregard every threat point where your foe isn't maximizing their own utility. And so, obviously, the TRUE threat point is the one where Bob just maximizes his own utility, and we split surplus 50/50 from there.... right?? But, observe the following violation of continuity.
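For a concrete version of that violation (my own toy payoffs, not the ones in the picture): as a perturbation t crosses 0, Bob's selfish argmax flips between two moves, and the "argmax threat point, then 50/50 surplus split" solution teleports.

```python
# Toy continuity violation for "threat point = Bob's own argmax, then
# split surplus 50/50". Bob's two actions give (U_alice, U_bob):
def naive_solution(t):
    actions = [(0.0, 10.0 + t),  # Bob's selfish-ish move, slightly perturbed
               (100.0, 10.0)]    # the surplus-maximizing move
    d_alice, d_bob = max(actions, key=lambda a: a[1])  # Bob's own argmax
    total = max(ua + ub for ua, ub in actions)         # best joint total
    surplus = total - d_alice - d_bob
    return (d_alice + surplus / 2, d_bob + surplus / 2)

for t in (-0.01, 0.01):
    print(t, naive_solution(t))  # jumps from ~(100, 10) to ~(50, 60)
```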
And so, that brings us to the most special desideratum of all, Threat Resistance. It says that if you close off the convex shape to the left and downwards (by adding in the "incinerate utility" buttons), it has no effect on the selected point. Only the Pareto frontier of Bob's actions affects what happens.
Continuity and Threat Resistance, when combined, are really interesting, because they actually make it quite difficult to even specify many of the points on the Pareto frontier in the first place. The obvious solution of "assume Bob will maximize his own utility, and then offer a 50/50 split from there" violates continuity, and most variations on the CoCo solution (where you vary the slope of the tangent line) are certainly continuous, but violate Threat Resistance.
Really, there are only two solutions I could come up with. One of them violated Action Monotonicity for Bob (which is a plausible desideratum, since Alice only has one action), and that leaves just one option.
Toy-Case Visual Proof of the ROSE Value
The solution is the outcome where Alice goes "ok, Bob. I know you can get 41 dollars' worth of value from picking your favorite move. May I interest you in the option where you play that Pareto-frontier move to maximize the sum of our utilities instead? I commit to paying you if you do so, and paying you enough so that you walk away with 41 dollars and 1 cent worth of value." And then Bob takes it, so they hit the green dot in the below image. Bob is responsible for picking a spot in the green region that's on the Pareto frontier/purple line, and Alice is responsible for paying him afterwards, moving diagonally up and back to the green spot, which is slightly better for Bob than directly argmaxing.
That pair of payoffs is the ROSE value of the game.
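As a minimal numeric sketch of that procedure (the action payoffs are made up to match the $41 story above, and I drop the symbolic extra cent):

```python
# Toy-case ROSE sketch: Bob's actions give (U_alice, U_bob) pairs;
# Alice's only lever is a monetary transfer to Bob.
actions = [(0.0, 41.0),    # Bob's selfish argmax: worth $41 to him
           (100.0, 10.0),  # the surplus-maximizing Pareto-frontier move
           (30.0, 20.0)]

bob_argmax = max(u_bob for _, u_bob in actions)  # 41.0
u_alice, u_bob = max(actions, key=sum)           # the sum-maximizing move
payment = bob_argmax - u_bob                     # 31.0 (plus a cent)
rose = (u_alice - payment, u_bob + payment)
print(rose)  # (69.0, 41.0): Bob nets his argmax value, Alice keeps the rest
```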
But is this really the only possible solution? Do we have a theorem that says it's the only solution that fulfills certain axioms? Yes. Does it cheat by using the axiom of ≤ maximum value? No it does not. In fact, I can actually give a slick visual proof of it!
But what axioms are needed? Only five. Maximin Dominance, Threat-Resistance, Redundancy-Invariance, Pareto-Optimality and Action Monotonicity (for Bob). And technically you only need the weak form of Redundancy-Invariance that says that adding duplicate actions does nothing. If you believe the following:
1: Players should get ≥ the amount of utility they can guarantee on their own. After all, if a notion of bargaining says a player should lose, and they can force a win, why should they assent to that bargaining notion?
2: If Alice and Bob are both indifferent between action A and B, then deleting one of them shouldn't affect anything because Bob can just do the other one.
3: If Alice and Bob are doing a suboptimal pair of actions, they should be able to coordinate on doing something which is Not That.
4: Bob is happy to come up with extra actions/unhappy to lose existing actions, since Alice can only act by paying him.
5: If Bob brings a gun to negotiations, and he doesn't intrinsically value shooting Alice, Alice shouldn't react to the gun, lest she incentivize Bob to bring a gun to negotiations.
Then the only possible solution is Alice paying Bob just enough to motivate him to play the Pareto-Frontier/surplus-maximizing move instead of what he'd do otherwise.
Let's get started with that visual proof! Say that Bob's only moves were as follows: Bob can press a button that gives both players the ROSE value, or he can take any action that gives Alice equal or greater utility than the ROSE value does. Those are the only options. The only solution to this game that's maximin-compliant is Bob pressing the ROSE button.
Bob brings a gun which can be used to shoot himself in the foot or shoot Alice. Er, to be more precise, Bob's action space augments from B to $B \times [-d, 0]^2$, where d is a sufficiently large number; the pair specifies how much utility to destroy from Bob and Alice, respectively. By Threat-Resistance, payoffs don't change from what they were, namely, the utilities associated with Bob pressing the ROSE button.
Bob gains access to his missing kit of actions (not paired with the "destroy utility" buttons this time), all of which could have been implemented by just pressing the ROSE button and decreasing utility appropriately. So, by Redundancy-Invariance, payoffs don't change from what they were, namely, the utilities associated with Bob pressing the ROSE button.