Equilibrium and prior selection problems in multipolar deployment

Planned summary for the Alignment Newsletter:

Consider the scenario in which two principals will separately develop and deploy learning agents that will then act on their behalf, and suppose further that they even agree on the welfare function that these agents should optimize. Let us call this a _learning game_, in which the "players" are the principals, the actions are the agents developed, and both players want to optimize the welfare function (making it a collaborative game). There still remain two coordination problems. First, we face an _equilibrium selection problem_: there can be multiple Nash equilibria in a collaborative game, and so if the two deployed learning agents are Nash strategies from _different_ equilibria, payoffs can be arbitrarily bad. Second, we face a _prior selection problem_: given that there are many reasonable priors that the learning agents could have, if they end up with different priors from each other, outcomes can again be quite bad, especially in the context of <@threats@>(@Cooperation, Conflict, and Transformative Artificial Intelligence: A Research Agenda@).

Planned opinion:

These are indeed pretty hard problems in any collaborative game. While this post takes the framing of considering optimal principals and/or agents (and so considers Bayesian strategies in which only the prior and choice of equilibrium are free variables), I prefer the framing taken in <@our paper@>(@Collaborating with Humans Requires Understanding Them@): the issue is primarily that in a collaborative game, the optimal thing for you to do depends strongly on who your partner is, but you may not have a good understanding of who your partner is, and if you're wrong you can do arbitrarily poorly.

Note that when you can have a well-specified Bayesian belief over your partner, these problems don't arise. However, both agents can't be in this situation: in this case agent A would have a belief over B that has a belief over A; if these are all well-specified Bayesian beliefs, then A has a Bayesian belief over itself, which is impossible.

[-]MichaelDennis6y30

Note that when you can have a well-specified Bayesian belief over your partner, these problems don't arise. However, both agents can't be in this situation: in this case agent A would have a belief over B that has a belief over A; if these are all well-specified Bayesian beliefs, then A has a Bayesian belief over itself, which is impossible.

There are ways to get around this. The most common way in the literature (in fact the only way I have seen) gives every agent a belief over a set of common worlds (which contain both the state of the world and the memory states of all of the agents). Then the state of the world is a sufficient statistic for everything that can happen, and beliefs about other players' beliefs can be derived from each player's beliefs about the underlying world. This does mean you have to agree upon "possible memory states" ahead of time, or at least both have beliefs that are described over sets that can be consistently combined into a "set of all possible worlds".
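As a rough illustration of the construction described above, the following Python sketch enumerates a small set of common worlds, each bundling a state with both agents' memory states, and derives what A believes about B's belief. The worlds, memory labels, priors, and function names are all hypothetical, and the sketch assumes each agent's prior over worlds is known to the other; a fuller treatment would fold the prior into the memory state.

```python
from fractions import Fraction

# Hypothetical common worlds: (state, A's memory, B's memory).
WORLDS = [
    ("rain", "a_clouds", "b_clouds"),
    ("rain", "a_clouds", "b_sun"),
    ("sun", "a_sun", "b_sun"),
    ("sun", "a_clouds", "b_sun"),
]
A, B = 1, 2  # index of each agent's memory within a world tuple


def posterior(prior, agent, memory):
    """Condition a prior over worlds on one agent's own memory state."""
    consistent = {w: p for w, p in prior.items() if w[agent] == memory}
    total = sum(consistent.values())
    return {w: p / total for w, p in consistent.items()}


def belief_about_belief(prior_a, prior_b, memory_a, event):
    """A's distribution over the probability that B assigns to `event`."""
    result = {}
    for world, p in posterior(prior_a, A, memory_a).items():
        # In each world A considers possible, B's memory pins down B's belief.
        b_post = posterior(prior_b, B, world[B])
        q = sum(pr for w, pr in b_post.items() if event(w))
        result[q] = result.get(q, Fraction(0)) + p
    return result


def is_rain(world):
    return world[0] == "rain"


uniform = {w: Fraction(1, 4) for w in WORLDS}
# A different (assumed) prior for B over the same set of worlds.
skewed = dict(zip(WORLDS, [Fraction(1, 10), Fraction(6, 10),
                           Fraction(2, 10), Fraction(1, 10)]))

common = belief_about_belief(uniform, uniform, "a_clouds", is_rain)
differing = belief_about_belief(uniform, skewed, "a_clouds", is_rain)
```

Both calls are well-defined because the agents share the set of worlds, even though in the second call their priors over those worlds differ.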

[-]Rohin Shah6y20

Thanks, removed that section.

[-]JesseClifton6y10

both players want to optimize the welfare function (making it a collaborative game)

The game is collaborative in the sense that a welfare function is optimized in equilibrium, but the principals will in general have different terminal goals (reward functions) and the equilibrium will be enforced with punishments (cf. tit-for-tat).

the issue is primarily that in a collaborative game, the optimal thing for you to do depends strongly on who your partner is, but you may not have a good understanding of who your partner is, and if you're wrong you can do arbitrarily poorly

Agreed, but there's the additional point that in the case of principals designing AI agents, the principals can (in theory) coordinate to ensure that the agents "know who their partner is". That is, they can coordinate on critical game-theoretic parameters of their respective agents.

[-]Rohin Shah6y20

Ah, I misunderstood your post. I thought you were arguing for problems conditional on the principals agreeing on the welfare function to be optimized, and having common knowledge that they were designing agents that optimize that welfare function.

but there's the additional point that in the case of principals designing AI agents, the principals can (in theory) coordinate to ensure that the agents "know who their partner is".

I mean, in this case you just deploy one agent instead of two. Even under the constraint that you must deploy two agents, you exactly coordinate their priors / which equilibria they fall into. To get prior / equilibrium selection problems, you necessarily need to have agents that don't know who their partner is. (Even if just one agent knows who the partner is, outcomes should be expected to be relatively good, though not optimal, e.g. if everything is deterministic, then threats are never executed.)

----

Looking at these objections, I think probably what you were imagining is a game where the principals have different terminal goals, but they coordinate by doing the following:

Agreeing upon a joint welfare function that is "fair" to the principals. In particular, this means that they agree that they are "licensed" to punish actions that deviate from this welfare function.
Going off and building their own agents that optimize the welfare function, but make sure to punish deviations (to ensure that the other principal doesn't build an agent that pursues that principal's own goals instead of the welfare function).

New planned summary:

Consider the scenario in which two principals with different terminal goals will separately develop and deploy learning agents that will then act on their behalf. Let us call this a _learning game_, in which the "players" are the principals, and the actions are the agents developed.

One strategy for this game is for the principals to first agree on a "fair" joint welfare function, such that they and their agents are then licensed to punish the other agent if they take actions that deviate from this welfare function. Ideally, this would lead to the agents jointly optimizing the welfare function (while being on the lookout for defection).

There still remain two coordination problems. First, there is an _equilibrium selection problem_: if the two deployed learning agents are Nash strategies from _different_ equilibria, payoffs can be arbitrarily bad. Second, there is a _prior selection problem_: given that there are many reasonable priors that the learning agents could have, if they end up with different priors from each other, outcomes can again be quite bad, especially in the context of <@threats@>(@Cooperation, Conflict, and Transformative Artificial Intelligence: A Research Agenda@).
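The equilibrium selection problem can be made concrete with a toy example. The sketch below is a minimal Python illustration under simplifying assumptions: a hypothetical 2x2 game in which both players receive the same payoff (standing in for the agreed welfare function), with made-up action names and payoff values. It finds the pure Nash equilibria and then shows what happens when each agent plays its part of a different one.

```python
from itertools import product

ACTIONS = ("left", "right")

# Hypothetical common payoff for each (A's action, B's action) pair.
PAYOFF = {
    ("left", "left"): 10,
    ("right", "right"): 10,
    ("left", "right"): -100,
    ("right", "left"): -100,
}


def is_pure_nash(a, b):
    """True if neither player gains from a unilateral deviation."""
    a_is_best = all(PAYOFF[(a, b)] >= PAYOFF[(alt, b)] for alt in ACTIONS)
    b_is_best = all(PAYOFF[(a, b)] >= PAYOFF[(a, alt)] for alt in ACTIONS)
    return a_is_best and b_is_best


def pure_nash_equilibria():
    """List every pure-strategy Nash equilibrium of the toy game."""
    return [(a, b) for a, b in product(ACTIONS, ACTIONS) if is_pure_nash(a, b)]


def mismatched_payoff(equilibrium_for_a, equilibrium_for_b):
    """Payoff when A follows one equilibrium and B follows another."""
    a_action = equilibrium_for_a[0]
    b_action = equilibrium_for_b[1]
    return PAYOFF[(a_action, b_action)]


equilibria = pure_nash_equilibria()
worst_case = mismatched_payoff(equilibria[0], equilibria[1])
```

Each agent here is playing a Nash strategy, yet the joint outcome is the off-diagonal payoff, which can be made as bad as one likes by changing the assumed numbers.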

New opinion:

These are indeed pretty hard problems in any non-competitive game. While this post takes the framing of considering optimal principals and/or agents (and so considers Bayesian strategies in which only the prior and choice of equilibrium are free variables), I prefer the framing taken in <@our paper@>(@Collaborating with Humans Requires Understanding Them@): the issue is primarily that the optimal thing for you to do depends strongly on who your partner is, but you may not have a good understanding of who your partner is, and if you're wrong you can do arbitrarily poorly.

Note that when you can have a well-specified Bayesian belief over your partner, these problems don't arise. However, both agents can't be in this situation: in this case agent A would have a belief over B that has a belief over A; if these are all well-specified Bayesian beliefs, then A has a Bayesian belief over itself, which is usually impossible.

Btw, some reasons I prefer not using priors / equilibria and instead prefer just saying "you don't know who your partner is":

It encourages solutions that take advantage of optimality and won't actually work in the situations we actually face.
The formality of "priors / equilibria" doesn't have any benefit in this case (there aren't any theorems to be proven). The one benefit I see is that it signals that "no, even if we formalize it, the problem doesn't go away", to those people who think that once formalized sufficiently all problems go away via the magic of Bayesian reasoning.
The strategy of agreeing on a joint welfare function is already a heuristic and isn't an optimal strategy; it feels very weird to suppose that initially a heuristic is used and then we suddenly switch to pure optimality.

[-]MichaelDennis6y10

I mean, in this case you just deploy one agent instead of two

In the CAIS view, multi-agent setups like this could be inevitable. There are also many reasons that we could want a lot of actors making a lot of agents rather than one actor making one agent. By having many agents we have no single point of failure (like fault-tolerant data storage) and no single principal has a concentration of power (like the Bitcoin protocol).

It does introduce more game-theoretic issues, but those issues seem understandable and tractable to me and there is very little work from the AI perspective that seriously tackles them, so the problems could be much easier than we think.

Even under the constraint that you must deploy two agents, you exactly coordinate their priors / which equilibria they fall into. To get prior / equilibrium selection problems, you necessarily need to have agents that don't know who their partner is.

I think it is reasonable to think that there could be a bandwidth constraint on coordination over the prior and equilibrium selection that is much smaller than the set of all coordination scenarios you could possibly encounter. I agree that to have these selection problems you need to not know exactly who your partner is, but it is possible to know quite a bit about your partner and still have coordination problems.

It encourages solutions that take advantage of optimality and won't actually work in the situations we actually face.

I would be very wary of a solution that didn't work when we have optimal agents. I think it's reasonable to try to get things to work when we do everything right before trying to make that process robust to errors.

The formality of "priors / equilibria" doesn't have any benefit in this case (there aren't any theorems to be proven). The one benefit I see is that it signals that "no, even if we formalize it, the problem doesn't go away", to those people who think that once formalized sufficiently all problems go away via the magic of Bayesian reasoning.

I think there are theorems to be proven, just not of the form "there is an optimal thing to do"

The strategy of agreeing on a joint welfare function is already a heuristic and isn't an optimal strategy; it feels very weird to suppose that initially a heuristic is used and then we suddenly switch to pure optimality.

It's also, to a first approximation, the strategy society takes in lots of situations, this happens whenever people form teams with a common goal. There are usually processes of re-negotiating the goal, but between these times of conflict people gain a lot of efficiency by working together and punishing deviation.

[-]Rohin Shah6y20

I think there are theorems to be proven, just not of the form "there is an optimal thing to do"

I meant one thing and wrote another; I just meant to say that there weren't theorems in this post.

In the CAIS view, multi-agent setups like this could be inevitable.

My point is just that "prior / equilibrium selection problem" is a subset of the "you don't know everything about the other player" problem, which I think you agree with?

It's also, to a first approximation, the strategy society takes in lots of situations, this happens whenever people form teams with a common goal. There are usually processes of re-negotiating the goal, but between these times of conflict people gain a lot of efficiency by working together and punishing deviation.

I'm not sure how this relates to the thing I'm saying (I'm also not sure if I understood it).

[-]MichaelDennis6y10

My point is just that "prior / equilibrium selection problem" is a subset of the "you don't know everything about the other player" problem, which I think you agree with?

I see two problems: one of trying to coordinate on priors, and one of trying to deal with having not successfully coordinated. I think that which is easier depends on the problem: whether we're applying it to CAIS, HRI, or a multipolar scenario. Sometimes it's easier to coordinate on a prior beforehand, sometimes it's easier to be robust to differing priors, and sometimes you have to go for a bit of both. I think it's reasonable to call both of these solution techniques for the "prior / equilibrium selection problem", but the framings shoot for different solutions, both of which I view as necessary sometimes.
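One hedged way to see why failing to coordinate on a prior can be costly is the textbook mutual-optimism model of crisis bargaining, which is swapped in here as an illustration rather than taken from the post or this thread. Two agents split a prize of size 1; if they fail to agree, they enter a conflict that costs each of them `cost`, and A wins with some probability about which the agents may hold different priors. The function and parameter names are assumptions made for the sketch.

```python
def bargaining_range(p_a_wins_by_a, p_a_wins_by_b, cost):
    """Shares of the prize for A that both sides prefer to conflict.

    p_a_wins_by_a: A's prior probability that A wins a conflict.
    p_a_wins_by_b: B's prior probability that A wins a conflict.
    Returns (low, high), or None if no mutually acceptable split exists.
    """
    # A accepts share s only if s >= A's expected conflict payoff.
    lowest_a_accepts = p_a_wins_by_a - cost
    # B concedes share s only if 1 - s >= B's expected conflict payoff.
    highest_b_concedes = p_a_wins_by_b + cost
    if lowest_a_accepts <= highest_b_concedes:
        return (lowest_a_accepts, highest_b_concedes)
    return None


# With a shared prior there is always a split both sides accept.
shared = bargaining_range(0.5, 0.5, cost=0.1)
# With sufficiently divergent priors the acceptable range is empty.
mismatched = bargaining_range(0.8, 0.2, cost=0.1)
```

In this toy model a peaceful split exists whenever the gap between the two priors is at most twice the cost of conflict, so either coordinating on a prior or building in slack for disagreement can restore agreement.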

The strategy of agreeing on a joint welfare function is already a heuristic and isn't an optimal strategy; it feels very weird to suppose that initially a heuristic is used and then we suddenly switch to pure optimality.

I don't really know what you mean by this. Specifically I don't know from whose perspective it isn't optimal and under what beliefs.

A few things to point out:

The strategy of agreeing on a joint welfare function and optimizing it is an optimal strategy for some belief in infinitely iterated settings (because there is a folk theorem so almost everything is an optimal strategy for some belief)
Since we're currently making norms for these interactions, we are currently designing these beliefs. This means that we can make it be the case that having that belief is justified in future deployments.
If we want to talk about "optimality" in terms of "equilibria selection procedures" or "coordination norms" we have to have a metric to say some outcomes are "better" than others. This is not a utility function for the agents, but for us as the norm designers. Social welfare seems good for this.
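The folk-theorem point in the first item above can be illustrated with the standard iterated prisoner's dilemma. The sketch below uses a grim-trigger punishment (a harsher stand-in for tit-for-tat) and hypothetical payoffs to check when mutual cooperation, which maximizes total welfare in this toy game, is a best response given the belief that the partner punishes deviation forever. It is a minimal example under those assumptions, not a general statement of the theorem.

```python
def cooperation_value(reward, delta):
    """Discounted value of cooperating forever: reward / (1 - delta)."""
    return reward / (1 - delta)


def deviation_value(temptation, punishment, delta):
    """Value of defecting once, then being punished in every later round."""
    return temptation + delta * punishment / (1 - delta)


def cooperation_is_sustainable(temptation, reward, punishment, delta):
    """True if cooperating is at least as good as a one-shot deviation."""
    return cooperation_value(reward, delta) >= deviation_value(
        temptation, punishment, delta
    )


def grim_trigger_threshold(temptation, reward, punishment):
    """Smallest discount factor at which cooperation is sustainable."""
    return (temptation - reward) / (temptation - punishment)


# Hypothetical payoffs: temptation 5, mutual cooperation 3, mutual defection 1.
threshold = grim_trigger_threshold(5, 3, 1)
patient = cooperation_is_sustainable(5, 3, 1, delta=0.6)
impatient = cooperation_is_sustainable(5, 3, 1, delta=0.4)
```

The same strategy is optimal or not depending on the discount factor and on the belief about how the partner responds to deviation, which is the sense in which the norm designers are choosing beliefs.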

[-]JesseClifton6y10

The new summary looks good =) Although I second Michael Dennis' comment below, that the infinite regress of priors is avoided in standard game theory by specifying a common prior. Indeed the specification of this prior leads to a prior selection problem.

The formality of "priors / equilibria" doesn't have any benefit in this case (there aren't any theorems to be proven)

I’m not sure if you mean “there aren’t any theorems to be proven” or “any theorem that’s proven in this framework would be useless”. The former is false, e.g. there are things to prove about the construction of learning equilibria in various settings. I’m sympathetic with the latter criticism, though my own intuition is that working with the formalism will help uncover practically useful methods for promoting cooperation, and point to problems that might not be obvious otherwise. I'm trying to make progress in this direction in this paper, though I wouldn't yet call this practical.

The one benefit I see is that it signals that "no, even if we formalize it, the problem doesn't go away", to those people who think that once formalized sufficiently all problems go away via the magic of Bayesian reasoning

Yes, this is a major benefit I have in mind!

The strategy of agreeing on a joint welfare function is already a heuristic and isn't an optimal strategy; it feels very weird to suppose that initially a heuristic is used and then we suddenly switch to pure optimality

I’m not sure what you mean by “heuristic” or “optimality” here. I don’t know of any good notion of optimality which is independent of the other players, which is why there is an equilibrium selection problem. The welfare function selects among the many equilibria (i.e. it selects one which optimizes the welfare). I wouldn't call this a heuristic. There has to be some way to select among equilibria, and the welfare function is chosen such that the resulting equilibrium is acceptable by each of the principals' lights.

[-]Rohin Shah6y30

I’m not sure what you mean by “heuristic” or “optimality” here. I don’t know of any good notion of optimality which is independent of the other players, which is why there is an equilibrium selection problem.

I think once you settle on a "simple" welfare function, it is possible that there are _no_ Nash equilibria such that the agents are optimizing that welfare function (I don't even really know what it means to optimize the welfare function, given that you have to also punish the opponent, which isn't an action that is useful for the welfare function).

I’m not sure if you mean “there aren’t any theorems to be proven” or “any theorem that’s proven in this framework would be useless”.

Hmm, I meant one thing and wrote another. I meant to say "there aren't any theorems proven in this post".

[-]MichaelDennis6y10

I second Michael Dennis' comment below, that the infinite regress of priors is avoided in standard game theory by specifying a common prior. Indeed the specification of this prior leads to a prior selection problem.

Just to make sure that I was understood, I was also pointing out that "you can have a well-specified Bayesian belief over your partner" even without agreeing on a common prior, as long as you agree on a common set of possibilities or something effectively similar. This means that talking about "Bayesian agents without a common prior" is well-defined.

When there is not a common prior, this leads to an arbitrarily deep nesting of beliefs, but they are all well-defined. I can refer to "what A believes that B believes about A" without running into Russell's Paradox. When the priors mismatch, the entire hierarchy of these beliefs might be useful to reason about, but when there is a common prior, it allows much of the hierarchy to collapse.

Actually, the problem is more general than that. The agents might not only have disagreeing priors, but model their strategic interaction using different games entirely. I hope to address this in a later post. For simplicity I'll focus on the special case of priors here. Also, see the literature on "hypergames" (e.g. Bennett, P.G., 1980. Hypergames: developing a model of conflict), which describe agents who have different models of the game they're playing. ↩︎
Compare with the literature on misperception in international relations, and how misperceptions can lead to disaster in human interaction. Many instances of misperception might be modeled as "incorrect beliefs about others' priors". Compare also with the discussion of crisis bargaining under incomplete information in Section 4.1 here. ↩︎
I set aside the problem of truthfully eliciting each player's utility function. ↩︎
Cf. this CHAI paper, which makes a related point in the context of human-AI interaction. However, they say that we can't expect an AI trained to play an equilibrium strategy in self-play to perform well against a human, because humans might play off-equilibrium (seeing as humans are "suboptimal"). But the problem is not just that one of the players might play off-equilibrium. It's that even if they are both playing an equilibrium strategy, they may have selected different equilibria. ↩︎
