Yeah, I understand that. My point is that just as society didn't work by default, systems of AIs won't work by default, and the interventions that will be needed will require AI researchers. That is, it's not just about setting up laws, norms, contracts, and standards for managing these systems. It is about figuring out how to make AI systems that interact with each other the way humans do in the presence of laws, norms, standards, and contracts. Someone who is not an AI researcher would have no hope of solving this, since they cannot understand how AI systems will interact and cannot offer appropriate interventions.

Looking at these, I feel like they are subquestions of "how do you design a good society that can handle technological development" -- most of it is not AI-specific or CAIS-specific.

For me this is the main point of CAIS. It reframes many AI safety problems as "make a good society" problems, but now you can consider scenarios involving only AIs. We can start to answer the question "how do we make a good society of AIs?" with the question "how did we do it with humans?" It seems like human society did not have great outcomes for everyone by default. Making human society function took a lot of work, and failed a lot of times. Can we learn from that and make AI society fail less often or less catastrophically?

My point is just that "prior / equilibrium selection problem" is a subset of the "you don't know everything about the other player" problem, which I think you agree with?

I see two problems: one of trying to coordinate on priors, and one of trying to deal with not having coordinated successfully. Which is easier depends on the problem: whether we're applying it to CAIS, HRI, or a multipolar scenario. Sometimes it's easier to coordinate on a prior beforehand, sometimes it's easier to be robust to differing priors, and sometimes you have to go for a bit of both. I think it's reasonable to call both of these solution techniques for the "prior / equilibrium selection problem", but the two framings shoot for different solutions, each of which I view as necessary in some situations.


The strategy of agreeing on a joint welfare function is already a heuristic and isn't an optimal strategy; it feels very weird to suppose that initially a heuristic is used and then we suddenly switch to pure optimality.

I don't really know what you mean by this. Specifically, I don't know from whose perspective it isn't optimal, or under what beliefs.

A few things to point out:

  • The strategy of agreeing on a joint welfare function and optimizing it is an optimal strategy for some belief in infinitely repeated settings (by the folk theorem, almost any behavior is optimal for some belief; see the sketch after this list).
  • Since we're currently making norms for these interactions, we are currently designing these beliefs. This means that we can make it the case that holding that belief is justified in future deployments.
  • If we want to talk about the "optimality" of "equilibrium selection procedures" or "coordination norms", we have to have a metric that says some outcomes are "better" than others. This is not a utility function for the agents, but for us as the norm designers. Social welfare seems like a good choice for this.
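
To spell out the folk-theorem point in the first bullet, here is a standard grim-trigger calculation with the usual prisoner's-dilemma payoffs $T > R > P > S$ (and $2R > T + S$, so mutual cooperation maximizes joint welfare); none of the specifics are from this discussion. If you believe your partner cooperates and punishes any defection forever, then always cooperating earns

$$\sum_{t=0}^{\infty} \delta^t R = \frac{R}{1-\delta},$$

while the best deviation earns $T$ once and the punishment payoff $P$ afterwards,

$$T + \sum_{t=1}^{\infty} \delta^t P = T + \frac{\delta P}{1-\delta},$$

so always cooperating is a best response exactly when

$$\frac{R}{1-\delta} \;\ge\; T + \frac{\delta P}{1-\delta} \quad\Longleftrightarrow\quad \delta \;\ge\; \frac{T-R}{T-P}.$$

So "optimize the agreed joint welfare function" really is optimal under some belief about the partner, which is all the first bullet claims.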
Note that when you can have a well-specified Bayesian belief over your partner, these problems don't arise. However, the agents can't both be in this situation: agent A would have a belief over B, which has a belief over A; if these are all well-specified Bayesian beliefs, then A has a Bayesian belief over itself, which is impossible.

There are ways to get around this. The most common way in the literature (in fact the only way I have seen) gives every agent a belief over a common set of worlds (each of which contains both the state of the environment and the memory states of all of the agents). The world is then a sufficient statistic for everything that can happen, and beliefs about other players' beliefs can be derived from each player's belief over the underlying world. This does mean you have to agree upon the "possible memory states" ahead of time, or at least both have beliefs described over sets that can be consistently combined into a "set of all possible worlds".
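
Here is a minimal toy sketch of that construction (my own illustration in Python, with made-up state and memory names and, just for brevity, a single uniform prior; with mismatched priors you would condition each agent's own prior instead):

```python
# Toy sketch (my own construction, not from the comment above) of the
# "common set of worlds" trick: each world bundles the environment state with
# both agents' memory states, and higher-order beliefs are derived from
# beliefs over these worlds.

from collections import defaultdict
from itertools import product

# Hypothetical components; the names and the uniform prior are made up.
STATES = ["sunny", "rainy"]
MEMORIES_A = ["mA0", "mA1"]
MEMORIES_B = ["mB0", "mB1"]

# A world is (state, memory_A, memory_B).
WORLDS = list(product(STATES, MEMORIES_A, MEMORIES_B))
PRIOR = {w: 1.0 / len(WORLDS) for w in WORLDS}


def belief_given_memory(prior, agent, memory):
    """Condition a prior over worlds on one agent's own memory state."""
    idx = 1 if agent == "A" else 2
    consistent = {w: p for w, p in prior.items() if w[idx] == memory}
    total = sum(consistent.values())
    return {w: p / total for w, p in consistent.items()}


def a_belief_about_b_belief(prior, memory_a):
    """A's belief (given memory_a) over which posterior B holds.

    B's posterior is determined by B's memory state, so this is a
    distribution over B's memories, paired with the posterior each implies.
    """
    a_posterior = belief_given_memory(prior, "A", memory_a)
    prob_b_memory = defaultdict(float)
    for (_state, _m_a, m_b), p in a_posterior.items():
        prob_b_memory[m_b] += p
    return {m_b: (p, belief_given_memory(prior, "B", m_b))
            for m_b, p in prob_b_memory.items()}


if __name__ == "__main__":
    for m_b, (p, _posterior) in a_belief_about_b_belief(PRIOR, "mA0").items():
        print(f"A thinks P(B's memory = {m_b}) = {p:.2f}")
```

Running this prints A's distribution over which posterior B holds; iterating the same conditioning gives arbitrarily deep "A believes that B believes that A believes..." statements.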

I mean, in this case you just deploy one agent instead of two

If the CAIS view is right, multi-agent setups like this could be inevitable. There are also many reasons we could want a lot of actors making a lot of agents, rather than one actor making one agent. By having many agents we have no single point of failure (as in fault-tolerant data storage), and no single principal has a concentration of power (as in the bitcoin protocol).

It does introduce more game-theoretic issues, but those issues seem understandable and tractable to me, and there is very little work from the AI perspective that seriously tackles them, so the problems could be much easier than we think.

Even under the constraint that you must deploy two agents, you can exactly coordinate their priors / which equilibria they fall into. To get prior / equilibrium selection problems, you necessarily need to have agents that don't know who their partner is.

I think it is reasonable to think that there could be a bandwidth constraint on coordinating over the prior and equilibrium selection, one much smaller than what would be needed to cover all of the coordination scenarios you could possibly encounter. I agree that to have these selection problems you need to not know exactly who your partner is, but it is possible to know quite a bit about your partner and still have coordination problems.

It encourages solutions that take advantage of optimality and won't actually work in the situations we actually face.

I would be very wary of a solution that didn't work when we have optimal agents. I think it's reasonable to try to get things to work when we do everything right before trying to make that process robust to errors.

The formality of "priors / equilibria" doesn't have any benefit in this case (there aren't any theorems to be proven). The one benefit I see is that it signals that "no, even if we formalize it, the problem doesn't go away", to those people who think that once formalized sufficiently all problems go away via the magic of Bayesian reasoning.

I think there are theorems to be proven, just not of the form "there is an optimal thing to do".

The strategy of agreeing on a joint welfare function is already a heuristic and isn't an optimal strategy; it feels very weird to suppose that initially a heuristic is used and then we suddenly switch to pure optimality.

It's also, to a first approximation, the strategy society takes in lots of situations; this happens whenever people form teams with a common goal. There are usually processes of re-negotiating the goal, but between these times of conflict people gain a lot of efficiency by working together and punishing deviation.

I second Michael Dennis' comment below, that the infinite regress of priors is avoided in standard game theory by specifying a common prior. Indeed the specification of this prior leads to a prior selection problem.

Just to make sure that I was understood, I was also pointing out that "you can have a well-specified Bayesian belief over your partner" even without agreeing on a common prior, as long as you agree on a common set of possibilities or something effectively similar. This means that talking about "Bayesian agents without a common prior" is well-defined.

When there is not a common prior, this leads to an arbitrarily deep nesting of beliefs, but they are all well-defined. I can refer to "what A believes that B believes about A" without running into Russell's Paradox. When the priors mismatch, the entire hierarchy of these beliefs might be useful to reason about, but when there is a common prior, much of the hierarchy collapses.
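
To make the nesting concrete (my notation, not anything from the exchange above): write $P_A$ and $P_B$ for the two priors over the agreed-upon set of worlds $\Omega$, where each world $\omega$ fixes both memory states $m_A(\omega)$ and $m_B(\omega)$. A's first-order belief is the conditional $P_A(\cdot \mid m_A)$, and "what A believes that B believes about A" is the pushforward of that belief through the map

$$\omega \;\mapsto\; P_B(\cdot \mid m_B(\omega)),$$

i.e. a distribution over the posteriors B might hold, each of which is itself a distribution over worlds and hence over A's memory state. Iterating the same construction gives every deeper level, so each level is well-defined even when $P_A \neq P_B$; with a common prior $P_A = P_B = P$, every level is determined by $P$ and the two memory maps, which is the sense in which the hierarchy collapses.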