Note that I've changed the term population AGI to collective AGI, to match Bostrom's usage in Superintelligence.
I think there’s a reasonably high probability that we will end up training AGI in a multi-agent setting. But in that case, we shouldn’t just be interested in how intelligent each agent produced by this training process is, but also in the combined intellectual capabilities of a large group of agents. If those agents cooperate, they will exceed the capabilities of any one of them - and then it might be useful to think of the whole collective as one AGI. Arguably, on a large-scale view, this is how we should think of humans. Each individual human is generally intelligent in their own right. Yet from the perspective of chimpanzees, the problem was not that any single human was intelligent enough to take over the world, but rather that millions of humans underwent cultural evolution to make the human collective as a whole much more intelligent.
This idea isn’t just relevant to multi-agent training though: even if we train a single AGI, we will have strong incentives to copy it many times to get it to do more useful work. If that work involves generating new knowledge, then putting copies in contact with each other to share that knowledge would also increase efficiency. And so, one way or another, I expect that we’ll eventually end up dealing with a “collective” of AIs. Let’s call the resulting system, composed of many AIs working together, a collective AGI.
We should be clear about the differences between three possibilities which each involve multiple entities working together:
- A single AGI composed of multiple modules, trained in an end-to-end way.
- The Comprehensive AI Services (CAIS) model of a system of interlinked AIs which work together to complete tasks.
- A collective AGI as described above, consisting of many individual AIs working together in comparable ways to how a collective of humans might collaborate.
This essay will only discuss the third possibility, which differs from the other two in several ways:
- Unlike the modules of a single AGI, the members of a collective AGI are not trained in a centralised way, on a single objective function. Rather, optimisation takes place with respect to the policies of individual members, with cooperation between them emerging (either during training or deployment) because it fits the incentives of individuals.
- Unlike CAIS services and single AGI modules, the members of a collective AGI are fairly homogeneous; they weren’t all trained on totally different tasks (and in fact may start off identical to each other).
- Unlike CAIS services and single AGI modules, the members of a collective AGI are each generally intelligent by themselves - and therefore capable of playing multiple roles in the collective AGI, and interacting in flexible ways.
- Unlike CAIS services and single AGI modules, the members of a collective AGI might be individually motivated by arbitrarily large-scale goals.
What are the relevant differences from a safety perspective between this collective-based view and the standard view? Specifically, let’s compare a “collective AGI” to a single AGI which can do just as much intellectual work as the whole collective combined. Here I’m thinking particularly of the most high-level work (such as doing scientific research, or making good strategic decisions), since that seems like a fairer comparison.
We might hope that a collective AGI will be more interpretable than a single AGI, since its members will need to pass information to each other in a standardised “language”. By contrast, the different modules in a single AGI may have developed specialised ways of communicating with each other. In humans, language is much lower-bandwidth than thought. This isn’t a necessary feature of communication, though - members of a collective AGI could be allowed to send data to each other at an arbitrarily high rate. Decreasing this communication bandwidth might be a useful way to increase the interpretability of a collective AGI.
Regardless of the specific details of how they collaborate and share information, members of a collective AGI will need structures and norms for doing so. There’s a sense in which some of the “work” of solving problems is done by those norms - for example, the structure of a debate can be more or less helpful in adjudicating the claims made. The analogous aspect of a single AGI is the structure of its cognitive modules and how they interact with each other. However, the structure of a collective AGI would be much more flexible - and in particular, it could be redesigned by the collective AGI itself in order to improve the flow of information. By contrast, the modules of a single AGI will have been designed by an optimiser, and so fit together much more rigidly. This likely makes them work together more efficiently; the efficiency of end-to-end optimisation is why a human with a brain twice as large would be much more intelligent than two normal humans collaborating. But the concomitant lack of flexibility is why it’s much easier to improve our coordination protocols than our brain functionality.
Suppose we want to retrain an AGI to have a new set of goals. How easy is this in each case? Well, for a single AGI we can just train it on a new objective function, in the same way we trained it on the old one. For a collective AGI where each of the members was trained individually, however, we may not have good methods for assigning credit when the whole collective is working together on a single task. For example, a difficulty discussed in Sunehag et al. (2017) is that one agent starting to learn a new skill might interfere with the performance of other agents - and the resulting decrease in reward teaches the first agent to stop attempting the new skill. This would be particularly relevant if the original collective AGI was produced by copying a single agent trained by itself - if so, it’s plausible that multi-agent reinforcement learning techniques have lagged behind.
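To make the interference problem concrete, here is a toy sketch with two agents and a single shared team reward. The payoff numbers, the `Learner` class and the `team_reward` function are all hypothetical, made up for illustration rather than taken from Sunehag et al. (2017). The learner only ever sees the team's total reward, so it cannot tell how much of a drop comes from its own inexperience and how much from the disruption it causes its partner.

```python
from dataclasses import dataclass, field

# Hypothetical action labels and payoffs, chosen only for illustration.
OLD_SKILL, NEW_SKILL = 0, 1


def team_reward(learner_action: int) -> float:
    """Shared reward for a two-agent team.

    The partner contributes 1.0 when undisturbed. A learner practising the
    new skill contributes little itself and also disrupts the partner.
    """
    if learner_action == OLD_SKILL:
        learner_part, partner_part = 1.0, 1.0
    else:
        learner_part, partner_part = 0.25, 0.5
    return learner_part + partner_part


@dataclass
class Learner:
    """Keeps a running-average value estimate for each action."""
    q: list = field(default_factory=lambda: [0.0, 0.0])
    counts: list = field(default_factory=lambda: [0, 0])

    def update(self, action: int, reward: float) -> None:
        self.counts[action] += 1
        self.q[action] += (reward - self.q[action]) / self.counts[action]

    def greedy_action(self) -> int:
        return max(range(len(self.q)), key=lambda a: self.q[a])


def run(trials_per_action: int = 10) -> Learner:
    """Try each action equally often, learning only from the team reward."""
    learner = Learner()
    for _ in range(trials_per_action):
        for action in (OLD_SKILL, NEW_SKILL):
            learner.update(action, team_reward(action))
    return learner


if __name__ == "__main__":
    learner = run()
    print("value estimates:", learner.q)
    print("preferred action:", learner.greedy_action())
```

In this sketch the learner's value estimate for the new skill ends up well below its estimate for the old one, so a greedy policy stops attempting the new skill, even though part of the penalty reflects the partner's disrupted performance rather than the skill's own long-run value.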
How agentic a collective AGI would be is a tricky question. I think that a collective AGI is likely to be less agentic and goal-directed than a single AGI of equivalent intelligence, because different members of the collective may have different goals which push in different directions. However, it’s also possible that collective-level phenomena amplify goal-directed behaviour. For example, competition between different members in a collective AGI could push the group as a whole towards dangerous behaviour (in a similar way to how competition between companies makes humans less safe from the perspective of chimpanzees). And our lessened ability to fine-tune them, as discussed in the previous paragraph, might make it difficult to know how to intervene to prevent that.
Overall evaluation of collective AGIs
I think that the extent to which a collective AGI is more dangerous than an equivalently intelligent single AGI will mainly depend on how the individual members are trained (in ways which I’ve discussed previously). If we condition on a given training regime being used for both approaches, though, it’s much less clear which type of AGI we should prefer. It’d be useful to see more arguments either way - in particular because a better understanding of the pros and cons of each approach might influence our training decisions. For example, during multi-agent training there may be a tradeoff between training individual AIs to be more intelligent, versus running more copies of them to teach them to cooperate at larger scales. In such environments we could also try to encourage or discourage them from in-depth communication with each other.
In my next post, I’ll discuss one argument for why collective AGIs might be safer: because they can be deployed in more constrained ways.