Comments on Allan Dafoe on AI Governance

Alex Flint

Financial status: This is independent research, now supported by a grant.

Epistemic status: Views here are almost entirely my own.

There are some think pieces that lay out a bunch of perspectives with which we might think about a thing. This can be either terrible or excellent. At the terrible end, there are certain philosophical styles that describe endless possible views that one might take without actually saying anything real. One can draw out endless matrices of possible conjunctions and give names to them and describe their implications without actually making a point. But on the excellent end, when we have been using one perspective for a long time and have started taking it for granted, it can be helpful to give a name to that perspective and to think through some other perspectives, if only so that we can be sure that we are using this particular perspective for good reasons rather than out of habit.

And sometimes there really are good reasons to use a single perspective over and over. A physicist might model interactions between billiard balls using the perspective of elastic collisions. This might well be an excellent perspective to use, and the physicist might know this, and might keep choosing to use the same perspective over and over for well-calibrated reasons. Or one might choose to view the real-world phenomenon of machines that maintain credences in their beliefs about the world through the perspective of probability theory. For many jobs, probability theory is really an excellent choice of perspective, and it might be fine to use it over and over, especially if that choice is made in awareness that it is, in fact, a choice.

In AI Governance: Opportunity and Theory of Impact, Allan Dafoe asks how we should understand what AI actually is in a way that is conducive to effective oversight by our institutions. Dafoe describes the "Superintelligence perspective" like this:

Many longtermists come to the field of AI Governance from what we can call the superintelligence perspective, which typically focuses on the challenge of having an AI agent with cognitive capabilities vastly superior to those of humans. Given how important intelligence is---to the solving of our global problems, to the production and allocation of wealth, and to military power---this perspective makes clear that superintelligent AI would pose profound opportunities and risks. [...]. The superintelligence perspective is well developed in Nick Bostrom’s Superintelligence, Eliezer Yudkowsky’s writings (eg), Max Tegmark’s Life 3.0, and Stuart Russell’s Human Compatible.

Identifying a perspective and giving it a name need not automatically cast doubt on it, just as giving a name to the perspective of "elastic collisions" as a way of understanding billiard balls need not stop us from choosing that perspective where it is helpful. Dafoe goes on to describe two other perspectives for thinking about AI. First:

the AI ecology perspective: instead of imagining just one or several superintelligent agents vastly superior to all other agents, we can imagine a diverse, global, ecology of AI systems. Some may be like agents, but others may be more like complex services, systems, or corporations. [...]. Hanson’s Age of Em describes one such world, where biological humans have been economically displaced by evolved machine agents, who exist in a Malthusian state; there was no discrete event when superintelligence took over. Drexler’s Comprehensive AI Services offers an ecological/services perspective on the future of AI, arguing that we are more likely to see many superhuman but narrow AI services

And second:

Another, broadly mainstream, perspective regards AI as a general purpose technology (GPT), in some ways analogous to other GPTs like steam-power, electricity, or computers (the GPT perspective). Here we need not emphasize only agent-like AI or powerful AI systems, but instead can examine the many ways even mundane AI could transform fundamental parameters in our social, military, economic, and political systems, from developments in sensor technology, digitally mediated behavior, and robotics

Many in our community seem to be working on how to frame what AI is so as to facilitate a helpful response from our institutions. We seem to have started out with an agent-centric view, as per Eliezer’s writings and also Bostrom’s Superintelligence. Then, starting perhaps with Scott and Abram’s work on Embedded Agency, or perhaps with Drexler’s Comprehensive AI Services, there have been efforts to find an appropriate broadening that leaves us with something that is still amenable to analysis. In AI Research Considerations for Human Existential Safety, Critch proposes a broadening in terms of the number of humans and the number of AI systems involved in delegation of power:

Human/AI delegation becomes more complex as the number of humans or AI systems increases. We therefore adopt the following terminology for indicating the number of human stakeholders and AI systems in a human/AI delegation scenario. The number of humans is always indicated first; as a mnemonic, remember that humans come before AI: in history, and in importance!

Single/single delegation means delegation from a single human stakeholder to a single AI system.
Single/multi delegation means delegation from a single human stakeholder to multiple AI systems.
Multi/single delegation means delegation from multiple human stakeholders to a single AI system.
Multi/multi delegation means delegation from multiple human stakeholders to multiple AI systems.

I have also tried my hand in this helpful-broadening game. In AI Risk for Epistemic Minimalists I proposed that we focus on systems that exert influence over the future independent of human building-blocks. Such systems are actually non-existent today (except, I claim, for human beings), but if and when such artificial systems are built, they may or may not look much like agents.

Instrumental and epistemic rationality

I think that framing what exactly AI is in a way that elicits a helpful response from our society is extremely interesting, because we are engaging with epistemic and instrumental rationality at the same time -- epistemic insofar as we are picking a frame that corresponds to reality, and instrumental insofar as we are picking a frame which, if taken up by our society, elicits a helpful response. I do not think we can do epistemics first and instrumentals second, because information about reality is abundant, and so there are a great many frames that correspond to reality, or that meaningfully compress anticipated sense data. This epistemic criterion must not be neglected when we select a frame, yet it is not on its own enough to identify which frame to work with, not even close. It seems to me that the instrumental criterion -- that our frames should elicit a helpful response upon being taken up -- is actually just as central as the epistemic criterion.

Now you might say that we should form true beliefs first, and then, given a correct model of the world, decide how to communicate. But the problem is that then we will use most of our internal capacity for private models, and communicate only a tiny sliver of our understanding that we think is likely to elicit the response that we wish for from those we are communicating with. Our peers, however, have about the same overall cognitive capacity that we do, so if we communicate only a tiny sliver of our own understanding, then that will constitute only a tiny sliver of our peers’ understanding, too, since they will necessarily fill the remainder of their own capacity with something. Hence this "epistemics first, instrumentals second" strategy severely restricts the extent to which we can be "on the same page" with our peers, since we are holding back most of our understanding.

The alternative is to completely fill our own minds with the kind of models that, if taken up by all, would bring forth a society capable of responding intelligently to the existential risks that face us. Then, given this, we simply communicate all of our understanding to others, knowing that everything in our minds is of a form that has been optimized for sharing. But this entails holding nothing back, both when we choose how to see the world, and when we share that understanding with others. It is no longer possible to imagine that there is a virtuous consequentialist outside of it all, forming beliefs on the basis of the epistemic criterion alone, and then making communication decisions on the basis of the instrumental criterion.

In fact there never was such a consequentialist in the first place -- perhaps there was an approximation of one, but at the root we were embedded agents, not Cartesian agents, all along. Embedded agents are subject to finite computation and storage constraints, which makes perfect consequentialism intractable. When dealing with an inert environment, embedded agents may be able to approximate consequentialist behavior quite closely, but when dealing with other agents of similar cognitive capacity, it seems that this consequentialist approximation breaks down. I believe this is at the root of the "weirdness" that we sometimes perceive in open source game theory: we are trying to reason from a consequentialist perspective in exactly the context that is least hospitable to this perspective, namely the one where our choice of internal states and algorithms affects the world.

Coming back to the breakdown of the distinction between epistemic and instrumental rationality, there is an even deeper reason not to do epistemics first and then instrumentals second, or equivalently to form true beliefs first and then strategically communicate based on those beliefs. This reason is that we observe a huge amount of information over the course of our lives, yet can only store a minuscule fraction of it in our finite brains. We are forced by the laws of physics to discard most of the information we interact with, since every time we store something, we must simultaneously discard exactly that much information. We can compress information to some extent, but even with perfect compression we wouldn’t be able to retain more than a tiny fraction of all the information we’ve encountered. Therefore it behooves us to retain the information that is useful in service of goals, and in this way the instrumental criterion gets involved right at the start of knowledge formation. If not for the instrumental criterion, then why would we create any models at all, rather than just store all of our raw experiences? If we just wanted beliefs to be correct, and didn’t care about whether they would usefully inform our actions, then the best way to achieve that would be to simply store our raw sense data, or some fraction of it. We can have very high confidence in beliefs of the form "at this time I experienced this", so on what basis would we form any beliefs not of this form? The answer is that the instrumental criterion guides us from the start.

Now you might say that the instrumental criterion guides us in which propositions to entertain, and the epistemic criterion guides us in which probabilities to assign to those propositions, but I think this is still too simplistic. Many of our deepest and most instrumentally useful beliefs are not very well expressed in terms of probabilities at all. For example, many of us have a deep conviction that it is important to expose one’s beliefs to the test of prediction, or that it is important to iterate and get feedback when doing practical work, or that it is good to understand the problem before proposing solutions. We could assign probabilities to these maxims, but doing so just wouldn’t help us all that much, because these maxims are not best understood as claims about the structure of the world, but are more like helpful stories based on our experience of trying to get things done. Of course we refine these maxims in light of new evidence, but that refinement is not really about updating probabilities, or so it seems to me. Therefore I don’t think that this blurring of epistemic and instrumental rationality can be understood as merely selecting which propositions to entertain, and then assigning credences to those propositions. It is more that we are selecting these lossy compressions of our experience on the basis of the epistemic and instrumental criteria jointly.

Conclusion

This essay began as commentary on AI governance, and has wound up in a discussion of embedded agency and knowledge, which is unsurprising since understanding the accumulation of knowledge is my main research project at the moment. Yet AI governance and embedded agency are actually deeply linked: in AI governance we face this question of how to lead our society towards a shared understanding of AI that elicits a response that helpfully mitigates existential risk. But it is very difficult to lead our society at all if we maintain this conceit of a virtuous consequentialist existing outside of it all, forming abstract beliefs and pulling strings. Investigating this does, in fact, lead us right to the heart of embedded agency and the issue of knowledge accumulation in finite minds. Hence we see the same thing that we’ve been seeing everywhere: that the more we investigate the nature of external intelligent systems, the more clearly we see that resolving the biggest problems facing our society requires a radical internal shift in our own minds.

AI ALIGNMENT FORUM