(Update 2022: Enjoy the post, but note that it’s old, has some errors, and is certainly not reflective of my current thinking. –Steve)

Low confidence; offering this up for discussion

An Oracle AI is an AI that only answers questions, and doesn't take any other actions. The opposite of an Oracle AI is an Agent AI, which might also send emails, control actuators, etc.
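To make that distinction concrete, here's a minimal sketch (Python, purely illustrative; the class and method names are mine, not a reference to any real system) of the two interfaces as I'm using the terms:

```python
# Purely illustrative: a toy contrast between the two interfaces, not a real design.

class OracleAI:
    """Answers questions; its only output channel is text returned to a human."""

    def answer(self, question: str) -> str:
        return self._think(question)

    def _think(self, question: str) -> str:
        # Stand-in for whatever reasoning machinery the system actually uses.
        return f"[answer to: {question}]"


class AgentAI(OracleAI):
    """Same reasoning machinery, but also empowered to act on the world directly."""

    def __init__(self, actuators):
        # actuators: e.g. an email client, a robot arm, a trading API (hypothetical).
        self.actuators = actuators

    def pursue(self, goal: str) -> None:
        # Chooses and executes actions itself, with no human between plan and effect.
        plan = self._think(f"how to achieve: {goal}")
        for actuator in self.actuators:
            actuator.execute(plan)
```

The point of the contrast is just that an agent's effects on the world don't have to route through a human reading its output and deciding what to do with it.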

I'm especially excited about the possibility of non-self-improving oracle AIs, dubbed Tool AI in a 2012 article by Holden Karnofsky.

I've seen two arguments against this "Tool AI":

  • First, as in Eliezer's 2012 response to Holden, we don't know how to safely make and operate an oracle AGI (just like every other type of AGI). Fair enough! I never said this is an easy solution to all our problems! (But see my separate post for why I'm thinking about this.)
  • Second, as in Gwern's 2016 essay, there's a coordination problem. Even if we could build a safe oracle AGI, the argument goes, there will still be an economic incentive to build an agent AGI, because you can do more and better and faster by empowering the AGI to take actions. Thus, agreeing to never ever build agent AGIs is a very hard coordination problem for society. I don't find the coordination argument compelling—in fact, I think it's backwards—and I wrote this post to explain why.

Five reasons I don't believe the coordination / competitiveness argument against oracles

1. If the oracle isn't smart or powerful enough for our needs, we can solve that by bootstrapping. Even if the oracle is not inherently self-modifying, we can ask it for advice and do human-in-the-loop modifications to make more powerful successor oracles. By the same token, we can ask an oracle AGI for advice about how to design a safe agent AGI.

2. Avoiding coordination problems is a pipe dream; we need to solve the coordination problem at some point, and that point might as well be at the oracle stage. As far as I can tell, we will never get to a stage where we know how to build safe AGIs and where there is no possibility of making more-powerful-and-less-safe AGIs. If we have a goal in the world that we really really want to happen, a low-impact agent is going to be less effective than an agent with no impact restrictions; an act-based agent is going to be less effective than a goal-seeking agent;[1] and so on and so forth. It seems likely that, no matter how powerful a safe AGI we can make, there will always be an incentive for people to try experimenting with even more powerful unsafe alternative designs.

Therefore, at some point in AI development, we have to blow the whistle, declare that technical solutions aren't enough, and start relying 100% on actually solving the coordination problem. When is that point? Hopefully far enough along that we can realize the benefits of AGI for humanity—automating the development of new technology to help solve problems, dramatically improving our ability to think clearly and foresightedly about our decisions, and so on. Oracles can do all that! So why not just stop when we get to AGI oracles?

Indeed, once I started thinking along those lines, I came to see the coordination argument as going in the other direction! I say restricting ourselves to oracle AI makes coordination easier, not harder! Why is that? Two more reasons:

3. We want a high technological barrier between us and the most dangerous systems: These days, I don't think anyone takes seriously the idea of building an all-powerful benevolent dictator AGI implementing CEV. [ETA: If you do take that idea seriously, see point 1 above on bootstrapping.] At least as far as I can tell from the public discourse, there seems to be a growing consensus that humans should always and forever be in the loop of AGIs. (That certainly sounds like a good idea to me!) Thus, the biggest coordination problem we face is: "Don't ever make a human-out-of-the-loop free-roaming AGI world-optimizer." This is made easier by having a high technological barrier between the safe AGIs that we are building and using, and the free-roaming AGI world-optimizers that we are forbidding. If we make an agent AGI—whether corrigible, aligned, norm-following, low-impact, or whatever—I just don't see any technological barrier there. It seems like it would be trivial for a rogue employee to tweak such an AGI to stop asking permission, deactivate the self-restraint code, and go tile the universe with hedonium at all costs (or whatever that rogue employee happens to value). By contrast, if we stop when we get to oracle AI, it seems like there would be a higher technological barrier to turning it into a free-roaming AGI world-optimizer—probably not that high a barrier, but higher than the alternatives. (The height of this technological barrier, and indeed whether there's a barrier at all, is hard to say.... It probably depends on how exactly the oracles are constructed and access-controlled.)

4. We want a bright-line, verifiable rule between us and the most dangerous systems: Even more importantly, take the rule:

"AGIs are not allowed to do anything except output pixels onto a screen."

This is a nice, simple, bright-line rule, which moreover has at least a chance of being verifiable by external auditors. By contrast, if we try to draw a line through the universe of agent AGIs, defining how low-impact is low-impact enough, how act-based is act-based enough, and so on, it seems to me like it would inevitably be a complicated, blurry, and unenforceable line. This would make a very hard coordination problem very much harder still.

[Clarifications on this rule: (A) I'm not saying this rule would be easy to enforce (globally and forever), only that it would be less hard than alternatives; (B) I'm not saying that, if we enforce this rule, we are free and clear of all possible existential risks, but rather that this would be a very helpful ingredient along with other control and governance measures; (C) Again, I'm presupposing here that we succeed in making superintelligent AI oracles that always give honest and non-manipulative answers; (D) I'm not saying we should outlaw all AI agents, just that we should outlaw world-modeling AGI agents. Narrow-AI robots and automated systems are fine. (I'm not sure exactly how that line would be drawn.)]

Finally, one more thing:

5. Maybe superintelligent oracle AGI is "a solution built to last (at most) until all contemporary thinking about AI has been thoroughly obsoleted...I don’t think there is a strong case for thinking much further ahead than that." (copying from this Paul Christiano post). I hate this argument. It's a cop-out. It's an excuse to recklessly plow forward with no plan and everything at stake. But I have to admit, it seems to have a kernel of truth...


  1. See Paul's research agenda FAQ section 0.1 for things that act-based agents are unlikely to be able to do. ↩︎

Comments
Even more importantly, take the rule:
"AGIs are not allowed to do anything except output pixels onto a screen."
This is a nice, simple, bright-line rule,

It is a bright line in one sense, but it has the problem that humans remaining technically in the loop may not make much of a difference in practice. From "Disjunctive Scenarios of Catastrophic AI Risk":

Even if humans were technically kept in the loop, they might not have the time, opportunity, motivation, intelligence, or confidence to verify the advice given by an AI. This would particularly be the case after the AI had functioned for a while, and established a reputation as trustworthy. It may become common practice to act automatically on the AI’s recommendations, and it may become increasingly difficult to challenge the “authority” of the recommendations. Eventually, the AI may in effect begin to dictate decisions (Friedman & Kahn 1992).

Likewise, Bostrom and Yudkowsky (2014) point out that modern bureaucrats often follow established procedures to the letter, rather than exercising their own judgment and allowing themselves to be blamed for any mistakes that follow. Dutifully following all the recommendations of an AI system would be another way of avoiding blame.

O’Neil (2016) documents a number of situations in which modern-day machine learning is used to make substantive decisions, even though the exact models behind those decisions may be trade secrets or otherwise hidden from outside critique. Among other examples, such models have been used to fire school teachers that the systems classified as underperforming and give harsher sentences to criminals that a model predicted to have a high risk of reoffending. In some cases, people have been skeptical of the results of the systems, and even identified plausible reasons why their results might be wrong, but still went along with their authority as long as it could not be definitely shown that the models were erroneous.

In the military domain, Wallach & Allen (2013) note the existence of robots which attempt to automatically detect the locations of hostile snipers and to point them out to soldiers. To the extent that these soldiers have come to trust the robots, they could be seen as carrying out the robots’ orders. Eventually, equipping the robot with its own weapons would merely dispense with the formality of needing to have a human to pull the trigger.

You made a lot of points, so I'll be relatively brief in addressing each of them. (Taking at face value your assertion that your main goal is to start a discussion.)

1. It's interesting to consider what it would mean for an Oracle AI to be good enough to answer extremely technical questions requiring reasoning about not-yet-invented technology, yet still "not powerful enough for our needs". It seems like if we have something that we're calling an Oracle AI in the first place, it's already pretty good. In which case, it was getting to that point that was hard, not whatever comes next.

2. If you actually could make an Oracle that isn't secretly an Agent, then sure, leveraging a True Oracle AI would help us figure out the general coordination problem, and any other problem. That seems to be glossing over the fact that building an Oracle that isn't secretly an Agent isn't actually something we know how to go about doing. Solving the "make-an-AI-that-is-actually-an-Oracle-and-not-secretly-an-Agent Problem" seems just as hard as all the other problems.

3. I ... sure hope somebody is taking seriously the idea of a dictator AI running CEV, because I don't see anything other than that as a stable ("final") equilibrium. There are good arguments that a singleton is the only really stable outcome. All other circumstances will be transitory, on the way to that singleton. Even if we all get Neuralink implants tapping into our own private Oracles, how long does that status quo last? There is no reason for the answer to be "forever", or even "an especially long time", when the capabilities of an unconstrained Agent AI will essentially always surpass those of an Oracle-human synthesis.

4. If the Oracle isn't allowed to do anything other than change pixels on the screen, then of course it will do nothing at all, because it needs to be able to change the voltages in its transistors, and the local EM field around the monitor, and the synaptic firings of the person reading the monitor as they react to the text ... Bright lines are things that exist in the map, not the territory.

5. I'm emotionally sympathetic to the notion that we should be pursuing Oracle AI as an option because the notion of a genie is naturally simple and makes us feel empowered, relative to the other options. But I think the reason why e.g. Christiano dismisses Oracle AI is that it's not a concept that really coheres beyond the level of verbal arguments. Start thinking about how to build the architecture of an Oracle at the level of algorithms and/or physics and the verbal arguments fall apart. At least, that's what I've found, as somebody who originally really wanted this to work out.

Thanks, this is really helpful! For 1,2,4, this whole post is assuming, not arguing, that we will solve the technical problem of making safe and capable AI oracles that are not motivated to escape the box, give manipulative answers, send out radio signals with their RAM, etc. I was not making the argument that this technical problem is easy ... I was not even arguing that it's less hard than building a safe AI agent! Instead, I'm trying to counter the argument that we shouldn't even bother trying to solve the technical problem of making safe AI oracles, because oracles are uncompetitive.

...That said, I do happen to think there are paths to making safe oracles that don't translate into paths to making safe agents (see Self-supervised learning and AGI safety), though I don't have terribly high confidence in that.

Can you find a link to where "Christiano dismisses Oracle AI"? I'm surprised that he has done that. After all, he coauthored "AI Safety via Debate", which seems to be aimed primarily (maybe even exclusively) at building oracles (question-answering systems). Your answer to (3) is enlightening, thank you, and do you have any sense of how widespread this view is and where it's argued? (I edited the post to add that people going for benevolent dictator CEV AGI agents should still endorse oracle research because of the bootstrapping argument.)

Regarding the comment about Christiano, I was just referring to your quote in the last paragraph, and it seems like I misunderstood the context. Whoops.

Regarding the idea of a singleton, I mainly remember the arguments from Bostrom's Superintelligence book and can't quote directly. He summarizes some of the arguments here.


when the capabilities of an unconstrained Agent AI will essentially always surpass those of an Oracle-human synthesis.

Nitpick: it's the capabilities of either (a) unconstrained Agent AIs, or (b) an Agent-AI-human synthesis, that will essentially always surpass those of an Oracle-human synthesis. We might have to work our way up to AIs without humans being more effective.

At least as far as I can tell from the public discourse, there seems to be a growing consensus that humans should always and forever be in the loop of AGIs.

Maybe; but there also seems to be a general consensus that humans should be kept in the loop for any important decisions in general, yet there are powerful incentives pushing various actors to make their systems ever more autonomous. In particular, there are cases where not having a human in the loop is an advantage in itself, because it e.g. buys you a faster reaction time (see high-frequency trading).

From "Disjunctive Scenarios of Catastrophic AI Risk":



The historical trend has been to automate everything that can be automated, both to reduce costs and because machines can do things better than humans can. Any kind of a business could potentially run better if it were run by a mind that had been custom-built for running the business—up to and including the replacement of all the workers with one or more with such minds. An AI can think faster and smarter, deal with more information at once, and work for a unified purpose rather than have its efficiency weakened by the kinds of office politics that plague any large organization. Some estimates already suggest that half of the tasks that people are paid to do are susceptible to being automated using techniques from modern-day machine learning and robotics, even without postulating AIs with general intelligence (Frey & Osborne 2013, Manyika et al. 2017).

The trend toward automation has been going on throughout history, doesn’t show any signs of stopping, and inherently involves giving the AI systems whatever agency they need in order to run the company better. There is a risk that AI systems that were initially simple and of limited intelligence would gradually gain increasing power and responsibilities as they learned and were upgraded, until large parts of society were under AI control. [...]

[Deploying autonomous AI systems] can happen in two forms: either by expanding the amount of control that already-existing systems have, or alternatively by upgrading existing systems or adding new ones with previously-unseen capabilities. These two forms can blend into each other. If humans previously carried out some functions which are then given over to an upgraded AI which has become recently capable of doing them, this can increase the AI’s autonomy both by making it more powerful and by reducing the amount of humans that were previously in the loop.

As a partial example, the U.S. military is seeking to eventually transition to a state where the human operators of robot weapons are “on the loop” rather than “in the loop” (Wallach & Allen 2013). In other words, whereas a human was previously required to explicitly give the order before a robot was allowed to initiate possibly lethal activity, in the future humans are meant to merely supervise the robot’s actions and interfere if something goes wrong. While this would allow the system to react faster, it would also limit the window that the human operators have for overriding any mistakes that the system makes. For a number of military systems, such as automatic weapons defense systems designed to shoot down incoming missiles and rockets, the extent of human oversight is already limited to accepting or overriding a computer’s plan of actions in a matter of seconds, which may be too little to make a meaningful decision in practice (Human Rights Watch 2012).

Sparrow (2016) reviews three major reasons which incentivize major governments to move toward autonomous weapon systems and reduce human control:

1. Currently existing remotely piloted military “drones,” such as the U.S. Predator and Reaper, require a high amount of communications bandwidth. This limits the amount of drones that can be fielded at once, and makes them dependent on communications satellites which not every nation has, and which can be jammed or targeted by enemies. A need to be in constant communication with remote operators also makes it impossible to create drone submarines, which need to maintain a communications blackout before and during combat. Making the drones autonomous and capable of acting without human supervision would avoid all of these problems.

2. Particularly in air-to-air combat, victory may depend on making very quick decisions. Current air combat is already pushing against the limits of what the human nervous system can handle: further progress may be dependent on removing humans from the loop entirely.

3. Much of the routine operation of drones is very monotonous and boring, which is a major contributor to accidents. The training expenses, salaries, and other benefits of the drone operators are also major expenses for the militaries employing them.

Sparrow’s arguments are specific to the military domain, but they demonstrate the argument that “any broad domain involving high stakes, adversarial decision making, and a need to act rapidly is likely to become increasingly dominated by autonomous systems” (Sotala & Yampolskiy 2015, p. 18). Similar arguments can be made in the business domain: eliminating human employees to reduce costs from mistakes and salaries is something that companies would also be incentivized to do, and making a profit in the field of high-frequency trading already depends on outperforming other traders by fractions of a second. While the currently existing AI systems are not powerful enough to cause global catastrophe, incentives such as these might drive an upgrading of their capabilities that eventually brought them to that point.

In the absence of sufficient regulation, there could be a “race to the bottom of human control” where state or business actors competed to reduce human control and increased the autonomy of their AI systems to obtain an edge over their competitors (see also Armstrong et al. 2016 for a simplified “race to the precipice” scenario). This would be analogous to the “race to the bottom” in current politics, where government actors compete to deregulate or to lower taxes in order to retain or attract businesses.

Suppose that you have a powerful government or corporate actor which has been spending a long time upgrading its AI systems to be increasingly powerful, and has achieved better and better gains that way. Now someone shows up and says that they shouldn't make [some set of additional upgrades], because that would push the system to the level of a general intelligence, and having autonomous AGIs is bad. I would expect them to do everything in their power to argue that no, actually this is still narrow AI, and that doing these upgrades and keeping the system in control of their operations is fine - especially if they know that failing to do so is likely to confer an advantage to one of their competitors.

The problem is related to one discussed by Goertzel & Pitt (2012): it seems unlikely that governments would ban narrow AI or restrict its development, but there's no clear dividing line between narrow AI and AGI, meaning that if you don't restrict narrow AI then you can't restrict AGI either.

To make the point more directly, the prospect of any modern government seeking to put a damper on current real-world narrow-AI technology seems remote and absurd. It’s hard to imagine the US government forcing a roll-back from modern search engines like Google and Bing to more simplistic search engines like 1997 AltaVista, on the basis that the former embody natural language processing technology that represents a step along the path to powerful AGI.
Wall Street firms (that currently have powerful economic influence on the US government) will not wish to give up their AI-based trading systems, at least not while their counterparts in other countries are using such systems to compete with them on the international currency futures market. Assuming the government did somehow ban AI-based trading systems, how would this be enforced? Would a programmer at a hedge fund be stopped from inserting some more-effective machine learning code in place of the government-sanctioned linear regression code? The US military will not give up their AI-based planning and scheduling systems, as otherwise they would be unable to utilize their military resources effectively. The idea of the government placing an IQ limit on the AI characters in video games, out of fear that these characters might one day become too smart, also seems absurd. Even if the government did so, hackers worldwide would still be drawn to release “mods” for their own smart AIs inserted illicitly into games; and one might see a subculture of pirate games with illegally smart AI.
“Okay, but all these examples are narrow AI, not AGI!” you may argue. “Banning AI that occurs embedded inside practical products is one thing; banning autonomous AGI systems with their own motivations and self-autonomy and the ability to take over the world and kill all humans is quite another!” Note though that the professional AI community does not yet draw a clear border between narrow AI and AGI. While we do believe there is a clear qualitative conceptual distinction, we would find it hard to embody this distinction in a rigorous test for distinguishing narrow AI systems from “proto-AGI systems” representing dramatic partial progress toward human-level AGI. At precisely what level of intelligence would you propose to ban a conversational natural language search interface, an automated call center chatbot, or a house-cleaning robot? How would you distinguish rigorously, across all areas of application, a competent non-threatening narrow-AI system from something with sufficient general intelligence to count as part of the path to dangerous AGI?
A recent workshop of a dozen AGI experts, oriented largely toward originating such tests, failed to come to any definitive conclusions (Adams et al. 2010), recommending instead that a looser mode of evaluation be adopted, involving qualitative synthesis of multiple rigorous evaluations obtained in multiple distinct scenarios. A previous workshop with a similar theme, funded by the US Naval Research Office, came to even less distinct conclusions (Laird et al. 2009). The OpenCog system is explicitly focused on AGI rather than narrow AI, but its various learning modules are also applicable as narrow AI systems, and some of them have largely been developed in this context. In short, there’s no rule for distinguishing narrow AI work from proto-AGI work that is sufficiently clear to be enshrined in government policy, and the banning of narrow AI work seems infeasible as the latter is economically and humanistically valuable, tightly interwoven with nearly all aspects of the economy, and nearly always non-threatening in nature. Even in the military context, the biggest use of AI is in relatively harmless-sounding contexts such as back-end logistics systems, not in frightening applications like killer robots.
Surveying history, one struggles to find good examples of advanced, developed economies slowing down development of any technology with a nebulous definition, obvious wide-ranging short to medium term economic benefits, and rich penetration into multiple industry sectors, due to reasons of speculative perceived long-term risks. Nuclear power research is an example where government policy has slowed things down, but here the perceived economic benefit is relatively modest, the technology is restricted to one sector, the definition of what’s being banned is very clear, and the risks are immediate rather than speculative. More worryingly, nuclear weapons research and development continued unabated for years, despite the clear threat it posed.

Thank you, those are very interesting references, and very important points! I was arguing that solving a certain coordination problem is even harder than solving a different coordination problem, but I'll agree that this argument is moot if (as you seem to be arguing) it's utterly impossible to solve either!

Since you've clearly thought a lot about this, have you written up anything about very-long-term scenarios where you see things going well? Are you in the camp of "we should make a benevolent dictator AI implementing CEV", or "we can make task-limited-AGI-agents and coordinate to never make long-term-planning-AGI-agents", or something else?

Are you in the camp of "we should make a benevolent dictator AI implementing CEV", or "we can make task-limited-AGI-agents and coordinate to never make long-term-planning-AGI-agents", or something else?

No idea. :-)

My general feeling is that having an opinion on the best course of approach would require knowing what AGI and the state of the world will be like when it is developed, but we currently don't know either.

Lots of historical predictions about coming problems have been rendered completely irrelevant because something totally unexpected happened. And the other way around: it would have been hard for people to predict the issue of computer viruses before electricity had been invented, and harder yet to think about how to prepare for it. That might be a bit of an exaggeration - our state of understanding of AGI is probably better than the understanding that pre-electric people would have had of computer viruses - but it still feels impossible to reason about it effectively at the moment.

My preferred approach is to just have people pursue many different kinds of basic research on AI safety, understanding human values, etc., while also engaging with near-term AI issues so that they get influence in the kinds of organizations which will eventually make decisions about AI. And then hope that we figure out something once the picture becomes clearer.


It does seem that regulation of AI, should it become necessary, basically has to take the form of regulating access to computer chips. Supercomputers (and server farms) are relatively expensive. You can't make your own in your basement. Production is centralized at a few locations and so it would not be terribly difficult to track who they're sold to. They also use lots of electricity, making it easier to track down people who have acquired lots of them illicitly.

I think it's likely that the computing power required for dangerous AGI will remain well above what most people or non-AI businesses need for their normal activities, at least until transformative AI has become widespread. So putting strict limits on chip access would allow governments to severely cripple AI research, without rolling back the narrow-AI tech we've already developed and without looking over every programmer's shoulder to make sure they don't code up a neural net.

(A plan like this could also backfire by creating a large hardware overhang and contributing to a fast takeoff.)

What does it take for something to qualify as agent AI?

Consider something like Siri. Suppose you could not only ask for information ("What will the weather be like today?"), but you could also ask for action ("Call 911/the hospital"). Does this cross the line from "Oracle" to "Agent"?

Maybe there are other definitions, but the way I'm using the term, what you described would definitely be an agent. An oracle probably wouldn't have an internet connection at all, i.e. it would be "boxed". (The box is just a second layer of protection ... The first layer of protection is that a properly-designed safe oracle, even if it had an internet connection, would choose not to use it.)
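If it helps, here's a minimal sketch of what I mean by the two layers, with a hypothetical `oracle_model` object standing in for the AI itself. The box is the harness: the model is never handed a network socket, file handle, or actuator, so its only effect on the world is text that a human chooses to read (or ignore):

```python
# Illustrative sketch of "boxing" as a harness-level restriction, separate from
# whatever motivational safety the model itself has. oracle_model and its
# answer() method are hypothetical stand-ins, not a real API.

def run_boxed_oracle(oracle_model):
    """Interact with an oracle whose only output channel is text on a screen.

    Nothing here gives the model access to the network, the filesystem, or any
    actuator; whether its answers get acted on is entirely up to the human reader.
    """
    while True:
        question = input("Question (or 'quit'): ")
        if question.strip().lower() == "quit":
            break
        answer = oracle_model.answer(question)  # text in, text out
        print(answer)                           # the "pixels on a screen" channel
```

(Obviously the hard part is the first layer: making the model itself not want to escalate beyond that channel. This sketch only shows the second.)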