[ Question ]

Will OpenAI's work unintentionally increase existential risks related to AI?

by Adam Shimi1 min read11th Aug 202038 comments



[The original question was "Is OpenAI increasing the existential risks related to AI?" I changed it to the current one following a discussion with Rohin in the comments. It clarifies that my question asks about the consequences of OpenAI's work will assuming positive and aligned intentions.]

This is a question I've been asked recently by friends interested in AI Safety and EA. Usually this question comes from discussions around GPT-3 and the tendency of OpenAI to invest a lot in capabilities research.

[Following this answer by Vaniver, I propose for a baseline/counterfactual the world where OpenAI doesn't exists but the researchers there still do.]

Yet I haven't seen it discussed here. Is it a debate we failed to have, or has there already been some discussion around it? I found a post from 3 years ago, but I think the situation probably changed in the meantime.

A couple of arguments for and against to prompt your thinking:

  • OpenAI is increasing the existential risks related to AI because:
    • They are doing far more capability research than safety research;
    • They are pushing the state of the art of capability research;
    • Their results will motivate many people to go work on AI capabilities, whether out of wonder or out of fear of unemployment.
  • OpenAI is not increasing the existential risks related to AI because:
    • They have a top-notch safety team;
    • They restrict the access to their models, by either not releasing them outright (GPT-2) or bottlenecking access through their API (GPT-3);
    • Their results are showing the potential dangers of AI, and pushing many people to go work on AI safety.Is OpenAI increasing the existential risks related to AI?
New Answer
Ask Related Question
New Comment

4 Answers

[Speaking solely for myself in this comment; I know some people at OpenAI, but don't have much in the way of special info. I also previously worked at MIRI, but am not currently.]

I think "increasing" requires some baseline, and I don't think it's obvious what baseline to pick here.

For example, consider instead the question "is MIRI decreasing the existential risks related to AI?". Well, are we comparing to the world where everyone currently employed at MIRI vanishes? Or are we comparing to the world where MIRI as an organization implodes, but the employees are still around, and find jobs somewhere else? Or are we comparing to the world where MIRI as an organization gets absorbed by some other entity? Or are we comparing to the world where MIRI still exists, the same employees still work there, but the mission is somehow changed to be the null mission?

Or perhaps we're interested in the effects on the margins--if MIRI had more dollars to spend, or less dollars, how would the existential risks change? Even the answers to those last two questions could easily be quite different--perhaps firing any current MIRI employee would make things worse, but there are no additional people that could be hired by MIRI to make things better. [Prove me wrong!]


With that preamble out of the way, I think there are three main obstacles to discussing this in public, a la Benquo's earlier post.

The main one is something like "appeals to consequences." Talking in public has two main functions: coordinating and information-processing, and it's quite difficult to separate the two functions. [See this post and the related posts at the bottom.] Suppose I think OpenAI makes humanity less safe, and I want humanity to be more safe; I might try to figure out which strategy will be most persuasive (while still correcting me if I'm the mistaken one!) and pursue that strategy, instead of employing a strategy that more quickly 'settles the question' at the cost of making it harder to shift OpenAI's beliefs. More generally, the people with the most information will be people closest to OpenAI, which probably makes them more careful about what they will or won't say. There also seem to be significant asymmetries here, as it might be very easy to say "here are three OpenAI researchers I think are making existential risk lower" but very difficult to say "here are three OpenAI researchers I think are making existential risk higher." [Setting aside the social costs, there's their personal safety to consider.]

The second one is something like "prediction is hard." One of my favorite math stories is the history of the Markov chain; in the version I heard, Markov's rival said a thing, Markov thought to himself "that's not true!" and then formalized the counterexample in a way that dramatically improved that field. Supposing Benquo's story of how OpenAI came about is true, and OpenAI will succeed at making beneficial AI, and (counterfactually) DeepMind wouldn't have succeeded. In this hypothetical world, then it would be the case that while the direct effect of DeepMind on existential AI risk would have been negative, the indirect effect would be positive (as otherwise OpenAI, which succeeded, wouldn't have existed). While we often think we have a good sense of the direct effect of things, in complicated systems it becomes very non-obvious what the total effects are.

The third one is something like "heterogeneity." Rather than passing a judgment on the org as a whole, it would make more sense to make my judgments more narrow; "widespread access to AI seems like it makes things worse instead of better," for example, which OpenAI seems to already have shifted their views on, instead focusing on widespread benefits instead of widespread access.


With those obstacles out of the way, here's some limited thoughts:

I think OpenAI has changed for the better in several important ways over time; for example, the 'Open' part of the name is not really appropriate anymore, but this seems good instead of bad on my models of how to avoid existential risks from AI. I think their fraction of technical staff devoted to reasoning about and mitigating risks is higher than DeepMind's, although lower than MIRI's (tho MIRI's fraction is a very high bar); I don't have a good sense whether that fraction is high enough.

I think the main effects of OpenAI are the impacts they have on the people they hire (and the impacts they don't have on the people they don't hire). There are three main effects to consider here: resources, direction-shifting, and osmosis.

On resources, imagine that there's Dr. Light, whose research interests point in a positive direction, and Dr. Wily, whose research interests point in a negative direction, and the more money you give to Dr. Light the better things get, and the more money you give to Dr. Wily, the worse things get. [But actually what we care about is counterfactuals; if you don't give Dr. Wily access to any of your compute, he might go elsewhere and get similar amounts of compute, or possibly even more.]

On direction-shifting, imagine someone has a good idea for how to make machine learning better, and they don't really care what the underlying problem is. You might be able to dramatically change their impact by pointing them at cancer-detection instead of missile guidance, for example. Similarly, they might have a default preference for releasing models, but not actually care much if management says the release should be delayed.

On osmosis, imagine there are lots of machine learning researchers who are mostly focused on technical problems, and mostly get their 'political' opinions for social reasons instead of philosophical reasons. Then the main determinant of whether they think that, say, the benefits of AI should be dispersed or concentrated might be whether they hang out at lunch with people who think the former or the latter.

I don't have a great sense of how those factors aggregate into an overall sense of "OpenAI: increasing or decreasing risks?", but I think people who take safety seriously should consider working at OpenAI, especially on teams clearly related to decreasing existential risks. [I think people who don't take safety seriously should consider taking safety seriously.]

I think it's fairly self-evident that you should have exceedingly high standards for projects intending to build AGI (OpenAI, DeepMind, others). It's really hard to reduce existential risk from AI, and I think much thought around this has been naive and misguided. 

(Two examples of this outside of OpenAI include: senior AI researchers talking about military use of AI instead of misalignment, and senior AI researchers saying responding to the problems of specification gaming by saying "objectives can be changed quickly when issues surface" and "existential threats to humanity have to be explicitly designed as such".)

An obvious reason to think OpenAI's impact will be net negative is that they seem to be trying to reach AGI as fast as possible, and trying a route different from DeepMind and other competitors, so are in some world shortening the timeline until AI. (I'm aware that there are arguments about why a shorter timeline is better, but I'm not sold on them right now.)

There are also more detailed conversations, about alignment, what the core of the problem actually is, and other strategic questions. I expect (and take from occasional things I hear) I have substantial disagreements with OpenAI decision-makers, which I think alone is sufficient reason for me to feel doomy about humanity's prospects.

That said, I'm quite impressed with their actions around release practises and also their work in becoming a profit-capped entity. I felt like they were a live player with these acts and were clearly acting against their short-term self-interest in favour of humanity's broader good, with some relatively sane models around these specific aspects of what's important. Those were both substantial updates for me, and make me feel pretty cooperative with them.

And of course I'm very happy indeed about a bunch of the safety work they do and support. The org give lots of support and engineers to people like Paul Christiano, Chris Olah, etc that I think is better than those people probably would get counterfactually, and I'm very grateful that the organisation provides this.

Overall I don't feel my opinion is very robust, and could easily change. Here's some example of things that I think could substantially change my opinion:

  • How senior decision-making happens at OpenAI
  • What technical models of AGI senior researchers at OpenAI have
  • Broader trends that would have happened to the field of AI (and the field of AI alignment) in the counterfactual world where they were not founded

See all the discussion under the OpenAI tag. Don't forget SSC's post on it either.

I mostly think we had a good discussion about it when it launched (primarily due to Ben Hoffman and Scott Alexander deliberately creating the discussion).

OpenAI's work speeds up progress, but in a way that's likely smooth progress later on. If you spend as much compute as possible now, you reduce potential surprises in the future.