[ Question ]

Will OpenAI's work unintentionally increase existential risks related to AI?

by Adam Shimi · 1 min read · 11th Aug 2020 · 41 comments


OpenAI · GPT · AI Risk · AI
Frontpage

[The original question was "Is OpenAI increasing the existential risks related to AI?" I changed it to the current one following a discussion with Rohin in the comments. It clarifies that my question asks about the consequences of OpenAI's work, assuming positive and aligned intentions.]

This is a question I've been asked recently by friends interested in AI Safety and EA. Usually this question comes from discussions around GPT-3 and the tendency of OpenAI to invest a lot in capabilities research.

[Following this answer by Vaniver, I propose as a baseline/counterfactual the world where OpenAI doesn't exist but the researchers there still do.]

Yet I haven't seen it discussed here. Is it a debate we failed to have, or has there already been some discussion around it? I found a post from 3 years ago, but I think the situation probably changed in the meantime.

A couple of arguments for and against to prompt your thinking:

  • OpenAI is increasing the existential risks related to AI because:
    • They are doing far more capability research than safety research;
    • They are pushing the state of the art of capability research;
    • Their results will motivate many people to go work on AI capabilities, whether out of wonder or out of fear of unemployment.
  • OpenAI is not increasing the existential risks related to AI because:
    • They have a top-notch safety team;
    • They restrict the access to their models, by either not releasing them outright (GPT-2) or bottlenecking access through their API (GPT-3);
    • Their results are showing the potential dangers of AI, and pushing many people to go work on AI safety.

4 Answers

[Speaking solely for myself in this comment; I know some people at OpenAI, but don't have much in the way of special info. I also previously worked at MIRI, but am not currently.]

I think "increasing" requires some baseline, and I don't think it's obvious what baseline to pick here.

For example, consider instead the question "is MIRI decreasing the existential risks related to AI?". Well, are we comparing to the world where everyone currently employed at MIRI vanishes? Or are we comparing to the world where MIRI as an organization implodes, but the employees are still around, and find jobs somewhere else? Or are we comparing to the world where MIRI as an organization gets absorbed by some other entity? Or are we comparing to the world where MIRI still exists, the same employees still work there, but the mission is somehow changed to be the null mission?

Or perhaps we're interested in the effects on the margins--if MIRI had more dollars to spend, or fewer, how would the existential risks change? Even the answers to those last two questions could easily be quite different--perhaps firing any current MIRI employee would make things worse, but there are no additional people that could be hired by MIRI to make things better. [Prove me wrong!]

---

With that preamble out of the way, I think there are three main obstacles to discussing this in public, a la Benquo's earlier post.

The main one is something like "appeals to consequences." Talking in public has two main functions: coordinating and information-processing, and it's quite difficult to separate the two functions. [See this post and the related posts at the bottom.] Suppose I think OpenAI makes humanity less safe, and I want humanity to be more safe; I might try to figure out which strategy will be most persuasive (while still correcting me if I'm the mistaken one!) and pursue that strategy, instead of employing a strategy that more quickly 'settles the question' at the cost of making it harder to shift OpenAI's beliefs. More generally, the people with the most information will be people closest to OpenAI, which probably makes them more careful about what they will or won't say. There also seem to be significant asymmetries here, as it might be very easy to say "here are three OpenAI researchers I think are making existential risk lower" but very difficult to say "here are three OpenAI researchers I think are making existential risk higher." [Setting aside the social costs, there's their personal safety to consider.]

The second one is something like "prediction is hard." One of my favorite math stories is the history of the Markov chain; in the version I heard, Markov's rival said a thing, Markov thought to himself "that's not true!" and then formalized the counterexample in a way that dramatically improved that field. Suppose Benquo's story of how OpenAI came about is true, that OpenAI will succeed at making beneficial AI, and that (counterfactually) DeepMind wouldn't have succeeded. In this hypothetical world, the direct effect of DeepMind on existential AI risk would have been negative, but the indirect effect would be positive (as otherwise OpenAI, which succeeded, wouldn't have existed). While we often think we have a good sense of the direct effect of things, in complicated systems it becomes very non-obvious what the total effects are.

The third one is something like "heterogeneity." Rather than passing a judgment on the org as a whole, it would make more sense to make my judgments more narrow; "widespread access to AI seems like it makes things worse instead of better," for example, a view OpenAI seems to have already shifted on, now focusing on widespread benefits rather than widespread access.

---

With those obstacles out of the way, here are some limited thoughts:

I think OpenAI has changed for the better in several important ways over time; for example, the 'Open' part of the name is not really appropriate anymore, but this seems good instead of bad on my models of how to avoid existential risks from AI. I think their fraction of technical staff devoted to reasoning about and mitigating risks is higher than DeepMind's, although lower than MIRI's (tho MIRI's fraction is a very high bar); I don't have a good sense whether that fraction is high enough.

I think the main effects of OpenAI are the impacts they have on the people they hire (and the impacts they don't have on the people they don't hire). There are three main effects to consider here: resources, direction-shifting, and osmosis.

On resources, imagine that there's Dr. Light, whose research interests point in a positive direction, and Dr. Wily, whose research interests point in a negative direction, and the more money you give to Dr. Light the better things get, and the more money you give to Dr. Wily, the worse things get. [But actually what we care about is counterfactuals; if you don't give Dr. Wily access to any of your compute, he might go elsewhere and get similar amounts of compute, or possibly even more.]

On direction-shifting, imagine someone has a good idea for how to make machine learning better, and they don't really care what the underlying problem is. You might be able to dramatically change their impact by pointing them at cancer-detection instead of missile guidance, for example. Similarly, they might have a default preference for releasing models, but not actually care much if management says the release should be delayed.

On osmosis, imagine there are lots of machine learning researchers who are mostly focused on technical problems, and mostly get their 'political' opinions for social reasons instead of philosophical reasons. Then the main determinant of whether they think that, say, the benefits of AI should be dispersed or concentrated might be whether they hang out at lunch with people who think the former or the latter.

I don't have a great sense of how those factors aggregate into an overall sense of "OpenAI: increasing or decreasing risks?", but I think people who take safety seriously should consider working at OpenAI, especially on teams clearly related to decreasing existential risks. [I think people who don't take safety seriously should consider taking safety seriously.]

I would reemphasize that "does OpenAI increase risks" is a counterfactual question. That means we need to be clearer about what we are asking, as a matter of predicting what the counterfactuals are, and consider strategy options for going forward. This is a major set of questions, and increasing or decreasing risks as a single metric isn't enough to capture much of interest.

For a taste of what we'd want to consider, what about the following:

Are we asking OpenAI to pick a different, "safer" strategy?

Perhaps they should focu... (read more)

Matthew "Vaniver" Graves: Also apparently Megaman is less popular than I thought so I added links to the names.
Davidmanheim: Oh. Right. I should have gotten the reference, but wasn't thinking about it.
Raymond Arnold: Fwiw I recently listened to the excellent song 'The Good Doctor [https://www.youtube.com/watch?v=HP2NePWJ2pQ]' which has me quite delighted to get random megaman references.
Adam Shimi: Just so you know, I got the reference. ;)

Thanks a lot for this great answer!

First, I should have written it, but my baseline (or my counterfactual) is a world where OpenAI doesn't exist but the people working there still exist. This might be an improvement if you think that pushing the scaling hypothesis is dangerous and that most of the safety team would find money to keep working, or an issue if you think someone else, probably less aligned, would have pushed the scaling hypothesis, and that the structure given by OpenAI to its safety team is really special and important.

As for your obst... (read more)

Matthew "Vaniver" Graves: But part of the problem here is that the question "what's the impact of our stance on OpenAI on existential risks?" is potentially very different from "is OpenAI's current direction increasing or decreasing existential risks?", and as people outside of OpenAI have much more control over their stance than they do over OpenAI's current direction, the first question is much more actionable. And so we run into the standard question substitution [https://www.lesswrong.com/posts/LHtMNz7ua8zu4rSZr/the-substitution-principle] problems, where we might be pretending to talk about a probabilistic assessment of an org's impact while actually targeting the question of "how do I think people should relate to OpenAI?". [That said, I see the desire to have clear discussion of the current direction, and that's why I wrote as much as I did, but I think it has prerequisites that aren't quite achieved yet.]

Post OpenAI exodus update: does the exit of Dario Amodei, Chris Olah, Jack Clarke and potentially others from OpenAI make you change your opinion?

I think it's fairly self-evident that you should have exceedingly high standards for projects intending to build AGI (OpenAI, DeepMind, others). It's really hard to reduce existential risk from AI, and I think much thought around this has been naive and misguided. 

(Two examples of this outside of OpenAI include: senior AI researchers talking about military use of AI instead of misalignment, and senior AI researchers responding to the problems of specification gaming by saying "objectives can be changed quickly when issues surface" and "existential threats to humanity have to be explicitly designed as such".)

An obvious reason to think OpenAI's impact will be net negative is that they seem to be trying to reach AGI as fast as possible, and trying a route different from DeepMind and other competitors, so in some worlds they are shortening the timeline to AGI. (I'm aware that there are arguments about why a shorter timeline is better, but I'm not sold on them right now.)

There are also more detailed conversations, about alignment, what the core of the problem actually is, and other strategic questions. I expect (and take from occasional things I hear) I have substantial disagreements with OpenAI decision-makers, which I think alone is sufficient reason for me to feel doomy about humanity's prospects.

That said, I'm quite impressed with their actions around release practises and also their work in becoming a profit-capped entity. I felt like they were a live player with these acts and were clearly acting against their short-term self-interest in favour of humanity's broader good, with some relatively sane models around these specific aspects of what's important. Those were both substantial updates for me, and make me feel pretty cooperative with them.

And of course I'm very happy indeed about a bunch of the safety work they do and support. The org gives lots of support and engineers to people like Paul Christiano, Chris Olah, etc., which I think is better than what those people would probably get counterfactually, and I'm very grateful that the organisation provides this.

Overall I don't feel my opinion is very robust, and could easily change. Here's some example of things that I think could substantially change my opinion:

  • How senior decision-making happens at OpenAI
  • What technical models of AGI senior researchers at OpenAI have
  • Broader trends that would have happened to the field of AI (and the field of AI alignment) in the counterfactual world where they were not founded

Thanks for your answer! Trying to make your examples of what might change your opinion substantially more concrete, I got these:

  • Does senior decision-making at OpenAI always consider safety issues before greenlighting new capability research?
  • Do senior researchers at OpenAI believe that their current research directly leads to AGI in the short term?
  • Would the Scaling Hypothesis (and thus GPT-N) have been vindicated as soon in a world without OpenAI?

Do you agree with these? Do you have other ideas of concrete questions?

The first one feels a bit too optimistic. It’s something more like: Are they able to be direct in their disagreement with one another? What level of internal politicking is there? How much ability do some of the leadership have to make unilateral decisions? Etc.

The second one is the one more about alignment, takeoff dynamics, and timelines. All the details, like the likelihood of Mesa optimisers. What are their thoughts on this, and how much do they think about it?

For the third, that one's good. Also things about how differently things would've gone at DeepMind, and also how good/bad the world would be if Musk hadn't shifted the Overton window so much (which I think is counterfactually linked up with OpenAI existing, you get both or neither).

Post OpenAI exodus update: does the exit of Dario Amodei, Chris Olah, Jack Clarke and potentially others from OpenAI make you change your opinion?

See all the discussion under the OpenAI tag. Don't forget SSC's post on it either.

I mostly think we had a good discussion about it when it launched (primarily due to Ben Hoffman and Scott Alexander deliberately creating the discussion).

Do you think you (or someone else) could summarize this discussion here? I have to admit that the ideas being spread out between multiple posts doesn't help.

Ben Pace: I don't plan to. I'd strong upvote if someone else did a nice job of summarising the discussion, perhaps inspired by how I distilled the discussion around what failure looks like [https://www.lesswrong.com/posts/6jkGf5WEKMpMFXZp2/what-failure-looks-like-distilling-the-discussion]. (To be clear I think my distillation of the comment section was much better and more useful than the distillation of the post itself.)

OpenAI's work speeds up progress, but in a way that's likely to smooth progress later on. If you spend as much compute as possible now, you reduce potential surprises in the future.

But what if they reach AGI during their speed-up? The smoothing at a later time assumes that we'll end up with diminishing returns before AGI, which is not what is happening at the moment.

Post OpenAI exodus update: does the exit of Dario Amodei, Chris Olah, Jack Clarke and potentially others from OpenAI make you change your opinion?

22 comments

As I read this question, it translated as:

"Is everyone at OpenAI a moral monster?"

I would much prefer this question if it instead translated as:

"Are OpenAI's efforts counterproductive?"

The current version of the question seems needlessly controversial / aggressive. (This is similar to Alexei's point, except I haven't downvoted because I think the question could easily be rephrased to be fine, even if it specifically names OpenAI.)

FWIW, I thought the original question text was slightly better, since I didn't read it as aggressive, and it didn't needlessly explicitly assume that everyone at OpenAI is avoiding increasing existential risk. Furthermore, it seems clear to me that an organisation can be increasing existential risk without everybody at that organisation being a moral monster, since most organisations are heterogeneous.

In general, I think one should be able to ask questions of the form "is actor X causing harm Y" on LessWrong, and furthermore that people should not thereby assume that the questioner thinks that actor X is evil. I also think that some people are moral monsters and/or evil, and the way to figure out whether or not that's true is to ask questions of this form.

In general, I think one should be able to ask questions of the form "is actor X causing harm Y" on LessWrong, and furthermore that people should not thereby assume that the questioner thinks that actor X is evil.

I can believe that should be the case (not sure). I do not think it is actually the case. Is this the battle you choose to fight?

I also think that some people are moral monsters and/or evil, and the way to figure out whether or not that's true is to ask questions of this form.

If you do in fact want to know this answer, I feel more okay about asking the question (though I have bigger disagreements upstream). I don't think OP was particularly interested in this answer.

I see what you mean. Although my question is definitely pointed at OpenAI, I don't want to accuse them of anything. One thing I wanted to write in the question but that I forgot was that the question asks about the consequences of OpenAI's work, not the intentions. So there might be negative consequences that were not intentional (or no negative consequences of course).

Is "Are the consequences of OpenAI's work positive or negative for xrisks?" better?

"Will OpenAI's work unintentionally increase existential risks related to AI?"

"Will OpenAI's strategy succeed at reducing existential risks related to AI?"

The point is to build in a presumption of good intentions, unless you explicitly want to challenge that presumption (which I expect you do not want to do).

David's suggestion also seems good to me, though is asking a slightly different question and is a bit wordier.

Done! I used your first proposal, as it is more in line with my original question.

I think this is a worse question now? Like, I expect OpenAI leadership explicitly thinks of themselves as increasing x-risk a bit by choosing to attempt to speed up progress to AGI. 

On net they expect it's probably the right call, but they also probably would say "Yes, our actions are intentionally increasing the chances of x-risk in some worlds, but on net we think it's improving things". And then, supposing they're wrong, and those worlds are the actual world, then they're intentionally increasing x-risk. And now the question tells me to ignore that possibility.

The initial question made no discussion of intention, seemed better to me.

Like, I expect OpenAI leadership explicitly thinks of themselves as increasing x-risk a bit by choosing to attempt to speed up progress to AGI.

Do you think that they think they are increasing x-risk in expectation (where the expectation is according to their beliefs)? I'd find that extremely surprising (unless their reasoning is something like "yes, we raise it from 1 in a trillion to 2 in a trillion, this doesn't matter").

See my reply downthread, responding to where you asked Oli for an example.

Hmm, my perspective is that in the example that you describe, OpenAI isn't intentionally increasing the risks, in that they think it improves things overall. My line for "intentionally increasing xrisks" would be literally deciding to act while thinking/knowing that your actions are making things worse in general for xrisks, which doesn't sound like your example.

I much prefer Rohin's alternative version of: "Are OpenAI's efforts to reduce existential risk counterproductive?". The current version does feel like it screens off substantial portions of the potential risk.

Example? I legitimately struggle to imagine something covered by "Are OpenAI's efforts to reduce existential risk counterproductive?" but not by "Will OpenAI's work unintentionally increase existential risks related to AI?"; if anything it seems the latter covers more than the former.

One route would be if some of them thought that existential risks weren't that much worse than major global catastrophes. 

If I think it likely that 10% of everyone will die because of the wrong people getting control of the killer AI drones ("slaughterbots"), and that it's important that we get to AI as quickly as possible, then we might move it forward as quickly as possible because we want to be in control, at the expense of some kinds of unlikely alignment problems. This person accepts a very small increase in the chance of existential risk via indirect AI issues at the price of a substantial decrease in the chance of 10% of humanity being wiped out via bad direct use of the AI. This would be intentionally increasing x-risk in expectation, and they would agree.

You might correctly point out that Paul Christiano and Chris Olah don't think like this, but I don't really know who is involved in leadership at OpenAI, perhaps "safe" AI to some of them means "non-military". So this is a case that the new title rules out.

Yeah, that's a good example, thanks.

(I do think it is unlikely.)

I'd focus even more (per my comment on Vaniver's response) and ask "What parts of OpenAI are most and least valuable, and how do these relate to their strategy - and what strategy is best?"

Some OpenAI people are on LW. It'd be interesting to hear their thoughts as well.

Two general things which have made me less optimistic about OpenAI are that:

  1. They recently spun-out a capped-profit company, which seems like the end goal is monetizing some of their recent advancements. The page linked in the previous sentence also has some stuff about safety and about how none of their day-to-day work is changing, but it doesn't seem that encouraging.

  2. They've recently partnered up with Microsoft, presumably for product integration. This seems like it positions them as less of a neutral entity, especially as Alphabet owns DeepMind.

They recently spun-out a capped-profit company, which seems like the end goal is monetizing some of their recent advancements. The page linked in the previous sentence also has some stuff about safety and about how none of their day-to-day work is changing, but it doesn't seem that encouraging.

I found this moderately encouraging instead of discouraging. So far I think OpenAI is 2 for 2 on managing organizational transitions in ways that seem likely to not compromise safety very much (or even improve safety) while expanding their access to resources; if you think the story of building AGI looks more like assembling a coalition that's able to deploy massive resources to solve the problem than a flash of insight in a basement, then the ability to manage those transitions becomes a core part of the overall safety story.

This makes sense to me, given the situation you describe.

That's an interesting point. Why do you think that the new organizational transition is not compromising safety? (I have no formed opinion on this, but it seems that adding economic incentives is dangerous by default)

I agree that adding economic incentives is dangerous by default, but think their safeguards are basically adequate to overcome that incentive pressure. At the time I spent an hour trying to come up with improvements to the structure, and ended up not thinking of anything. Also remember that this sort of change, even if it isn't a direct improvement, can be an indirect improvement by cutting off unpleasant possibilities; for example, before the move to the LP, there was some risk OpenAI would become a regular for-profit, and the LP move dramatically lowered that risk.

I also think for most of the things I'm concerned about, psychological pressure to think the thing isn't dangerous is more important; like, I don't think we're in the cigarette case where it's mostly other people who get cancer while the company profits; I think we're in the case where either the bomb ignites the atmosphere or it doesn't, and even in wartime the evidence was that people would abandon plans that posed a serious chance of destroying humanity.

Note also that economic incentives quite possibly push away from AGI towards providing narrow services (see Drexler's various arguments that AGI isn't economically useful, and so people won't make it by default). If you are more worried about companies that want to build AGIs and then ask it what to do than you are about companies that want to build AIs to accomplish specific tasks, increased short-term profit motive makes OpenAI more likely to move in the second direction. [I think this consideration is pretty weak but worth thinking about.]

So if I understand your main point, you argue that OpenAI LP incentivized new investments without endangering safety, thanks to the capped returns. And that this tradeoff looks like one of the best possible, compared to becoming a for-profit or getting bought by a big for-profit company. Is that right?

I also think for most of the things I'm concerned about, psychological pressure to think the thing isn't dangerous is more important; like, I don't think we're in the cigarette case where it's mostly other people who get cancer while the company profits; I think we're in the case where either the bomb ignites the atmosphere or it doesn't, and even in wartime the evidence was that people would abandon plans that posed a serious chance of destroying humanity.

I agree with you that we're in the second case, but that doesn't necessarily mean that there's a fire alarm. And economic incentives might push you to go slightly further, to a point where it looks like everything is still okay, but we reach transformative AI in a terrible way. [I don't think it is actually the case for OpenAI right now, just responding to your point.]

Note also that economic incentives quite possibly push away from AGI towards providing narrow services (see Drexler's various arguments that AGI isn't economically useful, and so people won't make it by default). If you are more worried about companies that want to build AGIs and then ask it what to do than you are about companies that want to build AIs to accomplish specific tasks, increased short-term profit motive makes OpenAI more likely to move in the second direction

Good point, I need to think more about that. A counterargument that springs to mind is that AGI research might push forward other kinds of AI, and thus bring transformative AI sooner even if it isn't an AGI.

thanks to the capped returns

Out of the various mechanisms, I think the capped returns are relatively low ranking; probably the top on my list is the nonprofit board having control over decision-making (and implicitly the nonprofit board's membership not being determined by investors, as would happen in a normal company).