What I talk about when I talk about AI x-risk: 3 core claims I want machine learning researchers to address.

by David Krueger, 2nd Dec 2019



Recently, at PCSOCMLx, I (co-)hosted a session with the goal of explaining, debating, and discussing what I view as "the case for AI x-risk". Specifically, my goal was/is to make the case for the "out-of-control AI killing everyone" type of AI x-risk, since many or most ML researchers already accept that there are significant risks from misuse of AI that should be addressed. I'm sharing my outline, since it might be useful to others, and in order to get feedback on it. Please tell me what you think it does right/wrong!

EDIT: I noticed I (and others I've spoken to about this) haven't been clear enough about distinguishing CLAIMS and ARGUMENTS. I'm hoping to make this clearer in the future.

Some background/context

I estimate I've spent ~100-400 hours discussing AI x-risk with machine learning researchers during the course of my MSc and PhD. My current impression is that rejection of AI x-risk by ML researchers is mostly due to a combination of:

  • Misunderstanding of what I view as the key claims (e.g. believing "the case for x-risk hinges on short timelines and/or fast take-off").
  • Ignorance of the basis for AI x-risk arguments (e.g. no familiarity with the argument from instrumental convergence).
  • Different philosophical groundings (e.g. not feeling able/compelled to try and reason using probabilities and expected value; not valuing future lives very much; an unexamined apparent belief that current "real problems" should always take precedence over future "hypothetical concerns", resulting in "whataboutism").

I suspect that ignorance about the level of support for AI x-risk concerns among other researchers also plays a large role, but this is less clear; I think people don't like to be seen to be basing their opinions on those of other researchers. Underlying all of this seems to be a mental move of "outright rejection" based on AI x-risk failing many powerful and useful heuristics. AI x-risk is thus commonly viewed as a Pascal's mugging: "plausible" but not plausible enough to compel any consideration or action. A common attitude is that AI take-over has a "0+epsilon" chance of occurring.

I'm hoping that being clearer and more modest in the claims I/we aim to establish can help move discussions with researchers forward. I've recently been leaning heavily on the unpredictability of the future and making ~0 mention of my own estimates about the likelihood of AI x-risk, with good results.

The 3 core claims:

1) The development of advanced AI increases the risk of human extinction (by a non-trivial amount, e.g. 1%), for the following reasons:

  • Goodhart's law
  • Instrumental goals
  • Safety-performance trade-offs (e.g. capability control vs. motivation control)

2) To mitigate this existential risk (x-risk) we need progress in 3 areas:

  • Knowing how to build safe systems ("control problem")
  • Knowing that we know how to build safe systems ("justified confidence")
  • Preventing people from building unsafe systems ("global coordination")

3) Mitigating AI x-risk seems like an ethical priority because it is:

  • high impact
  • neglected
  • challenging but tractable
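
To make the Goodhart's law reason under claim 1 a little more concrete, here is a minimal toy sketch in Python. Everything in it is an assumption made for illustration: the one-dimensional setting and the functions `true_value`, `proxy`, and `optimize_proxy` are hypothetical and do not come from any real system. It only shows the general shape of the concern: a proxy that tracks what we want under weak optimization can come apart from it under strong optimization.

```python
from typing import List, Tuple


def true_value(x: float) -> float:
    """Hypothetical 'what we actually want': improves until x = 1, then degrades."""
    return x - 0.5 * x * x


def proxy(x: float) -> float:
    """Hypothetical measurable proxy: agrees with true_value near x = 0,
    but keeps rewarding larger x forever."""
    return x


def optimize_proxy(steps: int, step_size: float = 0.25) -> List[Tuple[float, float, float]]:
    """Greedy hill-climbing on the proxy only.

    Returns a list of (x, proxy score, true value) triples, one per step,
    recorded before each move.
    """
    x = 0.0
    history = []
    for _ in range(steps):
        history.append((x, proxy(x), true_value(x)))
        # The optimizer only ever looks at the proxy, never at true_value.
        if proxy(x + step_size) > proxy(x):
            x += step_size
    return history


if __name__ == "__main__":
    for x, p, v in optimize_proxy(13):
        print(f"x={x:.2f}  proxy={p:.2f}  true_value={v:.4f}")
```

In this toy setting the proxy score rises at every step, while the true value peaks early and then falls below where it started. Nothing about the sketch says how likely such divergence is in real systems; it only illustrates the mechanism.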


Unfortunately, only 3 people showed up to our session (despite something like 30 expressing interest), so I didn't learn too much about the effectiveness of this presentation. My 2 main take-aways are:

  • Somewhat unsurprisingly, claim 1 had the least support. While I find this claim and the supporting arguments quite compelling and intuitive, there seem to be inferential gaps that I struggle to address quickly/easily. A key sticking point seems to be the lack of a highly plausible concrete scenario. I think it might also require more discussion of epistemics in order to move people from "I understand the basis for concern" to "I believe there is a non-trivial chance of an out-of-control AI killing everyone".
  • The phrase "ethical priority" raises alarm bells for people, and should be replaced or clarified. Once I clarified that I meant it in the same way as "combating climate change is an ethical priority", people seemed to accept it.

Some more details on the event:

The title for our session was: The case for AI as an existential risk, and a call for discussion and debate. Our blurb was: A growing number of researchers are concerned about scenarios in which machines, instead of people, control the future. What is the basis for these concerns, and are they well-founded? I believe they are, and we have an obligation as a community to address them. I can lead with a few minutes summarizing the case for that view. We can then discuss nuances, objections, and take-aways.

I also started with some basic background to make sure people understood the topic:

  • X-risk = risk of human extinction
  • The 3 kinds of risk (misuse, accident, structural)
  • The specific risk scenario I'm concerned with: out-of-control AI