Subtitle: A partial defense of high-confidence AGI doom predictions.
Consider these two kinds of accident scenarios:
See also: conjuctive vs disjunctive risk scenarios.
Default-success scenarios include most engineering tasks that we have lots of experience with and know how to do well: building bridges, building skyscrapers, etc. Default-failure scenarios, as far as I can tell, come in two kinds: scenarios in which we’re trying to do something for the first time (rocket test launches, prototypes, new technologies) and scenarios in which there is a competent adversary that is trying to break the system, as in computer security.
In the following, I use P(doom) to refer to the probability of an AGI takeover and / or human extinction due to the development of AGI.
I often encounter the following argument against predictions of AGI catastrophes:
Alice: We seem to be on track to build an AGI smarter than humans. We don’t know how to solve the technical problem of building an AGI we can control, or the political problem of convincing people to not build AGI. Every plausible scenario I’ve ever thought or heard of leads to AGI takeover. In my estimate, P(doom) is [high number].
Bob: I disagree. It’s overconfident to estimate high P(doom). Humans are usually bad at predicting the future, especially when it comes to novel technologies like AGI. When you account for how uncertain your predictions are, your estimate should be at most [low number].”
I'm being vague about the numbers because I've seen Bob's argument made in many different situations. In one recent conversation I witnessed, the Bob-Alice split was P(doom) 0.5% vs. ~10%, and in another discussion it was 10% vs. 90%.
My main claim is that Alice and Bob don’t actually disagree about how uncertain or hard to predict the future is―instead, they disagree about to what degree AGI risk is default-success vs. default-failure. If AGI risk is (mostly) default-failure, then uncertainty is a reason for pessimism rather than optimism, and Alice is right to predict failure.
In this sense I think Bob is missing the point. Bob claims that Alice is not sufficiently uncertain about her AI predictions, or has not integrated her uncertainty into her estimate well enough. This is not necessarily true; it may just be that Alice’s uncertainty about her reasoning doesn't make her much more optimistic.
Instead of trying to refute Alice from general principles, I think Bob should instead point to concrete reasons for optimism (for example, Bob could say “for reasons A, B, and C it is likely that we can coordinate on not building AGI for the next 40 years and solve alignment in the meantime”).
Many people are skeptical of the ‘default-failure’ frame, so I'll give a bit more color here by listing some reasons why I think Bob's argument is wrong / unproductive. I won’t go into detail about why AGI risk specifically might be a default-failure scenario; you can find a summary of those arguments in Nate Soares’ post on why AGI ruin is likely.
This part is me hedging my claims. Feel free to skip if that seems like a boring thing to read.
I don’t personally estimate P(doom) above 90%.
I’m also not saying there are no reasons to be optimistic. I’m claiming that reasons for optimism should usually be concrete arguments about possible ways to avoid doom. For example, Paul Christiano argues for a somewhat lower than 90% P(doom) here, and I think the general shape of his argument makes sense, in contrast to Bob’s above.
I do think there is a correct version of the argument that, if your model says P(outcome) = 0.99, model uncertainty will generally be a reason to update downwards. I think people already take that into account when stating high P(doom) estimates. Here’s a sketch of a plausible reasoning (summarized and not my numbers, but I do have similar reasoning, and I don’t think the numbers are crazy):
I don’t mean to say that the reasoning here is the only reasonable version out there. It depends a lot on how likely you think various definitely-useful surprises are, like long timelines to AGI and slow progress after proto-AGI. But I do think it is wrong to call high P(doom) estimates overconfident without any further more detailed criticism.
Finally, I haven’t given an explicit argument for AGI risk; there’s a lot of that elsewhere.
Note how AGI somehow manages to satisfy both of these criteria at once.