(Cross-posted from personal blog. Summarized in Alignment Newsletter #104. Thanks to Janos Kramar for his helpful feedback on this post.)
Epistemic status: fairly speculative, would appreciate feedback
As the covid-19 pandemic unfolds, we can draw lessons from it for managing future global risks, such as other pandemics, climate change, and risks from advanced AI. In this post, I will focus on possible implications for AI risk. For a broader treatment of this question, I recommend FLI's covid-19 page, which includes expert interviews on the implications of the pandemic for other types of risks.
A key element in AI risk scenarios is the speed of takeoff - whether advanced AI is developed gradually or suddenly. Paul Christiano's post on takeoff speeds defines slow takeoff in terms of the economic impact of AI as follows: "There will be a complete 4 year interval in which world output doubles, before the first 1 year interval in which world output doubles." It argues that slow AI takeoff is more likely than fast takeoff, but is not necessarily easier to manage, since it poses different challenges, such as large-scale coordination. This post expands on this point by examining some parallels between the coronavirus pandemic and a slow takeoff scenario. The upsides of slow takeoff include the ability to learn from experience, act on warning signs, and reach a timely consensus that there is a serious problem. I would argue that the covid-19 pandemic had these properties, but most of the world's institutions did not take advantage of them. This suggests that, unless our institutions improve, we should not expect the slow AI takeoff scenario to have a good default outcome.
- Learning from experience. In the slow takeoff scenario, general AI is expected to appear in a world that has already experienced transformative change from less advanced AI, and institutions will have a chance to learn from problems with these AI systems. An analogy could be made with learning from dealing with less "advanced" epidemics like SARS that were not as successful as covid-19 at spreading across the world. While some useful lessons were learned, they were not successfully generalized to covid-19, which had somewhat different properties than these previous pathogens (such as asymptomatic transmission and higher virulence). Similarly, general AI may have somewhat different properties from less advanced AI that would make mitigation strategies more difficult to generalize.
- Warning signs. In the coronavirus pandemic response, there has been a lot of variance in how successfully governments acted on warning signs. Western countries had at least a month of warning while the epidemic was spreading in China, which they could have used to stock up on PPE and build up testing capacity, but most did not do so. Experts have warned about the likelihood of a coronavirus outbreak for many years, but this did not lead most governments to stock up on medical supplies. This was a failure to take cheap preventative measures in response to advance warnings about a widely recognized risk with tangible consequences, which is not a good sign for the case where the risk is less tangible and less well understood (such as risk from general AI).
- Consensus on the problem. During the covid-19 epidemic, the abundance of warning signs and past experience with previous pandemics created an opportunity for a timely consensus that there is a serious problem. However, it actually took a long time for a broad consensus to emerge - the virus was often dismissed as "overblown" and "just like the flu" as late as March 2020. A timely response to the risk required acting before there was a consensus, thus risking the appearance of overreacting to the problem. I think we can also expect this to happen with advanced AI. Similarly to the discussion of covid-19, there is an unfortunate irony where those who take a dismissive position on advanced AI risks are often seen as cautious, prudent skeptics, while those who advocate early action are portrayed as "panicking" and overreacting. The "moving goalposts" effect, where new advances in AI are dismissed as not real AI, could continue indefinitely as increasingly advanced AI systems are deployed. I would expect the "no fire alarm" hypothesis to hold in the slow takeoff scenario - there may not be a consensus on the importance of general AI until it arrives, so risks from advanced AI would continue to be seen as "overblown" until it is too late to address them.
We can hope that the transformative technological change involved in the slow takeoff scenario will also help create more competent institutions without these weaknesses. We might expect that institutions unable to adapt to the fast pace of change will be replaced by more competent ones. However, we could also see an increasingly chaotic world where institutions fail to adapt without better institutions being formed quickly enough to replace them. Success in the slow takeoff scenario depends on institutional competence and large-scale coordination. Unless more competent institutions are in place by the time general AI arrives, it is not clear to me that slow takeoff would be much safer than fast takeoff.
Thanks for writing this. I've been thinking along similar lines since the pandemic started. Another takeaway for me: Under our current political system, AI risk will become politicized. It will be very easy for unaligned or otherwise dangerous AI to find human "allies" who will help to prevent effective social response. Given this, "more competent institutions" has to include large-scale and highly effective reforms to our democratic political structures, but political dysfunction is such a well-known problem (i.e., not particularly neglected) that if there were easy fixes, they would have been found and applied already.
So whereas you're careful to condition your pessimism on "unless our institutions improve", I'm just pessimistic. (To clarify, I was already pessimistic before COVID-19, so it just provided more details about how coordination/institutions are likely to fail, which I didn't have a clear picture of. I'm curious if COVID-19 was an update for you as far as your overall assessment of AI risk. That wasn't totally clear from the post.)
On a related note, I recall Paul said the risk from failure of AI alignment (I think he said or meant "intent alignment") is 10%; Toby Ord gave a similar number for AI risk in his recent book; 80,000 Hours, based on interviews with multiple AI risk researchers, said "We estimate that the risk of a serious catastrophe caused by machine intelligence within the next 100 years is between 1 and 10%." Until now 1-10% seems to have been the consensus view among the most prominent AI risk researchers. I wonder if that has changed due to recent events.
In September 2017, based on some conversations with MIRI and non-MIRI folks, I wrote:
People may have become more optimistic since then, but most people falling in the 1-10% range would still surprise me a lot. (Even excluding MIRI people, whose current probabilities I don't know but who I think of as skewing pessimistic compared to other orgs.)
I complained to 80K about this back in 2017 too! :) I think 1-10% here was primarily meant to represent the median view of the 80,000 Hours team (or something to that effect), not the median view of AGI safety researchers. (Though obviously 80,000 Hours spent tons of time talking to safety researchers and taking those views into account. I just want to distinguish "this is our median view after talking to experts" from "this is our attempt to summarize experts' median view after talking to experts".)
Thinking for a minute, I guess my unconditional probability of unaligned AI ending civilization (or something similar) is around 75%. It’s my default expected outcome.
That said, this isn’t a number I try to estimate directly very much, and I’m not sure if it would be the same after an hour of thinking about that number. Though I’d be surprised if I ended up giving more than 95% or less than 40%.
Curious where yours is at?
I'm not Wei, but I think my estimate falls within that range as well.
Thanks Wei! I agree that improving institutions is generally very hard. In a slow takeoff scenario, there would be a new path to improving institutions using powerful (but not fully general) AI, but it's unclear how well we could expect that to work given the generally low priors.
The covid response was a minor update for me in terms of AI risk assessment - it was mildly surprising given my existing sense of institutional competence.
As I wrote before, slow takeoff might actually be worse than fast takeoff. This is because even if the first powerful AIs are aligned, their head start on unaligned AIs will not count for as much, and alignment might (and probably will) require overhead that gives the unaligned AIs an advantage. Therefore, success would require that institutions either prevent or quickly shut down unaligned AIs for long enough that aligned AIs gain the necessary edge.
I generally endorse the claims made in this post and the overall analogy. Since this post was written, there are a few more examples I can add to the categories for slow takeoff properties.
Learning from experience
Consensus on the problem
While I agree that the COVID response was worse than it could have been, I think there are several important disanalogies between the COVID-19 pandemic and a soft AI takeoff scenario:
1. Many new problems arose during this pandemic for which we did not have historical experience, e.g. in supply chains. (Perhaps we had historical precedent in the Spanish flu, but that was sufficiently long ago that I don’t expect those lessons to generalize, or for us to remember those lessons.) In contrast, I expect that with AI alignment the problems will not change much as the AI systems become more powerful. Certainly the effects of misaligned powerful AI systems will change dramatically and be harder to mitigate, but I expect the underlying causes of misalignment will not change much, and that’s what we need to gain consensus about and find solutions for. EDIT: Note that with COVID we failed even at existing, known problems (see Raemon's comment thread below), so this point doesn't really explain away our failure with COVID.
2. It looks like most institutions took action in about 2 months (mid-Jan to mid-March). While this was (I assume) too slow for COVID, it seems more than sufficient for AI alignment under soft takeoff, where I expect we will have multiple years for the relevant decision-making. However, unlike COVID, there probably won’t be a specific crisis mode that leads to quick(er) decision-making: while there might be warning signs that suggest that action needs to be taken for AI systems, they may not lead to action specifically targeted at AI x-risk.
3. It seems that the model in this post is that we should have learned from past epidemics, and applied those lessons to solve this pandemic. However, in AI alignment, the hope is to learn from failures of narrow AI systems, and use that to prevent failures in more powerful AI systems. I would be pretty pessimistic if AI alignment were banking on noticing failures in powerful AI systems and then quickly mobilizing institutions to mitigate those failures, rather than preventing the failures in the first place. The analogous actions for COVID would be things we could have done before we knew about COVID to mitigate some unknown future pandemic. I don’t know enough about epidemiology to say whether or not there were cost-effective actions that should have been taken ahead of time, but note that any such argument should be evaluated ex ante (i.e. from the perspective where we don’t know that COVID-19 would happen).
Separately, it seems like humanity has in fact significantly mitigated the effects of COVID (we reduced deaths to something like a fraction of what they "could have been"), so if you want to take an extremely outside view approach, you should predict that with AI alignment we'll mitigate the worst effects but there will still be some pretty bad effects, which still argues against extinction. (I don't personally buy this reasoning; I mention it as a response to people who say "look at all of our civilization's failures, therefore we should predict failure at AI alignment too".)
Wait... you think there will be fewer novel problems arising during AI (a completely unprecedented phenomenon) than in Covid? Even in my most relaxed, responsible slow-takeoff scenarios, that seems like an extremely surprising claim.
I'm also somewhat confused about what facts you think we didn't know about covid that prevented us from preparing – I don't currently have examples of such facts in mind. (The fact that some countries seem to be doing just fine makes it look to me like it's totally doable to have solved covid given the information we had at the time, or at least to have responded dramatically more adequately than many countries did.)
Relative to our position now, there will be more novel problems from powerful AI systems than for COVID.
Relative to our position e.g. two years before the "point of no return" (perhaps the deployment of the AI system that will eventually lead to extinction), there will be fewer novel problems than for COVID, at least if we are talking about the underlying causes of misalignment.
(The difference is that with AI alignment we're trying to prevent misaligned powerful AI systems from being deployed, whereas with pandemics we don't have the option of preventing "powerful diseases" from arising; we instead have to mitigate their effects.)
I agree that powerful AI systems will lead to more novel problems in their effects on society than COVID did, but that's mostly irrelevant if your goal is to make sure you don't have a superintelligent AI system that is trying to hurt you.
I think it is plausible that we "could have" completely suppressed COVID, and that mostly wouldn't have required facts we didn't know, and the fact that we didn't do that is at least a weak sign of inadequacy.
I think given that we didn't suppress COVID, mitigating its damage probably involved new problems that we didn't have solutions for before. As an example, I would guess that in past epidemics the solution to "we have a mask shortage" would have been "buy masks from <country without the epidemic>", but that no longer works for COVID. But really the intuition is more like "life is very different in this pandemic relative to previous epidemics; it would be shocking if this didn't make the problem harder in some way that we failed to foresee".
Hmm. This just doesn't seem like what was going on to me at all. I think I disagree a lot about this, and it seems less about "how things will shake out in Slow AI Takeoff" and more about "how badly and obviously-in-advance and easily-preventably did we screw up our covid response."
(I expect we also disagree about how Slow Takeoff would look, but I don't think that's the cruxy bit for me here).
I'm sort of hesitant to jump into the "why covid obviously looks like mass institutional failure, given a very straightforward, well understood scenario" argument because I feel like it's been hashed out a lot in the past 3 months and I'm not sure where to go with it – I'm assuming you've read the relevant arguments and didn't find them convincing.
The sort of things I have in mind include:
These problems all seemed fairly straightforward and understood. There might also be novel problems going on but they don't seem necessary to hypothesize given the above types of failure.
Ah, I see. I agree with this and do think it cuts against my point #1, but not points #2 and #3. Edited the top-level comment to note this.
Tbc, I find it quite likely that there was mass institutional failure with COVID; I'm mostly arguing that soft takeoff is sufficiently different from COVID that we shouldn't necessarily expect the same mass institutional failure in the case of soft takeoff. (This is similar to Matthew's argument that the pandemic shares more properties with fast takeoff than with slow takeoff.)
Ah, okay. I think I need to at least think a bit harder to figure out if I still disagree in that case.
I do definitely expect different institutional failure in the case of Soft Takeoff. But it sort of depends on what level of abstraction you're looking at the institutional failure through. Like, the FDA won't be involved. But there's a decent chance that some other regulatory body will be involved, which is following the underlying FDA impulse of "Wield the one hammer we know how to wield to justify our jobs." (In a large company, it's possible that the regulatory body could be a department inside the org, rather than a government agency.)
In reasonably good outcomes, the decisions are mostly being made by tech companies full of specialists who well understand the problem. In that case the institutional failures will look more like "what ways do tech companies normally screw up due to internal politics?"
There's a decent chance the military or someone will try to commandeer the project, in which case more typical government institutional failures will become more relevant.
One thing that seems significant is that 2 years prior to The Big Transition, you'll have multiple companies with similar-ish tech. And some of them will be appropriately cautious (like New Zealand and Singapore), and others will not have the political wherewithal to slow down, think carefully, figure out what inconvenient things they need to do, and do them (like many other countries in covid).
Yeah, these sorts of stories seem possible, and it also seems possible that institutions try some terrible policies, notice that they're terrible, and then fix them. Like, this description:
just doesn't seem to match my impression of non-EAs-or-rationalists working on AI governance. It's possible that people in government are much less competent than people at think tanks, but this would be fairly surprising to me. In addition, while I can't explain FDA decisions, I still pretty strongly penalize views that ascribe huge very-consequential-by-their-goals irrationality to small groups of humans working full time on something.
(Note I would defend the claim that institutions work well enough that in a slow takeoff world the probability of extinction is < 80%, and probably < 50%, just on the basis that if AI alignment turned out to be impossible, we can coordinate not to build powerful AI.)
Are you saying you think that wasn't a fair characterization of the FDA, or that the hypothetical AI Governance bodies would be different from the FDA?
(The statement was certainly not very fair to the FDA, and I do expect there was more going on under the hood than that motivation. But I do broadly think governing bodies do what they are incentivized to do, which includes justifying themselves, especially after being around a couple of decades and gradually being infiltrated by careerists.)
I am mostly confused, but I expect that if I learned more I would say that it wasn't a fair characterization of the FDA.
This also jumped out at me as being only a subset of what I think of as "AI alignment"; like, ontological collapse doesn't seem to have been a failure of narrow AI systems. [By 'ontological collapse', I mean the problem where the AI knows how to value 'humans', and then it discovers that 'humans' aren't fundamental and 'atoms' are fundamental, and now it's not obvious how its preferences will change.]
Perhaps you mean "AI alignment in the slow takeoff frame", where 'narrow' is less a binary judgment and more of a continuous judgment; then it seems more compelling, but I still think the baseline prediction should be doom if we can only ever solve problems after encountering them.
I do mean this.
I'd predict that either ontological collapse won't be a problem, or we'll notice it in AI systems that are less general than humans. (After all, humans have in fact undergone ontological collapse, so presumably AI systems will also have undergone it by the time they reach human level generality.)
This depends on what you count as "encountering a problem".
At one extreme, you might look at Faulty Reward Functions in the Wild and this counts as "encountering" the problem "If you train using PPO with such-and-such hyperparameters on the score reward function in the CoastRunners game then on this specific level the boat might get into a cycle of getting turbo boosts instead of finishing the race". If this is what it means to encounter a problem, then I agree the baseline prediction should be doom if we only solve problems after encountering them.
At the other extreme, maybe you look at it and this counts as "encountering" the problem "sometimes AI systems are not beneficial to humans". So, if you solve this problem (which we've already encountered), then almost tautologically you've solved AI alignment.
I'm not sure how to make further progress on this disagreement.
Planned summary for the Alignment Newsletter:
Thanks Rohin for covering the post in the newsletter!
The summary looks great overall. I have a minor objection to the word "narrow" here: "we may fail to generalize from narrow AI systems to more general AI systems". When I talked about generalizing from less advanced AI systems, I didn't specifically mean narrow AI - what I had in mind was increasingly general AI systems we are likely to encounter on the path to AGI in a slow takeoff scenario.
For the opinion, I would agree that it's not clear how well the covid scenario matches the slow takeoff scenario, and that there are some important disanalogies. I disagree with some of the specific disanalogies you point out though:
Changed narrow/general to weak/strong in the LW version of the newsletter (unfortunately the newsletter had already gone out when your comment was written).
There was some worry about supply chain problems for food. Perhaps that didn't materialize, or it did materialize and it was solved without me noticing.
I expect that this was the first extended shelter-in-place order for most if not all of the US, and this led to a bunch of problems in deciding what should and shouldn't be included in the order, how stringent to make it, etc.
More broadly, I'm not thinking of any specific problem, but the world is clearly very different than it was in any recent epidemic (at least in the US), and I would be shocked if this did not bring with it several challenges that we did not anticipate ahead of time (perhaps someone somewhere had anticipated it, but it wasn't widespread knowledge).
I definitely agree that we can decrease the likelihood of pandemics arising, but we can't really hope to eliminate them altogether (with current technology). But really I think this was not my main point, and I summarized my point badly: the point was that given that alignment is about preventing misalignment from arising, the analogous thing for pandemics would be about preventing pandemics from arising; it is unclear to me whether civilization was particularly inadequate along this axis ex ante (i.e. before we knew that COVID was a thing).
I tend to think that the pandemic shares more properties with fast takeoff than it does with slow takeoff. Under fast takeoff, a very powerful system will spring into existence after a long period of AI being otherwise irrelevant, in a similar way to how the virus was dormant until early this year. The defining feature of slow takeoff, by contrast, is a gradual increase in abilities from AI systems all across the world.
In particular, I object to this portion of your post,
I'm not convinced that these parallels to COVID-19 are very informative. Compared to this pandemic, I expect the direct effects of AI to be very obvious to observers, in a similar way that the direct effects of cars are obvious to people who go outside. Under a slow takeoff, AI will already be performing a lot of important economic labor before the world "goes crazy" in the important senses. Compare to the pandemic, in which
Some good points, but on the contrary: a slow take-off is considered safer because we have more lead time and warning shots, but the world has seen many similar events and warning shots for covid. Ones that come to mind in the last two decades are swine flu, bird flu, and Ebola, and of course there have been many more over history.
This just isn’t that novel or surprising: billionaires like Bill Gates have been sounding the alarm, and still the supermajority of Western countries failed to take basic preventative measures. Those properties seem similar to even the slow take-off scenario. I feel like the fast-takeoff analogy would go through most strongly in a world where we'd just never seen this sort of pandemic before, but in reality we've seen many of them.
Thanks Matthew for your interesting points! I agree that it's not clear whether the pandemic is a good analogy for slow takeoff. When I was drafting the post, I started with an analogy with "medium" takeoff (on the time scale of months), but later updated towards the slow takeoff scenario being a better match. The pandemic response in 2020 (since covid became apparent as a threat) is most relevant for the medium takeoff analogy, while the general level of readiness for a coronavirus pandemic prior to 2020 is most relevant for the slow takeoff analogy.
I agree with Ben's response to your comment. Covid did not spring into existence in a world where pandemics are irrelevant, since there have been many recent epidemics and experts have been sounding the alarm about the next one. You make a good point that epidemics don't gradually increase in severity, though I think they have been increasing in frequency and global reach as a result of international travel, and the possibility of a virus escaping from a lab also increases the chances of encountering more powerful pathogens in the future. Overall, I agree that we can probably expect AI systems to increase in competence more gradually in a slow takeoff scenario, which is a reason for optimism.
Your objections to the parallel with covid not being taken seriously seem reasonable to me, and I'm not very confident in this analogy overall. However, one could argue that the experience with previous epidemics should have resulted in a stronger prior on pandemics being a serious threat. I think it was clear from the outset of the covid epidemic that it was much more contagious than seasonal flu, which should have produced an update towards it being a serious threat as well.
I agree that the direct economic effects of advanced AI would be obvious to observers, but I don't think this would necessarily translate into widespread awareness that much more powerful AI systems are imminent that could transform the world even more. People are generally bad at reacting to exponential trends, as we've seen in the covid response. If we had general-purpose household robots in every home, I would expect some people to take the risks of general AI more seriously, and some other people to say "I don't see my household robot trying to take over the world, so these concerns about general AI are overblown". Overall, as more advanced AI systems are developed and have a large economic impact, I would expect the proportion of people who take the risks of general AI seriously to increase steadily, but wouldn't expect widespread consensus until relatively late in the game.
I personally agree with the OP, and have found at least the US's response to Covid-19 fairly important for modeling how it might respond to AI. I also found it particularly interesting that it focused on the "Slow Takeoff" scenario. I wouldn't have thought to make that specific comparison, and found it surprisingly apt.
I also think that, regardless of whether one agrees with the OP, "how humanity collectively responded to Covid-19" is still important evidence in some form about how we can expect it to handle other catastrophes, and worth paying attention to, and perhaps debating.
I think we have our answer to the Fermi paradox in our hopeless response to the coronavirus pandemic. The median European country has had deaths/million more than 10 times worse than best practice (Taiwan, etc.). https://www.worldometers.info/coronavirus/#countries
Civilizations will arise when the species concerned is only barely able to manage the job. I think world history suggests that this is very true of us. The chances of being up to handling the much more complex, difficult challenges of going to the next level seem low.
The Fermi paradox has a much simpler answer: https://slatestarcodex.com/2018/07/03/ssc-journal-club-dissolving-the-fermi-paradox/
That analysis has been trenchantly criticised and I don't find it convincing.
I haven't heard this. What's the strongest criticism?
If the linked SSC article is about the aestivation hypothesis, see the rebuttal here.