I’m Michael Aird, a Staff Researcher at Rethink Priorities, Research Scholar at the Future of Humanity Institute, and guest manager at the Effective Altruism Infrastructure Fund. Opinions expressed are my own. You can give me anonymous feedback at this link.

With Rethink, I'm currently mostly working on nuclear risk research. I might in future work on topics related to what I'm calling "Politics, Policy, and Security from a Broad Longtermist Perspective".

Previously, I did longtermist macrostrategy research for Convergence Analysis and then for the Center on Long-Term Risk. More on my background here.

I mostly post to the EA Forum.

If you think you or I could benefit from us talking, feel free to message me or schedule a call. For people interested in doing effective-altruism-related research/writing, testing their fit for that, "getting up to speed" on EA/longtermist topics, or writing for the EA Forum/LessWrong, I also recommend this post.


Rohin Shah on reasons for AI optimism

Interesting (again!).

So you've updated your unconditional estimate from ~5% (1 in 20) to ~9%? If so, people may have to stop citing you as an "optimist"... (which was already perhaps a tad misleading, given what the 1 in 20 was about)

(I mean, I know we're all sort-of just playing with incredibly uncertain numbers about fuzzy scenarios anyway, but still.)

Rohin Shah on reasons for AI optimism

Quite interesting. Thanks for that response.

And yes, this does seem quite consistent with Ord's framing. E.g., he writes "my estimates above incorporate the possibility that we get our act together and start taking these risks very seriously." So I guess I've seen it presented this way at least that once, but I'm not sure I've seen it made explicit like that very often (and doing so seems useful and retrospectively-obvious).

But if we just exerted a lot more effort (i.e. "surprisingly much action"), the extra effort probably doesn't help much more than the initial effort, so maybe... 1 in 25? 1 in 30?

Are you thinking roughly that (a) returns diminish steeply from the current point, or (b) that effort will likely ramp up a lot in future and pluck a large quantity of the low hanging fruit that currently remain, such that even more ramping up would face steeply diminishing returns?

That's a vague question, and may not be very useful. The motivation for it is that I was surprised you saw the gap between business as usual and "surprisingly much action" as being as small as you did, and wonder roughly what portion of that is about you thinking additional people working on this won't be very useful, vs thinking very super useful additional people will eventually jump aboard "by default".

Rohin Shah on reasons for AI optimism

Thanks for this reply!

Perhaps I should've been clear that I didn't expect what I was saying was things you hadn't heard. (I mean, I think I watched an EAG video of you presenting on 80k's ideas, and you were in The Precipice's acknowledgements.)

I guess I was just suggesting that your comments there, taken by themselves/out of context, seemed to ignore those important arguments, and thus might seem overly optimistic. Which seemed mildly potentially important for someone to mention at some point, as I've seen this cited as an example of AI researcher optimism. (Though of course I acknowledge your comments were off the cuff and not initially intended for public consumption, and any such interview will likely contain moments that are imperfectly phrased or open to misinterpretation.)

Also, re: Precipice, it's worth noting that Toby and I don't disagree much -- I estimate 1 in 10 conditioned on no action from longtermists; he estimates 1 in 5 conditioned on AGI being developed this century. Let's say that action from longtermists can halve the risk; then my unconditional estimate would be 1 in 20[...] (emphasis added)

I find this quite interesting. Is this for existential risk from AI as a whole, or just "adversarial optimisation"/"misalignment" type scenarios? E.g., does it also include things like misuse and "structural risks" (e.g., AI increasing risks of nuclear war by forcing people to make decisions faster)?

I'm not saying it'd be surprisingly low if it does include those things. I'm just wondering, as estimates like this are few and far between, so now that I've stumbled upon one I want to understand its scope and add it to my outside view.

Also, I bolded conditioned and unconditional, because that seems to me to suggest that you also currently expect the level of longtermist intervention that would reduce the risk to 1 in 20 to happen. Like, for you, "there's no action from longtermists" would be a specific constraint you have to add to your world model? That also makes sense; I just feel like I've usually not seen things presented that way.

I imagine you could also condition on something like "surprisingly much action from longtermists", which would reduce your estimated risk further?

Rohin Shah on reasons for AI optimism
You could imagine a situation where for some reason the US and China are like, “Whoever gets to AGI first just wins the universe.” And I think in that scenario maybe I’m a bit worried, but even then, it seems like extinction is just worse, and as a result, you get significantly less risky behavior? But I don’t think you get to the point where people are just literally racing ahead with no thought to safety for the sake of winning.

My interpretation of what Rohin is saying there is:

  • 1) Extinction is an extremely bad outcome.
  • 2) It's much worse than 'losing' an international competition to 'win the universe'.
  • 3) Countries/institutions/people will therefore be significantly inclined to avoid risking extinction, even if doing so would increase the chances of 'winning' an international competition to 'win the universe'.

I agree with claim 1.

I agree with some form of claim 3, in that:

  • I think the badness of extinction will reduce the risks people are willing to take
  • I also "don’t think you get to the point where people are just literally racing ahead with no thought to safety for the sake of winning."
  • But I don't think the risks will be reduced anywhere near as much as they should be. (That said, I also believe that odds are in favour of things "going well by default", just not as much in favour of that as I'd like).

This is related to my sense that claim 2 is somewhat tricky/ambiguous. Are we talking about whether it is worse, or whether the relevant actors will perceive it as worse? One common argument for why existential risks are neglected is that it's basically a standard market failure. The vast majority of the harm from x-risks are externalities, and x-risk reduction is a global public good. Even if we consider deaths/suffering in the present generation, even China and India absorb less than half of that "cost", and most countries absorb less than 1% of them. And I believe most people focused on x-risk reduction are at least broadly longtermist, so they'd perceived the overwhelming majority of the costs to be to future generations, and thus also externalities.

So it seems like, unless we expect the relevant actors to act in accordance with something close to impartial altruism, we should expect them to avoid risks somewhat to avoid existential risks (or extinction specifically), but far less than they really should. (Roughly this argument is made in The Precipice, and I believe by 80k.)

(Rohin also discusses right after that quote why he doesn't "think that differences in who gets to AGI first are going to lead to you win the universe or not", which I do think somewhat bolsters the case for claim 2.)

Rohin Shah on reasons for AI optimism

Interesting interview, thanks for sharing it!

Asya Bergal: It seems like people believe there’s going to be some kind of pressure for performance or competitiveness that pushes people to try to make more powerful AI in spite of safety failures. Does that seem untrue to you or like you’re unsure about it?
Rohin Shah: It seems somewhat untrue to me. I recently made a comment about this on the Alignment Forum. People make this analogy between AI x-risk and risk of nuclear war, on mutually assured destruction. That particular analogy seems off to me because with nuclear war, you need the threat of being able to hurt the other side whereas with AI x-risk, if the destruction happens, that affects you too. So there’s no mutually assured destruction type dynamic.

I find this statement very confusing. I wonder if I'm misinterpreting Rohin. Wikipedia says "Mutual(ly) assured destruction (MAD) is a doctrine of military strategy and national security policy in which a full-scale use of nuclear weapons by two or more opposing sides would cause the complete annihilation of both the attacker and the defender (see pre-emptive nuclear strike and second strike)."

A core part of the idea of MAD is that the destruction would be mutual. So "with AI x-risk, if the destruction happens, that affects you too" seems like a reason why MAD is a good analogy, and why the way we engaged in MAD might suggest people would engage in similar brinkmanship or risks with AI x-risk, even if the potential for harm to people's "own side" would be extreme. There are other reasons why the analogy is imperfect, but the particular feature Rohin mentions seems like a reason why an analogy could be drawn.