This was written as part of the first Refine blog post day.
A sneaking suspicion I’ve found difficult to shake while following AI risk discussion is that concerns about superintelligent AI, while clearly valid and important, are very plausibly jumping the gun. Not-altogether-unlikely nearer-term AI risks are looming, and they have the potential to wipe us out long before we reach that point, especially under conditions where we do very little to mitigate them.
One reason I have for this intuition comes from extending an occasionally repeated idea about human intelligence: that humans are nearly the dumbest possible creature capable of developing a technological civilisation. Part of the reasoning behind this take is that creating sophisticated, powerful, complex intelligent agents means wading against entropy, which is hard.
That is to say, evolution as an optimisation process was very unlikely to produce intelligence and general capability greatly in excess of what was needed for an instantiation of something like “technological civilisation”, because of these two assumptions taken together:
1. The first species evolution produced that crossed the necessary thresholds of capability would be the one to instantiate it in the world first, regardless of how barely above those thresholds it might have been, and
2. Greater intelligence/capability is in some sense generally more “difficult” to produce, requiring stronger optimisation and luckier dice rolls than lesser intelligence/capability.
In a similar way, it seems very possible to me that we have no particular need to worry about being wiped out by the consequences of creating unaligned superintelligent beings far beyond our comprehension, simply because I expect that significantly less sophisticated AI systems, ones with just enough capability to meet the threshold for the job, will do it first.
Of course, it might be the case that there are large discontinuities in the relationship between difficulty of creation and capability of AI. For example, if it happens to be relatively easy to fall into gravity wells of classic fast-takeoff style recursive self-improvement in the landscape of possible AI designs, where we very suddenly happen to produce significantly more intelligence/capability once we hit certain points in design space, then we may still need to worry about superintelligences that tower over humanity early.
But if not, consider that the first AI-driven trillion-dollar market crash happened accidentally, the result of unfortunate interactions between “dumb”, rather simple and unsophisticated high-frequency trading algorithms, and not, say, some meticulously crafted and optimised system designed by short-sellers or vandals, or the machinations of some genius general AI.
In this way and in general, I expect that as we cede more and more control over our world to automated systems with less and less human input, the potential adverse impacts of AI alignment failures rise too. The threshold of intelligence required to wreak havoc drops significantly when you begin with control over the necessary resources already in hand. How vulnerable the world happens to be also plays a big role, since AI seems to me to have lots of potential to greatly leverage any upcoming grey- or black-ball technologies.
And even if we somehow navigate through these “dumb” risks without wiping ourselves out, I worry that the strategies we scrounge up to avoid them will be of the sort that are very unlikely to generalise once the superintelligence risks do eventually rear their heads. But it won't matter if we don't even get there, and the shambolic nature of a human society composed of nearly the dumbest possible creatures able to create technological civilisation does not fill me with optimism that we'll get that far.
OK, how dumb are we talking? I don't think an AI stupider than myself can directly wipe out humanity. (I see no way to do so myself, at my current skill level. I know nanogoo is possible in principle, but I also know that I am not smart enough to single-handedly create it.)
If the AI is on a computer system where it can access its own code (either deliberately, or through security so dire even a dumb AI can break it), the amount of intelligence needed to see a “neuron count” parameter and turn it up isn't huge. Basically, the amount of capability needed to do a bit of AI research and make itself a little smarter is at the level of a smart AI expert. The intelligence needed to destroy humanity is higher than that.
Sure, we may well have dumb AI failures first, like that stock market crash, or maybe some bug in a flight control system that makes lots of planes nosedive at once. But it's really hard to see how an AI failure could lead to human extinction unless the AI was smart enough to develop new nanotech (and if it can do that, it can make itself smarter).
The first uranium pile to put out a lethal dose of radiation can put out many times a lethal dose, because a subcritical pile doesn't give out enough radiation, and once the pile goes critical, it quickly ramps up to loads of radiation.
Evolution tries many very similar designs, always moving in tiny steps through the search space. Humans are capable of moving in larger jumps; often the difference between one attempt and the next is several times more compute. No one trained something 90% as big as GPT-3 before GPT-3.
Can you name any strategy the AI could use to wipe out humanity, without strongly implying an AI smart enough for substantial self improvement?
I'm fairly agnostic about how dumb we're talking, i.e. what kinds of acts or confluences of events are actually likely to be effective complete x-risks, particularly at relatively low levels of intelligence/capability. But that's beside the point in some ways, because wherever someone might place the threshold for x-risk-capable AI, as long as you assume that greater intelligence is harder to produce (an assumption that doesn't necessarily hold, as I acknowledged), I think that suggests we will be killed by something not much above that threshold once it's first reached.
This is true for now, but there's a sense in which the field is in a low-hanging-fruit stage of development, with plenty of room to scale massively fairly easily. If the thresholds are crossed during a stage like this, where everyone is rushing to collect big, easy advances, then yes, I would expect the gap between how intelligent/capable the AI that kills us is and how intelligent it needed to be to be larger (but still not that much larger, barring e.g. fast takeoff). Conversely, in a world where progress is in a more incremental stage, I would expect a smaller gap.
Self-improvement to me doesn't automatically mean RSI takeoff to infinity; an AI that self-improves up to a point where it is capable of wiping out humanity but has not yet reached criticality seems possible to me.
I agree, though, that the availability of powerful grey/black-ball technologies like nanotech, which could require fewer variables going wrong and less intelligence for an AI to plausibly represent an x-risk, is a big factor. Other existing technologies like engineered pandemics or nuclear weapons, while dangerous, seem somewhat difficult even with AI to leverage into fully wiping out humanity by themselves, even if they could lead to worlds that are much more vulnerable to further shocks.
That holds so long as we assume the timescales of intelligence growth are slow compared to the timescale of destroying the world. Suppose an AI is smart enough to destroy the world in a year (in the hypothetical where it had to stop self-improving and act now). After a day of self-improvement, it's smart enough to destroy the world in a week. After another day of self-improvement, it can destroy the world in an hour.
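As a toy illustration of that race (all numbers purely hypothetical, chosen to roughly match the year/week/hour progression above): if each day of self-improvement shrinks the remaining execution time by a constant factor, the plan time collapses below the improvement time within a couple of days.

```python
# Toy model, all numbers illustrative: each day spent self-improving
# divides the time the AI would need to carry out a world-ending plan
# by a constant factor.

def days_until_plan_is_fast(initial_plan_days=365.0, shrink_per_day=50.0):
    """Count days of self-improvement until the remaining plan time
    drops below one day, i.e. waiting stops being the bottleneck."""
    day = 0
    plan_days = initial_plan_days
    while plan_days >= 1.0:
        day += 1
        plan_days /= shrink_per_day
    return day, plan_days

# With these made-up numbers: a year-long plan becomes a ~week-long
# plan after one day, and a few-hours plan after two.
```

The point of the sketch is only that under geometric shrinking, almost all of the calendar time is spent above the threshold, and the window where "act now" beats "improve one more day" is very short.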
Another possibility is an AI that doesn't choose to destroy the world at the first available moment.
Imagine a paperclip maximizer. It thinks it has a 99% chance of destroying the world and turning everything into paperclips. And a 1% chance of getting caught and destroyed. If it waits for another week of self improvement, it can get that chance down to 0.0001%.
Suppose the limiting factor was compute budget. Making each AI 1% bigger than before means basically wasting compute running pretty much the same AI again and again. Making each AI about 2x as big as the last is sensible. If each training run costs a fortune, you can't afford to go in tiny steps.
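A back-of-the-envelope sketch of that budget argument (numbers hypothetical, and assuming training cost scales roughly linearly with model size): under the same fixed compute budget, doubling steps reach a far larger model than 1% steps, which mostly pay to retrain nearly the same AI over and over.

```python
def largest_model_trained(budget, start_size=1.0, step_factor=2.0):
    """Spend a fixed compute budget on successive training runs, each
    `step_factor` times bigger than the last (cost assumed ~ size).
    Return the size of the largest model the budget covers."""
    size = start_size
    spent = 0.0
    largest = 0.0
    while spent + size <= budget:
        spent += size
        largest = size
        size *= step_factor
    return largest

# Illustrative comparison with a budget of 100 units:
# doubling steps (2.0) reach a 32x model, while 1% steps (1.01)
# exhaust the same budget before even reaching a 2x model.
```

The design point matches the comment: with multiplicative costs, tiny increments buy almost no exploration of the capability range, so rational actors take big jumps, and the first model past a threshold can overshoot it substantially.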
An attempt to name a strategy for an AI almost as smart as you: What fraction of jobs in the world are you intelligent enough to do, if you trained for them? I suspect that a huge fraction of the world's workers could not compete in a free fair market against an entity as smart as you that eats 15 dollars of electricity a day, works without breaks, and only has to be trained once for each task, after which millions of copies could be churned out.
True. But this is getting into the economic competition section. It's just hard to imagine this being an X-risk. I think that in practice, if it's human politicians and bosses rolling the tech out, the tech will be rolled out slowly and inconsistently. There are plenty of people with lots of savings; plenty of people who could go to their farm and live off the land; plenty of people who will be employed for a while if the robots are expensive, however cheap the software; plenty of people doing non-jobs in bureaucracies who can't be fired and replaced for political reasons. And all the rolling out, setting up, switching over, and running out of money takes time. Time where the AI is self-improving. So the hard FOOM happens before too much economic disruption. Not that economic disruption is an X-risk anyway.