I'm fairly agnostic about how dumb we're talking, i.e. what kinds of acts or confluences of events are actually likely to be effective, complete x-risks, particularly at relatively low levels of intelligence/capability. But that's beside the point in some ways: wherever someone places the threshold for x-risk-capable AI, as long as you assume that greater intelligence is harder to produce (an assumption that doesn't necessarily hold, as I acknowledged), I think we will be killed by something not much above that threshold once it's first reached.

Evolution tries many very similar designs, always moving in tiny steps through the search space. Humans are capable of moving in larger jumps. Often the difference between one attempt and the next is several times more compute. No one trained something 90% as big as GPT-3 before GPT-3.

This is true for now, but there's a sense in which the field is in a low-hanging-fruit stage of development, where there's plenty of room to scale massively fairly easily. If the thresholds are crossed during a stage like this, where everyone is rushing to collect big, easy advances, then yes, I would expect the gap between how intelligent/capable the AI that kills us is and how intelligent it needed to be to be larger (though still not that much larger, barring e.g. a fast takeoff). Conversely, in a world where progress is in a more incremental stage, I would expect a smaller gap.

If the AI is on a computer system where it can access its own code (either deliberately, or through security so poor that even a dumb AI can break it), the amount of intelligence needed to see "neuron count" and turn it up isn't huge. Basically, the capability needed to do a bit of AI research and make itself a little smarter is on the level of a smart AI expert. The intelligence needed to destroy humanity is higher than that.

Sure, we may well have dumb AI failures first. A stock market crash. Maybe some bug in a flight control system that makes lots of planes nosedive at once. But it's really hard to see how an AI failure could lead to human extinction, unless the AI was smart enough to develop new nanotech (and if it can do that, it can make itself smarter).

The first uranium pile to put out a lethal dose of radiation can put out many times a lethal dose, because a subcritical pile gives out very little radiation, and once the pile goes critical, its output quickly ramps up to enormous levels.
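The nonlinearity in this analogy can be sketched with a toy geometric model (the numbers here are made up and purely illustrative): cumulative output from a subcritical pile (multiplication factor k < 1) plateaus at a fixed ceiling, while a supercritical one (k > 1) grows exponentially, so the first configuration to cross any fixed "lethal" threshold tends to blow far past it.

```python
# Toy model of criticality: neutron population across generations with
# multiplication factor k. Subcritical (k < 1) output converges to a
# finite total; supercritical (k > 1) output grows without bound.

def total_output(k, generations, seed=1.0):
    """Cumulative output: seed * (1 + k + k^2 + ... + k^(generations-1))."""
    population = seed
    total = 0.0
    for _ in range(generations):
        total += population
        population *= k
    return total

# Subcritical: geometric series converges toward seed / (1 - k) = 10.
sub = total_output(k=0.9, generations=200)
# Supercritical: a slightly larger k yields astronomically more output.
sup = total_output(k=1.1, generations=200)
```

The point of the sketch is that the gap between "just below threshold" and "just above threshold" is not a small step in output but many orders of magnitude, which is the shape of the argument being made about the first x-risk-capable AI.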

Can you name any strategy the AI could use to wipe out humanity, without strongly implying an AI smart enough for substantial self improvement?

Self-improvement to me doesn't automatically mean RSI takeoff to infinity; an AI that self-improves up to a point where it is capable of wiping out humanity, without ever reaching criticality, seems possible to me.

I agree, though, that the availability of powerful grey/black-ball technologies like nanotech is a big factor, since they could require fewer variables going wrong, and less intelligence, for an AI to plausibly represent an x-risk. Other existing technologies such as engineered pandemics or nuclear weapons, while dangerous, seem somewhat difficult to leverage into fully wiping out humanity even with AI, at least by themselves, though they could leave the world much more vulnerable to further shocks.