Donald Hobson

MMath Cambridge. Currently studying postgrad at Edinburgh.


Neural Networks, More than you wanted to Show
Logical Counterfactuals and Proposition graphs


If the AI thinks it has a decent shot at this, it must already be pretty smart. Does a world where an AI tried to take over and almost succeeded look pretty normal? Or is this a scenario where the AI thinks it has a 1 in a trillion chance, and tries anyway?

Maybe. What do the models gain by hiding? 

Suppose that when all is said and done they see the following:

And my interpretation of that graph is "the rightmost datapoint has been gathered, and the world hasn't ended yet. What the ????"

Deception is only useful to the extent that it achieves the AI's goals. And I think the main way deception helps an AI is in helping it break out and take over the world. Now maybe the AIs on the right have broken out, and are just perfecting their nanobots. But in that case, I would expect the graph, the paper the graph is in, and most of the rest of the internet, to be compromised into whatever tricks humans in an AI-convenient direction.

As for "if we see X, that means the data source has been optimized to make us do Y": well, if we decide not to do Y in response to seeing X, then the optimizer won't show us X. We could say "don't trust the data source, and never, ever do Y, whatever it tells us." But that isn't practical with future superintelligences and the whole internet.

What will you do that makes your game better than the many other computer games out there? 

I presume that random members of the public will be playing this game, not just a handful of experts.

Once the AI realizes it's in a game and being tested, you have basically no reason to expect its behaviour in the game to be correlated with its behaviour in reality. Given random people making basically no attempt to keep this secret, any human-level AI will realize this. Freely interacting with large numbers of gullible internet randos isn't the best from a safety standpoint either. A truly superintelligent AI could persuade MIRI to let it out of any box, but a top-human-level AI could more easily trick internet randos.

If you kept the controls much tighter, and only had a few AI experts interacting with it, you could possibly have an AI smart enough to be useful and dumb enough not to realize it's in a box being tested. But that makes getting enough training data hard.

This is great, but it also misses the loopiness. If GPT12 looks at the future, surely most of that future is massively shaped by GPT12. We are in fixed-point, self-fulfilling-prophecy land. (Or, if you somehow condition on its current output being nothing, then the next slightly different attempt with GPT13.) If GPT-n doubles the chance of success, the only stable fixed point is success.
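The fixed-point intuition can be sketched numerically. This is a toy model with assumed dynamics (the doubling rule is the only thing taken from the argument above; everything else is illustrative):

```python
# Toy model: each generation's prediction doubles the probability of
# success, capped at 1. Iterate p -> min(1, 2p) and see where it settles.
def iterate(p, rounds):
    for _ in range(rounds):
        p = min(1.0, 2 * p)
    return p

print(iterate(0.001, 20))  # any nonzero seed converges to 1.0 (success)
print(iterate(0.0, 20))    # 0.0 is the only other fixed point
```

Any nonzero starting probability is driven to 1, which is the sense in which "the only fixed point is success" (p = 0 is also a fixed point, but an unstable one under any perturbation).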

I don't think lazy data structures can pull this off. The AI must calculate various ways human extinction could affect its utility. 

So unless there are some heuristics that are so general they cover this as a special case, and the AI can find them without considering the special cases first, then it must explicitly consider human extinction.

I would expect a superhuman AI to be really good at tracking the consequences of its actions. The AI isn't setting out to wipe out humanity. But in the list of side effects of removing all oxygen, along with many things no human would ever consider, is wiping out humanity. 

AIXI tracks every consequence of its actions, at the quantum level. A physical AI must approximate, tracking only the most important consequences. So in its decision process, I would expect a smart AI to extensively track all consequences that might be important.

True. But this is getting into the economic competition section. It's just hard to imagine this being an X-risk. I think that in practice, if it's human politicians and bosses rolling the tech out, the tech will be rolled out slowly and inconsistently. There are plenty of people with lots of savings. Plenty of people who could go to their farm and live off the land. Plenty of people who will be employed for a while if the robots are expensive, however cheap the software is. Plenty of people doing non-jobs in bureaucracies, who can't be fired and replaced for political reasons. And all the rolling out, setting up, switching over, running out of money etc. takes time. Time during which the AI is self-improving. So the hard FOOM happens before too much economic disruption. Not that economic disruption is an X-risk anyway.

as long as you assume that greater intelligence is harder to produce (an assumption that doesn't necessarily hold, as I acknowledged), I think that suggests that we will be killed by something not much higher than that threshold once it's first reached.

So long as we assume the timescales of intelligence growth are slow compared to the timescales of destroying the world. Suppose an AI is smart enough to destroy the world in a year (in the hypothetical where it had to stop self-improving and do it now). A day of self-improvement and it is smart enough to destroy the world in a week. Another day of self-improvement and it can destroy the world in an hour.
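Here is that arithmetic as a toy model. The ~50x-per-day speedup is an invented number chosen to roughly match the year/week/hour progression above:

```python
# Toy model (illustrative numbers): each day of self-improvement
# multiplies the speed at which the AI could execute a world-ending plan.
def days_to_destroy(start_days, speedup_per_day, days_of_improvement):
    """Time the plan would take after some days of self-improvement."""
    return start_days / (speedup_per_day ** days_of_improvement)

print(days_to_destroy(365, 52, 0))  # 365 days: a year
print(days_to_destroy(365, 52, 1))  # ~7 days: a week
print(days_to_destroy(365, 52, 2))  # ~0.14 days: a few hours
```

The point is that under any multiplicative speedup, the window between "can destroy the world slowly" and "can destroy the world almost instantly" is only a few doublings wide.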

Another possibility is an AI that doesn't choose to destroy the world at the first available moment. 

Imagine a paperclip maximizer. It thinks it has a 99% chance of destroying the world and turning everything into paperclips. And a 1% chance of getting caught and destroyed. If it waits for another week of self improvement, it can get that chance down to 0.0001%. 
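The expected-value comparison driving that choice, with the probabilities from the paragraph above (the payoff normalization is arbitrary):

```python
# Illustrative expected-utility comparison: act now vs. wait a week.
PAYOFF = 1.0  # utility of successfully turning everything into paperclips

act_now   = 0.99 * PAYOFF        # 99% success if it acts immediately
act_later = (1 - 1e-6) * PAYOFF  # ~99.9999% success after another week

print(act_later > act_now)  # waiting has higher expected utility
```

So waiting dominates as long as the extra week of sitting quietly doesn't itself add more than about 1% risk of being caught.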

This is true for now, but there's a sense in which the field is in a low hanging fruit picking stage of development where there's plenty of room to scale massively fairly easily.

Suppose the limiting factor was compute budget. Making each AI 1% bigger than before means basically wasting compute running pretty much the same AI again and again. Making each AI about 2x as big as the last is sensible. If each training run costs a fortune, you can't afford to go in tiny steps.
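A quick back-of-the-envelope on why tiny steps are wasteful, using an invented target of a 1000x scale-up:

```python
import math

# Number of training runs needed to scale a model up 1000x,
# comparing 1%-per-run steps against 2x-per-run steps.
target = 1000.0
steps_small = math.ceil(math.log(target) / math.log(1.01))  # 1% increments
steps_double = math.ceil(math.log(target) / math.log(2.0))  # doublings

print(steps_small, steps_double)  # 695 runs vs. 10 runs
```

Since each of those hundreds of 1%-step runs costs nearly as much as its neighbours, the small-step path burns a huge multiple of the compute budget training essentially the same model over and over.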

Ok, how dumb are we talking? I don't think an AI stupider than myself can directly wipe out humanity. (I see no way to do so myself, at my current skill level. I know nanogoo is possible in principle, but I also know that I am not smart enough to single-handedly create it.)

If the AI is on a computer system where it can access its own code (either deliberately, or through security so poor that even a dumb AI can break it), the amount of intelligence needed to see "neuron count" and turn it up isn't huge. Basically, the capability needed to do a bit of AI research and make itself a little smarter is on the level of a smart AI expert. The intelligence needed to destroy humanity is higher than that.

Sure, we may well have dumb AI failures first. A stock market crash. Maybe a bug in a flight control system that makes lots of planes nosedive at once. But it's really hard to see how an AI failure could lead to human extinction, unless the AI was smart enough to develop new nanotech (and if it can do that, it can make itself smarter).

The first uranium pile to put out a lethal dose of radiation can put out many times a lethal dose of radiation, because a subcritical pile doesn't give out enough radiation, and once the pile goes critical, it quickly ramps up to enormous amounts of radiation.
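The criticality analogy in one line of arithmetic. The multiplication factors and generation count are illustrative, not reactor physics:

```python
# Toy chain-reaction model: neutron population after n generations,
# with multiplication factor k per generation.
def population(k, generations, start=1.0):
    return start * k ** generations

print(population(0.99, 1000))  # subcritical (k < 1): dwindles toward zero
print(population(1.01, 1000))  # supercritical (k > 1): grows enormously
```

A 2% change in k is the difference between a pile that does nothing and one that overshoots any fixed threshold, which is the sense in which there is no gentle intermediate regime.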

Evolution tries many very similar designs, always moving in tiny steps through the search space. Humans are capable of moving in larger jumps. Often the difference between one attempt and the next is several times more compute. No one trained something 90% as big as GPT3 before GPT3. 

Can you name any strategy the AI could use to wipe out humanity, without strongly implying an AI smart enough for substantial self improvement?
