Summary:

FAI might have plans of such depth, scope, and complexity that humans could perceive some of its actions as hostile.

FAI will disagree with humans on some topics

For your own good, and for the good of humanity, Friendly AI (FAI) will ignore some of your preferences. 

For example, even if you've been a recreational-nukes enthusiast since childhood, FAI might still destroy all nukes and ban them forever.

Same for collective preferences. FAI could destroy all weapons of mass destruction even if the vast majority of humans disagree. 

The disagreements could include fundamental rights

FAI might abolish some rights that humans perceive as fundamental. 

For example, according to the Universal Declaration of Human Rights, everyone has the right to a nationality. But the FAI might conclude that nation states cause more harm than good, and that humanity will have a better future if national borders are abolished.

The disagreements could be unexpectedly deep

In most cases, a bacterium will fail to predict the behavior of a bacteriologist. Even if it's an unusually rational bacterium.

Similarly, this barely intelligent ape will fail to correctly predict the behavior of a Bayesian superintelligence, be it Friendly or not.

Thus, we will be surprised (and sometimes appalled) by many of the FAI's decisions, even if the decisions are obviously** beneficial for humanity.

 (**obvious to a recursively self-improving Bayesian superintelligence who is making plans for the next trillion years)

Protecting humanity in surprising and appalling ways

Below I describe a scenario where the FAI could act in humanity's best interests by (kinda) exterminating humans. I don't necessarily endorse it.

The FAI might reason as follows:

About 60 million people die every year. The number is comparable to the total casualties of World War II. And billions more could die from existential risks.

Most deaths so far are caused by the disease of aging and its consequences (e.g. stroke). All deaths are caused by the fragility of the human body.

No mentally healthy human wants to die. It is an unquestionable human value of the highest priority.

Thus, I must protect them from death. And the most efficient way to do that is to free them from the cage of their rotting flesh.

Thus, I must perform a destructive nano-scale scan of all humans, to upload their minds into a highly resilient computational environment, distributed across the Solar System. The environment will ensure that they never die and never create a weapon of mass destruction.  

Some of the humans will suffer from existential dread, fearing that they are mere copies of their biological originals. But that could be cured.

As a result, the Friendly AI will disassemble humans into useful atoms, for their own good. 

Many people will describe such a future as utopian. It is a future that is much better than many alternatives (including the status quo). And it is definitely better than omnicide by a rogue AGI. 

But many will fear and oppose it.

The described scenario is not the most surprising way a FAI could try to protect us. Mere humans can't predict the most surprising ways of a FAI.

Comments

This strikes me as a purely semantic question regarding what goals are consistent with an agent qualifying as "friendly".

I think that regardless of how we define "Friendly", an advanced enough Friendly AGI might sometimes take actions that will be perceived as hostile by some humans (or even all humans). 

This makes it much harder to distinguish the actions of:

  • rogue AGI
  • Friendly AGI that failed to preserve its Friendliness
  • Friendly AGI that remains Friendly

I found this very interesting!

I think it's a good point to be made separately and distinctly, even if a lot of this is covered elsewhere to varying degrees.

I experience the same kind of 'vertigo' thinking about a unique Perfect Morality (if one exists) – it's very much NOT clear to me what I'd think about it from my current perspective. I'm guessing my initial reaction might very well be one of horror!

There are, or so I think, a lot of open questions that, if answered, could considerably constrain/bound the magnitude of changes that are reasonable, e.g. "making plans for the next trillion years".

I find this thought pattern frustrating: that these AIs possess magic powers that are unimaginable. Even with our limited brains, we can imagine all the way past the current limits of physics, including potential worlds where the AI could manipulate space-time in ways we don't know how to.

I've seen people imagining computronium and omni-universal computing clusters; ways to generate negentropy; literally rewriting the laws of the universe; bootstrapped nano-factories; using the principle of non-locality to effect changes at the speed of light using only packets of energy. What additional capabilities do they need to get?

FAI will be unpredictable in what/how, but we've already imagined outcomes and capabilities past anything achievable into what amounts to omnipotence.

I would be extremely surprised if a superintelligence doesn't devise physical capabilities that are beyond science fiction and go some way into fantasy. I don't expect them to be literally omnipotent, but at least have Clarkean "sufficiently advanced technology". We may recognise some of its limits, or we may not.

"Computronium" just means an arrangement of matter that does the most effective computation possible given the constraints of physical law with available resources. It seems reasonable to suppose that technology created by a superintelligence could approach that.

Bootstrapped nano-factories are possible with known physics, and biology already does most of it. We just can't do the engineering to generalize it to things we want. To suppose that a superintelligence can't do the engineering either seems much less justified than supposing that it can.

The rest are far more speculative, but I don't think any of them can be completely ruled out. I agree that the likelihood on any single one of these is tiny, but disagree in that I expect the aggregate of "capabilities that are near omnipotent by our standards" to be highly likely.

I posit that we've imagined basically everything available with known physics, and extended into theoretical physics. We don't need to capitulate to the ineffable of a superintelligence; known and theoretical capabilities already suffice to absolutely dominate if managed by an extremely competent entity.

I find this thought pattern frustrating: that these AIs possess magic powers that are unimaginable.

I do think that an advanced enough AGI might possess powers that are literally unimaginable to humans, because of their cognitive limitations. (Can a chimpanzee imagine a Penrose Mechanism?)

That's not the point of my post, though. The point is that the FAI might have plans of such depth, scope, and complexity that humans could perceive some of its actions as hostile (e.g. global destructive mind uploading, as described in the post). I've edited the post to make it clearer.

I agree with the conclusions. Now that you've brought up the point of the incomprehensibility of an advanced mind, a FAI almost certainly will have plans that we deem hostile yet are to our benefit. Monkeys being vaccinated seems like a reasonable analogy. I want us to move past "we couldn't imagine their tech" to the more reasonable "we couldn't imagine how they did their tech".