What if the Alignment Problem is impossible?  

It would be sad for humanity if we lived in a world where building AGI is very possible but aligning AGI is impossible. Our curiosity, competitive dynamics, and understandable desire for a powerful force for good will spur us to build the unaligned AGI. From then on, humans will live at the AGI's mercy: either lucky and happy on the knife's edge, killed by the AGI, or in some state we do not enjoy.

For argument's sake, imagine we are in that world, in which it is impossible to force a superintelligence to value humans sufficiently, just as chimpanzees could not have controlled the future actions of humans had they created us.

What if it is within human ability to prove that Alignment is impossible?

What if, during the Manhattan Project, the scientists had performed the now-famous calculation and determined that yes, in fact, the first uncontrolled atomic chain reaction would ignite the atmosphere, and the calculation was clear for all to see?

Admittedly, this would have been a very scary world. It is unclear how long humanity could have survived in such a situation.

But one can imagine a few strategies:

  • Secure existing uranium supplies, as countries actually did.
  • Monitor the world for enrichment facilities and punish bad actors severely.
  • Accelerate satellite surveillance technology.
  • Accelerate military special operations capabilities.
  • Develop advanced technologies to locate, mine, blend, and secure fissionable materials.
  • Accelerate space programs and populate the Moon and Mars.

Yes, a scary world. But one can see a path through the gauntlet to human survival as a species. (Would we have left Earth sooner and reduced other extinction risks?)

Now imagine the same atmosphere-will-ignite world, but one in which the Manhattan Project scientists did not perform the calculation. Imagine that they thought about it but did not try.

All life on Earth would have ended, instantly, at Trinity.

Are we investing enough effort trying to prove that alignment is impossible?  

Yes, we may be in a world in which it is exceedingly difficult to align AGI but also one in which we cannot prove that alignment is impossible. (This would be the atmosphere-will-ignite world in which the math to check ignition was too difficult: a very sad world that would have ceased to exist on July 16, 1945, killing my 6-year-old mother.)

On the other hand, if we can prove alignment is impossible, the game changes. If the proof is sufficiently clear, the pressure to regulate companies and influence nation-states will grow dramatically, and our chances of survival will increase substantially.

Proposal: The Impossibility X-Prize

  • $10 million?
  • Sufficiently precise definitions of "alignment", "AGI", and the other concepts needed to establish the task and define its completion

Even if we fail, the effort of trying to prove alignment is impossible may yield insights into how alignment is possible and make alignment more likely.

If impossibility is not provable, the $10 million will never be spent.
If we prove impossibility, it will be the best $10 million mankind ever spent.

Let's give serious effort to the ignition calculation of our generation.


Update: I recommend that readers interested in this topic read "On Controllability of AI" by Roman V. Yampolskiy.
