25Hour · 3y · 10

I'd be concerned that our instincts toward vengeance in particular would lead to extremely poor outcomes if humans were given near-unlimited power (which is mostly what being put in charge of an otherwise-sovereign AGI grants). One potential example is the AGI's controller sending a murderer to an artificial, semi-eternal version of Hell as punishment for his crimes. I believe there's a Black Mirror episode exploring this. In a hypothetical good AGI outcome, this cannot occur.

The idea of a committee of ordinary humans, ems, and semi-aligned AIs that must all agree before any action is taken does, in principle, avoid this failure mode. It's worth pointing out, though, that this also requires AI alignment to be solved: if the AGI is effectively a whatever-maximizer with the added requirement that it get agreement from the ems and humans before acting, it will acquire that agreement by whatever means necessary, which raises the question of whether the participation of the humans and ems is at all meaningful in this scenario.

The obvious retort is that the AGI would be a tool AI with no desire to maximize any specific property of the world beyond the degree to which it obeys the humans and ems in the loop. The usual objections to tool AIs as an AI alignment solution apply here.

Another possible retort, which you bring up, is that the AGI needs to understand the balance between various competing human values rather than maximize a specific property of the world. That is, as I understand it, another way of saying it needs to be capable of understanding and implementing coherent extrapolated volition (CEV). That's fine, but if you posit this, then CEV simply becomes the value to maximize instead, which brings us back to the objection: if the AGI wants to maximize a thing conditional on getting consent from a human, it will figure out a way to get that consent. That turns "AI governs the world, in a fashion overseen by humans and ems" into "AI governs the world."