Part 6 of AI, Alignment, and Ethics. This will probably make more sense if you start with Part 1.

TL;DR In Parts 1 through 5 I discussed how to choose an ethical system, and implications for societies containing biological and/or uploaded humans and aligned AIs, and perhaps even other sapient species, but not all sentient animals. So far I've just been assuming that we have (somehow) built aligned AI. Now I'd like to look at how all of this relates to the challenge of achieving that vital goal: how we might align superintelligent AI, specifically using approaches along the lines of value learning, AI-assisted Alignment, or Coherent Extrapolated Volition (CEV) — or indeed any similar "do what I mean" kind of approach. The mutability of human values poses a major challenge to all of these: "Do what I want" or "do what I mean" is a lot less well-defined once ASI is in a position to affect that directly, rather than just by misinterpretation. Below I outline and critique a number of possible solutions: this is challenging, since when setting the terminal goal for ASI there is a strong tension between controlling the outcome and allowing our descendants the freedom to control their own destiny. Without a solution to strong corrigibility, we can only do set the terminal goal once, ever, which privileges the views of whatever generation gets to do this. The possibilities opened up by genetic engineering and cyborging make the mutable values problem far more acute, and I explore a couple of toy examples from the ethical conundrums of trying to engineer away psychopathy and war. Finally I suggest a tentative proposal for a compromise solution for mutable values, which builds upon the topics discussed in the previous parts of the sequence.

In what follows I'm primarily going to discuss a future society that is aligning its Artificial Superhuman Intelligences (ASIs) either using value learning, or some nearly-functionally-equivalent form of AI-assisted alignment, such that the bulk of the work of figuring out the fine details of complex and fragile human values and then building a usable normative model of them is being done by superhuman AIs. CEV isn't quite the same proposal as value learning, though the two are related, and they face somewhat similar challenges over mutability of values, as do other similar "do what I mean" solutions that have also been proposed. I pe...