In developing aligned AI, one acting in our best interests, we would have to ensure that it implemented, from the beginning, a coherent extrapolated volition of humankind.
In calculating CEV, an AI would predict what an idealized version of us would want,
"if we knew more, thought faster, were more the people we wished we were, had grown up farther together". It would recursively iterate this prediction for humanity as a whole and determine the desires which converge. This initial dynamic would then be used to generate the AI's utility function.
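The extrapolation-and-convergence idea can be illustrated with a toy sketch. This is not Yudkowsky's proposal, which he describes as far beyond current programming ability; it is only a hypothetical caricature in which each agent's preferences are nudged toward an assumed "idealized" version of themselves, and only the desires on which everyone ends up agreeing are kept. The `rate` and `tolerance` parameters, and the idea of representing desires as named numeric weights, are all assumptions made for illustration.

```python
def extrapolate(prefs, ideal, rate=0.5):
    """One idealization step: move each preference weight toward the
    hypothetical 'knew more, thought faster' version of the agent."""
    return {k: v + rate * (ideal[k] - v) for k, v in prefs.items()}

def coherent_volition(agents, ideals, rounds=20, tolerance=0.05):
    """Iterate the extrapolation, then keep only the desires on which
    the extrapolated agents converge (a stand-in for 'coherence')."""
    for _ in range(rounds):
        agents = [extrapolate(p, i) for p, i in zip(agents, ideals)]
    coherent = {}
    for key in agents[0]:
        values = [p[key] for p in agents]
        if max(values) - min(values) < tolerance:  # agents agree on this desire
            coherent[key] = sum(values) / len(values)
    return coherent

# Two agents whose idealized selves agree about honesty but not cuisine:
agents = [{"honesty": 0.2, "cuisine": 0.9}, {"honesty": 0.9, "cuisine": 0.1}]
ideals = [{"honesty": 1.0, "cuisine": 0.9}, {"honesty": 1.0, "cuisine": 0.1}]
print(coherent_volition(agents, ideals))  # only "honesty" survives
```

In this caricature, the converging desire (honesty) enters the output while the persistently divergent one (cuisine) is dropped, mirroring the worry in the text: if human values do not converge under extrapolation, the resulting volition may be nearly empty.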
The main problems with CEV include, firstly, the great difficulty of implementing such a program: “If one attempted to write an ordinary computer program using ordinary computer programming skills, the task would be a thousand lightyears beyond hopeless.” Secondly, there is the possibility that human values may not converge. Yudkowsky considered CEV obsolete almost immediately after its publication in 2004. He states that there is a "principled distinction between discussing CEV as an initial dynamic of Friendliness, and discussing CEV as a Nice Place to Live", and that his essay essentially conflated the two definitions.
As an example of the classical concept of volition, the author develops a simple thought experiment: imagine you are facing two boxes, A and B. One and only one of these boxes contains a diamond: box B. You are now asked to guess which box holds the diamond and to choose one to open, and you choose to open box A. It was your decision to take box A, but your volition was to choose box B, since what you wanted all along was the diamond.
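The distinction the thought experiment draws can be stated as a trivial sketch: the decision is what the agent actually picks under ignorance, while the volition is what the agent would pick given full knowledge. The function names and string labels here are purely illustrative.

```python
def decision(guess):
    """What you actually choose, knowing nothing about the contents."""
    return guess

def volition(diamond_box):
    """What you would choose if you knew where the diamond was."""
    return diamond_box

chosen = decision("A")   # you open box A
wanted = volition("B")   # the diamond is in box B
# chosen != wanted: the decision and the volition come apart
```

Extrapolated volition generalizes this gap: it asks not what we choose, but what we would choose under idealized knowledge and reflection.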