Top posts
This post will be about AIs that “refine” their utility function over time, and how it might be possible to construct such systems without giving them undesirable properties. The discussion relates to corrigibility, value learning, and (to a lesser extent) wireheading. We (Joar Skalse and Justin Shovelain) have spent some...
This post was written for Convergence Analysis by Michael Aird, based on ideas from Justin Shovelain and with ongoing guidance from him. Throughout the post, “I” will refer to Michael, while “we” will refer to Michael and Justin or to Convergence as an organisation. Epistemic status: High confidence in the...