
I often notice that in many (not all) discussions about utility functions, one side is "for" their relevance while the other is "against" their usefulness, without either side explicitly saying what they mean. I don't think this is causing any deep confusion among researchers here, but I'd still like to take a stab at disambiguating some of this, if nothing else for my own sake. Here are some distinct (albeit related) ways that utility functions can come up in AI safety, in terms of the assumptions/hypotheses they give rise to:

AGI utility hypothesis: The first AGI will behave as if it is maximizing some utility function

ASI utility hypothesis: As AI capabilities improve well beyond human level, AI systems will behave more and more as if they are maximizing some utility function (or will have already reached that ideal earlier and stayed there)

Human utility hypothesis: Even though in some experimental contexts humans seem to not even be particularly goal-directed, utility functions are often a useful model of human preferences to use in AI safety research

Coherent Extrapolated Volition (CEV) hypothesis: For a given human H, there exists some utility function V such that if H is given the appropriate time/resources for reflection, H's values would converge to V
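
To make the phrase "behave as if it is maximizing some utility function" from the AGI and ASI hypotheses a bit more concrete, here is a minimal toy sketch. It assumes a much simpler setting than the hypotheses have in mind: a fixed finite set of outcomes and strict, deterministic pairwise choices, with no lotteries or uncertainty. In that setting, observed choices are consistent with maximizing some utility function exactly when the "chosen over" relation has no cycles. The function name `rationalizing_utility` and the fruit outcomes are hypothetical, chosen only for illustration.

```python
def rationalizing_utility(choices):
    """Return a utility assignment consistent with the observed choices, or None.

    `choices` is an iterable of (chosen, rejected) pairs, each meaning the
    agent strictly picked `chosen` over `rejected`. Toy assumption: outcomes
    are a finite set and preferences are strict and deterministic.
    """
    outcomes = set()
    better_than = {}  # outcome -> set of outcomes it was chosen over
    for chosen, rejected in choices:
        outcomes.update((chosen, rejected))
        better_than.setdefault(chosen, set()).add(rejected)

    # Count, for each outcome, how many distinct outcomes were chosen over it.
    indegree = {o: 0 for o in outcomes}
    for rejected_set in better_than.values():
        for r in rejected_set:
            indegree[r] += 1

    # Kahn's algorithm: repeatedly peel off outcomes nothing remaining beats.
    frontier = [o for o in outcomes if indegree[o] == 0]
    order = []
    while frontier:
        o = frontier.pop()
        order.append(o)
        for r in better_than.get(o, ()):
            indegree[r] -= 1
            if indegree[r] == 0:
                frontier.append(r)

    if len(order) < len(outcomes):
        return None  # a preference cycle exists, so no utility function fits

    # Earlier in the order means more preferred, so assign higher utility.
    n = len(order)
    return {o: n - i for i, o in enumerate(order)}


consistent = [("apple", "banana"), ("banana", "cherry")]
cyclic = consistent + [("cherry", "apple")]

print(rationalizing_utility(consistent))  # some utility assignment
print(rationalizing_utility(cyclic))      # None
```

This only captures the ordinal, certainty-only case. A version closer to expected utility maximization would need choices over lotteries, and the result is sensitive to what counts as an "outcome", so the sketch should be read as an illustration of the "as if" framing rather than a test of any of the hypotheses above.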