Steven K. (not to be confused with https://www.lesswrong.com/users/steve2152)
I would add: "Will relevant people expect AI to have extreme benefits, such as a significant percentage-point reduction in other existential risk or a technological solution to aging?"
Here's my prediction:
To the extent that it differs from others' predictions, the most important factor is probably this: even if AGI is hard, there are a number of ways in which human civilization could become capable of doing almost arbitrarily hard things, such as human intelligence enhancement or sufficiently transformative narrow AI. I think that makes the question less about how hard AGI is, and more about general futurism, than most people think. It's moderately hard for me to imagine how business as usual could go on for the rest of the century, but who knows.
I meant to assume that away:
But we'll assume that her information stays the same while her utility function is being inferred, and she's not doing anything to get more; perhaps she's not in a position to.
In cases where you're not in a position to get more information about your utility function (e.g. because the humans you're interacting with don't know the answer), your behavior won't depend on whether or not you think it would be useful to have more information about your utility function, so someone observing your behavior can't infer the latter from the former.
Maybe practical cases aren't like this, but it seems to me that they'd only have to be like this with respect to one aspect of the utility function for it to be a problem.
Paul, above, seems to think it would be possible to reason from actual behavior to counterfactual behavior anyway. I guess that's because he's thinking in terms of modeling the agent as a physical system and not just as an agent. But I'm confused about that, so I haven't responded, and I don't claim he's wrong.
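A toy sketch of the point about not being in a position to get more information, under assumptions that are not in the comment above: two hypothetical agents have the same expected utilities over the actions "x" and "y", but one is certain of its utility function and the other is unsure between two candidates. The function `best_action`, the "ask" action, and all the numbers are made up for illustration.

```python
def best_action(beliefs, utilities, can_ask, ask_cost=0.05):
    """Pick the expected-utility-maximizing action.

    beliefs:   dict mapping hypothesis name -> probability
    utilities: dict mapping hypothesis name -> {action: utility}
    can_ask:   whether a (hypothetical) information-gathering action exists
    """
    actions = list(next(iter(utilities.values())).keys())
    # Expected utility of each ordinary action under current beliefs.
    expected = {
        a: sum(p * utilities[h][a] for h, p in beliefs.items())
        for a in actions
    }
    act_now = max(expected, key=expected.get)
    if not can_ask:
        return act_now
    # Asking reveals the true hypothesis, after which the agent acts optimally.
    value_if_ask = sum(
        p * max(utilities[h].values()) for h, p in beliefs.items()
    ) - ask_cost
    return "ask" if value_if_ask > expected[act_now] else act_now


# Agent that is unsure which utility function is hers (assumed numbers).
uncertain_beliefs = {"likes_x": 0.5, "likes_y": 0.5}
uncertain_utils = {
    "likes_x": {"x": 1.0, "y": 0.0},
    "likes_y": {"x": 0.0, "y": 0.8},
}

# Agent that is certain, with the same expected utilities (0.5 and 0.4).
certain_beliefs = {"known": 1.0}
certain_utils = {"known": {"x": 0.5, "y": 0.4}}

for can_ask in (False, True):
    print(
        can_ask,
        best_action(uncertain_beliefs, uncertain_utils, can_ask),
        best_action(certain_beliefs, certain_utils, can_ask),
    )
```

When asking is unavailable, both agents choose the same action, so an observer cannot tell from behavior which of them would value more information. They come apart only when the information-gathering action exists. This is only a decision-theoretic toy and says nothing about whether modeling the agent as a physical system would recover the difference.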