In a previous post, I showed how, given certain normative assumptions, one could distinguish agents H for whom anchoring was a bias, from those H′ for which it was a preference.
But agent H′ looks clearly ridiculous - how could anchoring be a preference? It makes no sense. And I agree with that assessment! H′'s preferences make no sense - if we think of H′ as a human.
This is another way in which I think we can extract human preferences: using the fact that human models of each other, and self-models, are all incredibly similar. Consider the following astounding statements:
Most people will agree with all of those statements, to a large extent - including the "somebody" being talked about. But what is going on here? Haven't I shown that you can't deduce preferences or rationality from behaviour? It's not like we've put the "somebody" in an fMRI scanner to construct their internal model, so how do we know?
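The non-identifiability claim can be illustrated with a toy sketch (my own illustration, not from the original post, using a standard Boltzmann-rational agent model): flipping the sign of both the reward and the rationality parameter leaves the agent's behaviour exactly unchanged, so behaviour alone cannot tell a rational agent with one preference apart from an anti-rational agent with the opposite preference.

```python
import math

def boltzmann_policy(reward, beta):
    """Softmax choice rule: pi(a) is proportional to exp(beta * reward(a)).

    beta > 0 models a (noisily) rational agent; beta < 0 an anti-rational one.
    """
    weights = {a: math.exp(beta * r) for a, r in reward.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}

# A rational agent (beta = 2) that prefers "stay"...
reward = {"stay": 1.0, "switch": 0.0}
p1 = boltzmann_policy(reward, beta=2.0)

# ...behaves identically to an anti-rational agent (beta = -2)
# with the exactly opposite reward function:
p2 = boltzmann_policy({a: -r for a, r in reward.items()}, beta=-2.0)

assert all(abs(p1[a] - p2[a]) < 1e-12 for a in reward)
```

Both (reward, rationality) pairs produce the same choice probabilities, so no amount of observed behaviour distinguishes them; some extra normative assumption is needed to break the tie.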
The thing is that natural selection is lazy: a) different humans use the same type of cognitive machinery to assess each other, and b) individual humans tend to use their own self-assessment machinery to assess other humans. Consequently, there tends to be large agreement between our own internal self-assessment models, our models of other people, other people's models of other people, and other people's self-assessment models of themselves:
This agreement is not perfect, by any means - I've mentioned that it varies from culture to culture, individual to individual, and even within the same individual. But even so, we can add the normative assumption:
That explains why I said that H was a human, while H′ was not: my model of what a human would prefer in those circumstances was correct for H but not for H′.
Note that this modelling is often carried out implicitly, by selecting the scenarios and tweaking the formal model so as to make the agent being assessed more human-like. With many variables to play with, it's easy to restrict to a set that seems to demonstrate human-like behaviour (for example, using almost-rationality assumptions for agents with small action spaces but not for agents with large ones).
There's nothing wrong with this approach, but it needs to be made clear that, when we do this, we are projecting our own assessments of human rationality onto the agent; we are not making "correct" choices as if we were dispassionately tuning the hyperparameters of an image-recognition program.