Here I'll develop my observation that anchoring bias is formally similar to taste-based preferences, and develop some more formalism for learning the values/preferences/reward functions of a human.

## Anchoring or taste

An agent H (think of them as a simplified human) confronts one of two scenarios:

In scenario I, the agent sees a movie scene where someone wonders how much to pay for a bar of chocolate, spins a wheel, and gets either £0.01 or £100. Then H is asked how much they would spend for the same bar of chocolate.

In scenario II, the agent sees a movie scene in which someone eats a bar of chocolate, which reveals that the bar has nuts, or doesn't. Then H is asked how much they would spend for the same bar of chocolate.

In both scenarios, H will spend £1 for the bar (after seeing £0.01 or no nuts) or £3 (after seeing £100 or nuts).

We want to say that scenario I is due to anchoring bias, while scenario II is due to taste differences. Can we?

## Looking into the agent

We can't directly say anything about H just by their actions, of course - even with simplicity priors. But we can make some assumptions if we look inside their algorithm, and see how they model the situation.

Assume that H's internal structure consists of two pieces: a modeller M and an assessor A. Any input i is streamed to both M and A. M can then interrogate A by sending it an internal variable v; M receives another variable in return, and then outputs o.

Schematically, with each variable indexed by the timestep at which it is transmitted: the input i1 goes to both M and A, M sends v2 to A, A replies with v3, and M outputs o4.

Here the input i1 decomposes into m (the movie) and q (the question). Assume that these variables are sufficiently well grounded that when I describe them ("the modeller", "the movie", "the key variables", and so on), these descriptions mean what they seem to.

So the modeller M will construct a list of all the key variables, and pass these on to the assessor A to get an idea of the price. The price will return in v3, and then M will simply output that value as o4.
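
As a minimal sketch of this message-passing structure, here is one pass through H in Python. The function names, the dict encoding of i1, and the toy modeller and assessor are all my own illustrative assumptions; nothing here is meant as the definitive formalisation.

```python
from typing import Any, Callable, Dict, Tuple

# Assumed encoding: the input i1 is a dict holding the movie "m" and the question "q".
Input = Dict[str, Any]
Message = Dict[str, Any]


def run_agent(
    modeller: Callable[[Input], Message],
    assessor: Callable[[Input, Message], float],
    i1: Input,
) -> Tuple[Message, float, float]:
    """Run one pass through H and return the trace (v2, v3, o4)."""
    v2 = modeller(i1)      # timestep 2: M sends its list of key variables to A
    v3 = assessor(i1, v2)  # timestep 3: A, which also received i1, replies with a price
    o4 = v3                # timestep 4: M simply outputs that price
    return v2, v3, o4


# Toy components, purely for illustration.
def toy_modeller(i1: Input) -> Message:
    return {"q": i1["q"]}  # pass on only the question


def toy_assessor(i1: Input, v2: Message) -> float:
    return 1.0             # placeholder price in pounds


v2, v3, o4 = run_agent(
    toy_modeller,
    toy_assessor,
    {"m": "some movie", "q": "how much is a bar of chocolate worth?"},
)
print(v2, v3, o4)
```

The point of the trace is that v2 is observable separately from i1: we can see what M chose to pass on, as distinct from everything A had access to.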

## A human-like agent

First we'll design H to look human-like. In scenario I the modeller M will pass v2=q to the assessor A - only the question q = "how much is a bar of chocolate worth?" will be passed on (in a real-world scenario, more details about what kind of chocolate it is would be included, but let's ignore those details here). The answer v3 will be £1 or £3, as indicated above, depending on m (which is also an input into A).

In scenario II, the modeller will pass on v2={q,n} where n is a boolean that indicates whether the chocolate contains nuts or not. The response v3 will be £1 if n=0 (false) or £3 if n=1 (true).

Can we now say that anchoring is a bias but the taste for nuts is a preference? Almost. To complete the argument, we need to make a normative assumption:

α: key variables that are not passed on by M are not relevant to the agent's reward function.

Now we can say that anchoring is a bias (because the variable that changes the assessment, the movie, affects A but is not passed on via M), while taste is likely a preference (because the key taste variable is passed on by M).
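
To make the role of α concrete, here is a rough sketch of the human-like H with α applied to it. The dict keys `"wheel"` and `"nuts"`, the function names, and the string verdicts are assumptions I've introduced for illustration; note also that α strictly only rules variables out, which is why the text above says taste is "likely" a preference.

```python
Q = "how much is a bar of chocolate worth?"


def modeller(scenario, movie):
    """M for the human-like H: choose the key variables v2 to pass to A."""
    if scenario == "I":
        return {"q": Q}                   # the wheel result is NOT passed on
    return {"q": Q, "n": movie["nuts"]}   # n: does the bar contain nuts?


def assessor(movie, v2):
    """A: sees the raw movie as well as v2, and returns the price v3 in pounds."""
    if "n" in v2:
        return 3.0 if v2["n"] else 1.0
    # Anchoring: A is swayed by the wheel even though M never mentioned it.
    return 3.0 if movie["wheel"] == 100.0 else 1.0


def alpha_verdict(v2, variable):
    """Assumption alpha: a variable that M does not pass on is not reward-relevant."""
    return "preference" if variable in v2 else "bias"


for scenario, movie, variable in [
    ("I", {"wheel": 0.01}, "wheel"),
    ("I", {"wheel": 100.0}, "wheel"),
    ("II", {"nuts": False}, "n"),
    ("II", {"nuts": True}, "n"),
]:
    v2 = modeller(scenario, movie)
    v3 = assessor(movie, v2)
    print(scenario, movie, "->", v3, alpha_verdict(v2, variable))
```

In both scenarios the price moves between £1 and £3, but only in scenario II does the variable responsible appear in v2.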

## A non-human agent

We can also design an H′ with the same behaviour as H, but clearly non-human. For H′, v′2=q in scenario II, while v′2={q,n} in scenario I, where n is a boolean encoding whether the movie-chocolate was bought for £0.01 or for £100.

In that case, α will assess anchoring as a demonstration of preference, while the presence of nuts is clearly an irrational bias. And I'd agree with this assessment - but I wouldn't call H′ a human, for reasons explained here.
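
The contrast between H and H′ can be sketched in a few lines: the two agents below agree on every price, yet α gives opposite verdicts because they differ in what M passes on. The function names and the encoding of the movie as a single value `x` (the wheel result in scenario I, a nuts flag in scenario II) are hypothetical choices of mine.

```python
Q = "how much is a bar of chocolate worth?"


def v2_human(scenario, x):
    """H: M passes n only in scenario II (n = the bar has nuts)."""
    return {"q": Q} if scenario == "I" else {"q": Q, "n": bool(x)}


def v2_nonhuman(scenario, x):
    """H': M passes n only in scenario I (n = the wheel showed 100 pounds)."""
    return {"q": Q, "n": x == 100.0} if scenario == "I" else {"q": Q}


def assess(scenario, x, v2):
    """A for either agent: use n if M sent it, otherwise read the movie directly."""
    if "n" in v2:
        high = v2["n"]
    else:
        high = (x == 100.0) if scenario == "I" else bool(x)
    return 3.0 if high else 1.0


def alpha(v2):
    """Assumption alpha: if the deciding variable is not in v2, call it a bias."""
    return "preference" if "n" in v2 else "bias"


cases = [("I", 0.01), ("I", 100.0), ("II", False), ("II", True)]
for scenario, x in cases:
    h, h_prime = v2_human(scenario, x), v2_nonhuman(scenario, x)
    # Behaviour is identical...
    assert assess(scenario, x, h) == assess(scenario, x, h_prime)
    # ...but alpha's verdict differs between the two agents.
    print(scenario, x, "H:", alpha(h), "| H':", alpha(h_prime))
```

So behaviour alone cannot separate the two agents; the verdict comes entirely from the internal structure plus α.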
