Dutch-Booking CDT

Caspar Oesterheld is working on similar ideas.

For anyone who's interested, Abram here refers to my work with Vincent Conitzer which we write about here.

ETA: This work has now been published in The Philosophical Quarterly.

(lightly edited restatement of email comment)

Let's see what happens when we adapt this to the canonical instance of "no, really, counterfactuals aren't conditionals and should have different probabilities": the cosmic ray problem. The agent has a choice between two paths and slightly prefers the left one, but its conditional on taking the right path is a tiny slice of probability mass that's mostly composed of stuff like "I took the suboptimal action because I got hit by a cosmic ray".

There will be 0 utility for taking left path, -10 utility for taking the right path, and -1000 utility for a cosmic ray hit. The CDT counterfactual says 0 utility for taking left path, -10 utility for taking the right path, while the conditional says 0 utility for left path, -1010 utility for right path (because conditional on taking the right path, you were hit by a cosmic ray).
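As a rough sketch of where these numbers come from, write $P(\text{ray})$ for the unconditional chance of a hit (a symbol introduced here for illustration) and assume it is negligible, while $P(\text{ray} \mid \text{right}) \approx 1$:

$$\mathbb{E}_{\text{CDT}}[U \mid \text{do}(\text{right})] = -10 - 1000 \cdot P(\text{ray}) \approx -10$$

$$\mathbb{E}[U \mid \text{right}] = -10 - 1000 \cdot P(\text{ray} \mid \text{right}) \approx -1010$$

The gap between the two is exactly the gap between $P(\text{ray})$ and $P(\text{ray} \mid \text{right})$, which is what any bet on the cosmic ray would have to exploit.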

In order to get the dutch book to go through, we need to get the agent to take the right path, to exploit P(cosmic ray) changing between the decision time and afterwards. So the initial bet could be something like -1 utility now, +12 utility upon taking the right path and not being hit by a cosmic ray. But now since the optimal action is "take the right path along with the bet", the problem setup has been changed, and we can't conclude that the agent's conditional on taking the right path places high probability on getting hit by a cosmic ray (because now the right path is the optimal action), so we can't money-pump with the "+0.5 utility, -12 utility upon taking a cosmic ray hit" bet.
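The way the bet flips the optimal action can be checked with a small sketch. The ray probability `P_RAY` is a hypothetical value (no number is given above), and the function assumes CDT treats the ray as independent of the chosen path.

```python
from fractions import Fraction

P_RAY = Fraction(1, 1_000_000)  # hypothetical; the comment gives no number


def causal_value(action, bought_bet, p_ray=P_RAY):
    """CDT-style expected utility, treating the ray as independent of the action."""
    base = {"left": Fraction(0), "right": Fraction(-10)}[action]
    value = base - 1000 * p_ray          # a ray hit costs 1000 utils
    if bought_bet:
        value -= 1                       # price of the bet, paid up front
        if action == "right":
            value += 12 * (1 - p_ray)    # payout on (right and no ray)
    return value


def preferred(bought_bet):
    """Action with the higher causal value, with or without the bet."""
    return max(("left", "right"), key=lambda a: causal_value(a, bought_bet))


for bought in (False, True):
    print(f"bet bought={bought}: preferred action is {preferred(bought)}")
```

Under these assumptions the agent prefers left without the bet and right with it, so the bet has changed the decision problem rather than merely riding on top of it.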

So this seems to dutch-book Death-in-Damascus, not CDT $\neq$ EDT cases in general.

[-]Vivek Hebbar1y10

If the agent follows EDT, it seems like you are giving it epistemically unsound credences. In particular, the premise is that it's very confident it will go left, and the consequence is that it in fact goes right. This was the world model's fault, not EDT's fault. (It is notable though that EDT introduces this loopiness into the world model's job.)

[-]abramdemski7y10

(lightly edited version of my original email reply to above comment; note that Diffractor was originally replying to a version of the Dutch-book which didn't yet call out the fact that it required an assumption of nonzero probability on actions.)

I agree that this Dutch-book argument won't touch probability zero actions, but my thinking is that it really should apply in general to actions whose probability is bounded away from zero (in some fairly broad setting). I'm happy to require an epsilon-exploration assumption to get the conclusion.

Your thought experiment raises the issue of how to ensure in general that adding bets to a decision problem doesn't change the decisions made. One thought I had was to make the bets always smaller than the difference in utilities. Perhaps smaller Dutch-books are in some sense less concerning, but as long as they don't shrink to infinitesimal size, the argument still seems legit. A bet that's desirable at one scale is desirable at another. But scaling down bets may not suffice in general. Perhaps a bet-balancing scheme to ensure that nothing changes the comparative desirability of actions as the decision is made?

For your cosmic ray problem, what about:

You didn't specify the probability of a cosmic ray. I suppose it should have probability higher than the probability of exploration. Let's say 1/million for cosmic ray, 1/billion for exploration.

Before the agent makes the decision, it can be given the option to lose .01 util if it goes right, in exchange for +.02 utils if it goes right & cosmic ray. This will be accepted (by either a CDT agent or EDT agent), because it is worth approximately +.01 util conditioned on going right, since cosmic ray is almost certain in that case.

Then, while making the decision, cosmic ray conditioned on going right looks very unlikely in terms of CDT's causal expectations. We give the agent the option of getting .001 util if it goes right, if it also agrees to lose .02 conditioned on going right & cosmic ray.

CDT agrees to both bets, and so loses money upon going right.
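Here is a minimal sketch of the arithmetic behind this first attempt. It assumes a toy model that is not spelled out above: the agent goes right exactly when it is hit by a ray or it explores, and those two events are independent.

```python
from fractions import Fraction

P_RAY = Fraction(1, 1_000_000)          # from the comment
P_EXPLORE = Fraction(1, 1_000_000_000)  # from the comment

# Assumption: the agent goes right iff it is hit by a ray or it explores,
# and the two events are independent.
p_right = 1 - (1 - P_RAY) * (1 - P_EXPLORE)
p_ray_given_right = P_RAY / p_right      # evidential view: close to 1
p_ray_do_right = P_RAY                   # causal view: the ray ignores the action

# Bet 1 (before deciding): pay .01 if right, receive .02 if right and ray.
bet1_evidential = -Fraction(1, 100) + Fraction(2, 100) * p_ray_given_right
# Bet 2 (while deciding): receive .001 if right, pay .02 if right and ray.
bet2_causal = Fraction(1, 1000) - Fraction(2, 100) * p_ray_do_right


def net_if_right(ray):
    """Combined payoff of both bets when the agent goes right."""
    ray_leg = Fraction(2, 100) - Fraction(2, 100) if ray else Fraction(0)
    return -Fraction(1, 100) + Fraction(1, 1000) + ray_leg


print("bet 1, value given right:", float(bet1_evidential))
print("bet 2, causal value given right:", float(bet2_causal))
print("net if right:", float(net_if_right(True)), float(net_if_right(False)))
```

In this model each bet looks favorable at the moment it is offered, yet together they cost .009 util whenever the agent goes right, and nothing when it goes left.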

Ah, that's not a very good money pump. I want it to lose money no matter what. Let's try again:

Before decision: option to lose 1 millionth of a util in exchange for 2 utils if right&ray.

During decision: option to gain .1 millionth util in exchange for -2 util if right&ray.

That should do it. CDT loses .9 millionth of a util, with nothing gained. And the trick is almost the same as my dutch book for death in damascus. I think this should generalize well.
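To check that the loss really is the same in every outcome, here is a small payoff table for the two revised bets. The function name `net_payoff` is introduced here for illustration only.

```python
from fractions import Fraction
from itertools import product

MILLIONTH = Fraction(1, 1_000_000)


def net_payoff(went_right, ray):
    """Total payoff from accepting both bets in the revised money pump."""
    hit = went_right and ray
    # Before the decision: pay 1 millionth, receive 2 utils on right & ray.
    before = -MILLIONTH + (2 if hit else 0)
    # During the decision: receive .1 millionth, pay 2 utils on right & ray.
    during = Fraction(1, 10) * MILLIONTH - (2 if hit else 0)
    return before + during


for went_right, ray in product((False, True), repeat=2):
    print(f"right={went_right}, ray={ray}: {float(net_payoff(went_right, ray))}")
```

The two 2-util legs cancel on right & ray and never fire otherwise, so every row comes out to the same .9 millionth loss.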

The amounts of money lost in the Dutch Book get very small, but that's fine.

[-]ESRogs7y10

I conclude from this that CDT should equal EDT (hence, causality must account for logical correlations, IE include logical causality). By "CDT" I really mean any approach at all to counterfactual reasoning; counterfactual expectations should equal evidential expectations.

As with most of my CDT=EDT arguments, this only provides an argument that the expectations should be equal for actions taken with nonzero probability. In fact, the amount lost to Dutch Book will be proportional to the probability of the action in question. So, differing counterfactual and evidential expectations are smoothly more and more tenable as actions become less and less probable.

I'm having a little trouble following the terminology here (despite the disclaimer).

One particular thing that confuses me -- you say, "the expectations should be equal for actions taken with nonzero probability" and also "differing counterfactual and evidential expectations are smoothly more and more tenable as actions become less and less probable", but I'm having trouble understanding how they could both be true. How does, "they're equal for nonzero probability" match with "they move further and further apart the closer the probability gets to zero"? (Or are those incorrect paraphrases?)

It seems to me that if you have two functions that are equal whenever the input (the probability of an action) is nonzero, then they can't also get closer and closer together as the input increases from zero -- they're already equal as soon as the input does not equal zero! I assume that I have misunderstood something, but I'm not sure which part.

[-]abramdemski7y20

"The expectations should be equal for actions with nonzero probability" -- this means a CDT agent should have equal causal expectations for any action taken with nonzero probability, and EDT agents should similarly have equal evidential expectations. Actually, I should revise my statement to be more careful: in the case of epsilon-exploring agents, the condition is >epsilon rather than >0. In any case, my statement there isn't about evidential and causal expectations being equal to each other, but rather about one of them being constant across (sufficiently probable) actions.

"differing counterfactual and evidential expectations are smoothly more and more tenable as actions become less and less probable" -- this means that the amount we can take from a CDT agent through a Dutch Book, for an action which is given a different causal expectation than evidential expectation, smoothly reduces as the probability of an action goes to zero. In that statement, I was assuming you hold the difference between evidential and causal expectations constant as you reduce the probability of the action. Otherwise it's not necessarily true.
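One way to picture this, as a sketch under an assumption not stated above: suppose the bets are conditional on action $a$, so that they cost the agent a fixed amount $\delta > 0$ whenever $a$ is taken and nothing otherwise, with $\delta$ set by the gap between the causal and evidential expectations (both symbols are introduced here for illustration). Then

$$\mathbb{E}[\text{loss}] = P(a) \cdot \delta,$$

which shrinks smoothly to zero as $P(a) \to 0$ with $\delta$ held fixed.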

AI ALIGNMENT FORUM
AF
