The many counterfactuals of counterfactual mugging

by Scott Garrabrant, 12th Apr 2016

Counterfactuals
Personal Blog

This post is roughly an explanation of my current understanding of what the correct solution to the counterfactual mugging problem might look like. It is all philosophy, with no real math. The interesting part is that even if we could perform the standard counterfactuals on taking a different action, and could look at a counterfactual in which the coin flip went the other way, we would still not be done, because we would not know the true probability of the coin.

The Problem: You are a deterministic agent who knows a bunch of facts about math. In particular, you know that a particular number starts with a 2 in base 10. A predictor comes up to you and shows you his source code. In the predictor's source code, you see that he first calculates the first digit of that number. If it is even, he shows you this code and asks you for 10 dollars. If it is odd, he tries to predict what you would do if the digit were even, and pays you 9 dollars if and only if he predicts that you would have paid the 10 dollars.

One possible solution: First, since I am a deterministic agent, I either pay the 10 dollars or I don't. Therefore, I have to do the standard decision theory counterfactuals. I compute counterfactually what happens if I pay the 10, and counterfactually what happens if I don't. (One of these two counterfactuals is the actual world.) Note that in these first two counterfactuals, the number still starts with a 2, so the only difference is that I lose 10 dollars if I pay the 10.

Observe that I cannot trust the predictor's source code to tell me about the counterfactual world in which the number starts with an odd digit. For all I know, maybe the predictor would not exist if the number didn't start with a 2. So first, I have to compute for myself the counterfactual world in which the number starts with an odd digit. If in this counterfactual I am not playing this game at all, I probably should not pay the 10 dollars. Assume I perform this counterfactual, and I see a world in which the predictor is running the same code.

Within this counterfactual, the predictor is performing his own counterfactual. He is computing what I do in the counterfactual in which the number starts with an even digit. (This is in fact a counterfactual, because it is taking place within the counterfactual world in which the number starts with an odd digit.)

Let's say that my counterfactual in which the number starts with an odd digit and the predictor's counterfactual in which it starts with an even digit turn out to be inverses. Then, when the predictor performs this counterfactual, he ends up looking at the real world.

We now have two different possible top-level worlds: the EVEN world, in which the digit is even and I am counterfacting to predict the predictor, and the ODD world, in which the digit is odd and the predictor is counterfacting to predict me. In the EVEN world, we already performed the counterfactuals and observed that if I pay, I get -10, and if I don't, I get 0. In the ODD world, we can again perform the standard counterfactuals, and see that if I pay, I get 9, and if I don't, I get 0.
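The four outcomes above can be written out as a small payoff table. This is only an illustrative sketch: the function name `payoff` and the world labels as strings are hypothetical, and it assumes the predictor's prediction in the ODD world exactly matches what I do in the EVEN world.

```python
def payoff(world: str, pays: bool) -> int:
    """Dollars I receive in `world` if my (deterministic) policy is `pays`.

    Assumption for illustration: the predictor predicts my policy correctly.
    """
    if world == "EVEN":
        # I am asked for 10 dollars and either hand them over or not.
        return -10 if pays else 0
    if world == "ODD":
        # I am paid 9 dollars iff I would have paid in the EVEN world.
        return 9 if pays else 0
    raise ValueError(f"unknown world: {world}")


# Enumerate the standard counterfactuals in each top-level world.
for world in ("EVEN", "ODD"):
    for pays in (True, False):
        print(world, "pay" if pays else "refuse", payoff(world, pays))
```

Note that the table says nothing about how to weigh the two worlds against each other.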

Depending on how you count, we have performed somewhere from 4 to 6 counterfactuals already, but we are not done. Even with a complete ability to analyze what the ODD world looks like, we still have to figure out how much we should care about the ODD world as a whole relative to the EVEN world.

We can't expect either the EVEN world or the ODD world to tell us how much to care about each world. Relative to the EVEN world, the EVEN world is true, and relative to the ODD world, the ODD world is true. We must take a 7th counterfactual that looks something like counterfacting on ourselves not knowing what digit the number starts with, and asking what probability we would assign to the EVEN world. Performing this counterfactual, we see that the probability that the number starts with an even digit is 39.11% (from Benford's Law). Paying 10 dollars with probability 39.11% to gain 9 dollars with probability 60.89% is a good deal, so we should pay.
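The 39.11% figure and the final comparison can be checked with a few lines of arithmetic. This is a minimal sketch, assuming Benford's Law (first digit d has probability log10(1 + 1/d)) is the right prior for the first digit; the variable names are illustrative only.

```python
import math


def benford(d: int) -> float:
    """Benford's Law probability that the leading digit is d (1 through 9)."""
    return math.log10(1 + 1 / d)


# Probability that the leading digit is even (2, 4, 6 or 8).
p_even = sum(benford(d) for d in (2, 4, 6, 8))
p_odd = 1 - p_even

# Expected dollars under each policy, weighting the EVEN and ODD worlds.
ev_pay = p_even * (-10) + p_odd * 9
ev_refuse = 0.0

print(f"P(even first digit) = {p_even:.4f}")
print(f"EV(pay) = {ev_pay:.2f}, EV(refuse) = {ev_refuse:.2f}")
```

Under these weights paying comes out ahead by roughly a dollar and a half. With equal weights on the two worlds it would lose, since 0.5 * (-10) + 0.5 * 9 is negative, which is why the choice of weights matters here.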


2 comments

Counterfactual mugging with a logical coin is a tricky problem. It might be easier to describe the problem with a "physical" coin first. We have two world programs, mutually quined:

  1. The agent decides whether to pay the predictor 10 dollars. The predictor doesn't decide anything.

  2. The agent doesn't decide anything. The predictor decides whether to pay the agent 100 dollars, depending on the agent's decision in world 1.

By fiat, the agent cares about the two worlds equally, i.e. it maximizes the total sum of money it receives in both worlds. The usual UDT-ish solution can be crisply formulated in modal logic, PA or a bunch of other formalisms.
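As a rough illustration of the two-world setup, the policies can simply be enumerated. This is a brute-force sketch, not the modal-logic or PA formulation mentioned above. It assumes the predictor pays the 100 dollars exactly when the agent pays in world 1, and all function names are hypothetical.

```python
def world_1(agent_pays: bool) -> int:
    """World 1: the agent decides whether to pay the predictor 10 dollars."""
    return -10 if agent_pays else 0


def world_2(agent_pays: bool) -> int:
    """World 2: the predictor pays 100 dollars.

    Assumption: it pays iff the agent pays in world 1.
    """
    return 100 if agent_pays else 0


def total(agent_pays: bool) -> int:
    """Equal care for both worlds: the agent's utility is the plain sum."""
    return world_1(agent_pays) + world_2(agent_pays)


# Pick the policy with the larger total across both worlds.
best_policy = max((True, False), key=total)
print("pay" if best_policy else "refuse", total(best_policy))
```

The equal weighting is built into `total` by fiat. Changing the weights there is exactly the degree of freedom the problem statement has to supply.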

Does that make sense?

This makes sense. My main point is that the "care about the two worlds equally" part makes sense if it is part of the problem description, but otherwise we don't know where that part comes from.

My logical example was supposed to illustrate that sometimes you should not care about them equally.