The many counterfactuals of counterfactual mugging

AI ALIGNMENT FORUM

This post is roughly an explanation of my current understanding of what the correct solution to the counterfactual mugging problem might look like. This post is all philosophy, with no real math. The interesting part is that even if we could perform the standard counterfactuals on what happens if I take a different action, and look at a counterfactual in which the coin flip went the other way, we would still not be done, because we would not know the true probability of the coin.

The Problem: You are a deterministic agent who knows a bunch of facts about math. In particular, you know that $2^{2^{2^{2^{2}}}}$ starts with a 2 in base 10. $Ω$ comes up to you and shows you his source code. In $Ω$'s source code, you see that $Ω$ first calculates the first digit of $2^{2^{2^{2^{2}}}}$. If it is even, $Ω$ shows you this code and asks you for 10 dollars. If it is odd, then $Ω$ tries to predict what you would do if the digit is even, and pays you 9 dollars if and only if he predicts that you would have paid the 10 dollars.
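The claim about the first digit can be checked directly. Below is a minimal sketch, assuming we read the tower as $2^{65536}$ and use exact integer arithmetic; the function name `leading_digit_of_power_of_two` is a hypothetical helper introduced here, not something from the problem statement.

```python
import math

# The tower 2^(2^(2^(2^2))) has five 2s, so its top-level exponent is
# 2^(2^(2^2)) = 2^(2^4) = 2^16 = 65536.
exponent = 2 ** (2 ** (2 ** 2))


def leading_digit_of_power_of_two(n: int) -> int:
    """Return the leading base-10 digit of 2**n (hypothetical helper)."""
    value = 2 ** n
    # Estimate (number of digits - 1) from the logarithm.
    shift = math.floor(n * math.log10(2))
    lead = value // 10 ** shift
    # Guard against the float estimate being off by one in either direction.
    if lead >= 10:
        lead //= 10
    elif lead == 0:
        lead = value // 10 ** (shift - 1)
    return lead


first_digit = leading_digit_of_power_of_two(exponent)
print(first_digit, "even" if first_digit % 2 == 0 else "odd")
```

Integer division is used instead of converting the number to a string because recent Python versions limit the size of integer-to-string conversions by default.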

One possible solution: So first, since I am a deterministic agent, I either pay the 10 dollars or I don't. Therefore, I have to do the standard decision theory counterfactuals. I compute counterfactually what happens if I pay the 10, and counterfactually what happens if I don't. (One of these two counterfactuals is the actual world.) Note that in these first two counterfactuals, $2^{2^{2^{2^{2}}}}$ still starts with a 2, so the only difference is that I lose 10 dollars if I pay the 10.

Observe that I cannot trust $Ω$'s source code to tell me about the counterfactual world in which $2^{2^{2^{2^{2}}}}$ starts with an odd number. For all I know, maybe $Ω$ would not exist if $2^{2^{2^{2^{2}}}}$ didn't start with a 2. So first, I have to compute for myself the counterfactual world in which $2^{2^{2^{2^{2}}}}$ starts with an odd number. If, in this counterfactual, I am not playing this game at all, I probably should not pay the 10 dollars. Assume I perform this counterfactual, and I see a world in which $Ω$ is running the same code.

Within this counterfactual, $Ω$ is performing his own counterfactual. $Ω$ is computing what I do in the counterfactual in which $2^{2^{2^{2^{2}}}}$ starts with an even number. (This is in fact a counterfactual, because it is taking place within the counterfactual world in which $2^{2^{2^{2^{2}}}}$ starts with an odd number.)

Let's say that my counterfactual in which $2^{2^{2^{2^{2}}}}$ starts with an odd number, and $Ω$'s counterfactual in which $2^{2^{2^{2^{2}}}}$ starts with an even number, turn out to be inverses. Then, when $Ω$ performs this counterfactual, $Ω$ ends up looking at the real world.

We now have two different possible top-level worlds: the EVEN world, in which the digit is even, and I am counterfacting to predict $Ω$, and the ODD world, in which the digit is odd, and $Ω$ is counterfacting to predict me. In the EVEN world, we already performed the counterfactuals and observed that if I pay, I get -10, and if I don't, I get 0. In the ODD world, we can again perform the standard counterfactuals, and see that if I pay, I get 9, and if I don't, I get 0.

Depending on how you count, we have performed somewhere from 4 to 6 counterfactuals already, but we are not done. Even with a complete ability to analyze what the ODD world looks like, we still have to figure out how much we should care about the ODD world as a whole relative to the EVEN world.

We can't expect either the EVEN world or the ODD world to tell us how much to care about each world. Relative to the EVEN world, the EVEN world is true, and relative to the ODD world, the ODD world is true. We must take a 7th counterfactual that looks something like counterfacting on ourselves not knowing what $2^{2^{2^{2^{2}}}}$ starts with, and asking what probability we would assign to the EVEN world. Performing this counterfactual, we see that the probability that $2^{2^{2^{2^{2}}}}$ starts with an even number is 39.11% (from Benford's law). Paying 10 dollars with probability 39.11% to gain 9 dollars with probability 60.89% is a good deal, so we should pay.
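The arithmetic in that last step can be sketched as follows. This is only an illustration, assuming the standard form of Benford's law, $P(d) = \log_{10}(1 + 1/d)$ for a leading digit $d$, and treating the policy "pay" as a simple expected-value comparison; the variable and function names are hypothetical.

```python
import math


def benford_probability(digit: int) -> float:
    """Benford's law: probability that the leading base-10 digit equals `digit`."""
    return math.log10(1 + 1 / digit)


# Probability assigned to the EVEN world when we do not know the first digit.
p_even = sum(benford_probability(d) for d in (2, 4, 6, 8))
p_odd = 1 - p_even

# Payoffs from the post: paying costs 10 in the EVEN world and earns 9 in the
# ODD world; refusing yields 0 in both worlds.
ev_pay = p_even * (-10) + p_odd * 9
ev_refuse = 0.0

print(f"P(EVEN) = {p_even:.4f}, P(ODD) = {p_odd:.4f}")
print(f"EV(pay) = {ev_pay:.3f}, EV(refuse) = {ev_refuse:.3f}")
```

Under these assumptions, paying is favorable whenever the probability of the EVEN world is below $9/19 \approx 47.4\%$, so the conclusion does not depend on the exact Benford figure, only on the EVEN world being weighted below that threshold.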