# 5

In this post, I hope to persuade you of what I consider to be an important principle when dealing with decision theory and counterfactuals.

Joe Carlsmith describes the Yankees vs. Red Sox Problem as below:

In this case, the Yankees win 90% of games, and you face a choice between the following bets:

| | Yankees win | Red Sox win |
|---|---|---|
| You bet on Yankees | 1 | -2 |
| You bet on Red Sox | -1 | 2 |

Or, if we think of the outcomes here as “you win your bet” and “you lose your bet” instead, we get:

| | You win your bet | You lose your bet |
|---|---|---|
| You bet on Yankees | 1 | -2 |
| You bet on Red Sox | 2 | -1 |

Before you choose your bet, an Oracle tells you whether you’re going to win your next bet. The issue is that once you condition on winning or losing (regardless of which), you should always bet on the Red Sox. So, the thought goes, EDT always bets on the Red Sox, and loses money 90% of the time. Betting on the Yankees every time does much better.
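The arithmetic behind this argument can be checked with a short sketch. The payoffs and the 90% figure are taken from the problem statement; the function and variable names are illustrative assumptions, and the "naive" conditioning step is only one reading of how the argument treats the Oracle's statement (as holding no matter which bet you place).

```python
# Payoffs from the problem statement, indexed as PAYOFF[bet][winner].
P_YANKEES_WIN = 0.9
TEAMS = ("Yankees", "Red Sox")

PAYOFF = {
    "Yankees": {"Yankees": 1, "Red Sox": -2},
    "Red Sox": {"Yankees": -1, "Red Sox": 2},
}


def other(team):
    """Return the opposing team."""
    return "Red Sox" if team == "Yankees" else "Yankees"


def payoff_given_told(bet, told_win):
    """Payoff of `bet` if the Oracle's statement is assumed to hold whatever you bet."""
    winner = bet if told_win else other(bet)
    return PAYOFF[bet][winner]


def naive_conditional_choice(told_win):
    """Pick the bet with the higher payoff after conditioning on the statement."""
    return max(TEAMS, key=lambda bet: payoff_given_told(bet, told_win))


def expected_value(bet):
    """Expected payoff of always placing `bet`, ignoring the Oracle entirely."""
    prob = {"Yankees": P_YANKEES_WIN, "Red Sox": 1 - P_YANKEES_WIN}
    return sum(prob[winner] * PAYOFF[bet][winner] for winner in TEAMS)


if __name__ == "__main__":
    for told_win in (True, False):
        label = "told you will win" if told_win else "told you will lose"
        print(label, "->", naive_conditional_choice(told_win))
    for bet in TEAMS:
        print("always bet", bet, "->", round(expected_value(bet), 2))
```

Under these assumptions the conditional choice is Red Sox whichever statement is heard, while the unconditional expected value is about 0.7 for always betting Yankees and about -0.7 for always betting Red Sox.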

The mistake here is assuming that the Oracle's prediction applies in counterfactuals (worlds that don't occur) in addition to the factual (the world that does).

If the Oracle knows both:

a) The Yankees will win
b) You will bet on the Yankees, if you are told that you will win your next bet

Then the Oracle knows that it can safely predict you winning your bet, without any possibility of this prediction being wrong.

Notice that the Oracle doesn't need to know anything about the counterfactual where you bet Red Sox, except that the Yankees will win (and maybe not even that).

After all, Condition b) only applies when you are told that you will win your next bet. If you would have bet on Red Sox instead after being told that you were going to win, then the Oracle wouldn't have promised that you were going to choose correctly.

In fact, the Oracle mightn't have been able to publicly make a consistent prediction at all, as learning of its prediction might change your actions. This would be the case if all of the following three conditions held at once:

a) Yankees were going to win
b) If you were told that you'd win your next bet, you'd bet Red Sox
c) If you were told that you'd lose your next bet, you'd bet Yankees

The only way the Oracle could avoid being mistaken would be to make no prediction at all. This example clearly demonstrates how an Oracle's predictions can be limited by your betting tendencies.
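The two lists of conditions above can be checked mechanically. The sketch below is a hypothetical model, not part of the original problem: it represents your betting tendencies as a `policy` mapping each announced prediction to the team you would bet on after hearing it, and asks which announcements would come true once made. The `"lose"` entry of the `cooperative` policy is an assumption, since the first list only specifies what you do when told you will win.

```python
PREDICTIONS = ("win", "lose")


def consistent_predictions(winner, policy):
    """Return the predictions that would come true if publicly announced.

    `winner` is the team that actually wins. `policy` maps an announced
    prediction ("win" or "lose") to the team the agent bets on after
    hearing it.
    """
    consistent = []
    for prediction in PREDICTIONS:
        bet = policy[prediction]
        outcome = "win" if bet == winner else "lose"
        if outcome == prediction:
            consistent.append(prediction)
    return consistent


# First list: the agent bets Yankees when told "win".
# The "lose" entry is an assumption added for illustration.
cooperative = {"win": "Yankees", "lose": "Yankees"}

# Second list: the agent does the opposite of what would
# make either announcement true when the Yankees win.
contrarian = {"win": "Red Sox", "lose": "Yankees"}

if __name__ == "__main__":
    print(consistent_predictions("Yankees", cooperative))
    print(consistent_predictions("Yankees", contrarian))
```

With the Yankees winning, the cooperative policy leaves "win" as the only announcement the Oracle can safely make, and the contrarian policy leaves none. In both cases the check only ever looks at the world where the announcement is actually delivered.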

To be clear, if the Oracle tells you that you are going to win, you can't interpret this as applying completely unconditionally. Instead, you have to allow that the Oracle's prediction may be contingent on how you bet (in many problems the Oracle actually knows how you will bet and will use this in its prediction).

The Oracle's prediction only has to apply to the world that is. It doesn't have to apply to worlds that are not.

Why not go with Joe's argument?

Joe argues that the Oracle's prediction renders your decision-making unstable. While this is "fishy", it's not clear to me that this is a mistake. After all, maybe the Oracle knows how many times you'll switch back and forth between the two teams before making your final decision? Maybe this doesn't answer Joe's objection, but plausibly it does.

Are there Wider Implications?

If it is conceded that the Oracle's prediction can vary in counterfactuals, then this would undermine the argument for 2-boxing in Newcomb's Problem, which relies on the Oracle's prediction being constant across all counterfactuals. I suppose someone could argue that this problem only demonstrates that the prediction can vary in counterfactuals when the prediction is publicly shared. But even if I haven't specifically shown that non-public predictions can vary across counterfactuals, I've still successfully undermined the notion that the past is fixed across counterfactuals.

This result is damaging to both EDT and CDT, which both take the Oracle's prediction to apply across all counterfactuals.

I also suspect that anyone who finds this line of argument persuasive will end up being more persuaded by my explanation for why 1-boxing doesn't necessarily imply backwards causation (short answer: because counterfactuals are a construction, and constructing at least part of a counterfactual backwards is different from asserting that the internal structure of that counterfactual involves backwards causation). However, I can't explain why it's related in any clear fashion.

Update:

Vladimir Nesov suggested that the principle should be "The Oracle's prediction only has to apply to the world where the prediction is delivered". My point was that Oracle predictions made in the factual don't apply to counterfactuals, but I prefer his way of framing things as it is more general.

Consider the variant where the Oracle demands a fee of 100 utilons after delivering the prediction, which you can't refuse. Then the winning strategy is going to be about ensuring that the current situation is counterfactual, so that in actuality you won't have to pay the Oracle's fee, because the Oracle wouldn't be able to deliver a correct prediction.

"The Oracle's prediction only has to apply to the world that is. It doesn't have to apply to worlds that are not."

The Oracle's prediction only has to apply to the world where the prediction is delivered. It doesn't have to apply to the other worlds. The world where the prediction is delivered can be the world that is not, and another world can be the world that is.

"The Oracle's prediction only has to apply to the world where the prediction is delivered" - My point was that predictions that are delivered in the factual don't apply to counterfactuals, but the way you've framed it is better as it handles a more general set of cases. It seems like we're on the same page.

It's not actually more general, it's instead about a somewhat different point. The more general statement could use some sort of a notion of relative actuality, to point at the possibly counterfactual world determined by the decision made in the world where the prediction was delivered, which is distinct from the even more counterfactual worlds where the prediction was delivered but the decision was different from what it would relative-actually be had the prediction been delivered, and from the worlds where the prediction was not delivered at all.

If the prediction is not actually delivered, then it only applies to that intermediately-counterfactual world and not to the more counterfactual alternatives where the prediction was still delivered or to the less counterfactual situation where the prediction is not delivered. Saying that the prediction applies to the world where it's delivered is liable to be interpreted as including the more-counterfactual worlds, but it doesn't have to apply there; it only applies to the relatively-actual world. So your original framing has a necessary part of saying this carefully that my framing didn't include, and replacing it with my framing discards this correct detail. The Oracle's prediction only has to apply to the "relatively-actual" world where the prediction is delivered.

Small insight while reading this: I'm starting to suspect that most (all???) unintuitive things that happen with Oracles are the result of them violating our intuitions about causality, because they actually deliver no information. Nothing can be conditioned on what the Oracle says, because if we could then the Oracle would fail to actually be an Oracle. So we can only condition on the existence of the Oracle and how it functions, and not on what it actually says; e.g. you should still 1-box, but it's mistaken to think anything an Oracle tells you allows you to do anything different.

Yeah, you want either information about the available counterfactuals or information independent of your decision. Information about just the path taken isn't something you can condition on.

When the Oracle says "The taxi will arrive in one minute!", you may as well grab your coat.

Isn't that prediction independent of your decision to grab your coat or not?

The prediction is why you grab your coat; it's both meaningful and useful to you, a simple counterexample to the sentiment that predictions are no good because their correctness scope is unclear. The prediction is not about the coat, but that dependence wasn't mentioned in the arguments against usefulness of predictions above.

Sure, that's a sane Oracle. The Weird Oracle used in so many thought experiments doesn't say "The taxi will arrive in one minute!", it says "You will grab your coat in time for the taxi."

No, this is an important point: the agent normally doesn't know the correctness scope of the Oracle's prediction. It's only guaranteed to be correct on the actual decision, and can be incorrect in all other counterfactuals. So if the agent knows the boundaries of the correctness scope, they may play chicken and render the Oracle wrong by enacting the counterfactual where the prediction is false. And if the agent doesn't know the boundaries of the prediction's correctness, how are they to make use of it in evaluating counterfactuals?

It seems that the way to reason about this is to stipulate correctness of the prediction in all counterfactuals, even though it's not necessarily correct in all counterfactuals, in the same way as the agent's decision that is being considered is stipulated to be different in different counterfactuals, even though the algorithm forces it to be the same. So it's a good generalization of the problem of formulating counterfactuals, it moves the intervention point from agent's own decisions to correctness of powerful predictors' claims. These claims act on the counterfactuals generated by the agent's own decisions, not on the counterfactuals generated by delivery of possible claims, so it's not about merely treating predictors as agents, it's a novel setup.

Is there an ELI5 doc about what's "normal" for Oracles, and why they're constrained in that way?  The examples I see confuse me in that they are exploring what seem like edge cases, and I'm missing the underlying model that makes these cases critical.

Specifically, when you say "It's only guaranteed to be correct on the actual decision", why does the agent not know what "correct" means for the decision?

"Specifically, when you say 'It's only guaranteed to be correct on the actual decision', why does the agent not know what 'correct' means for the decision?"

The agent knows what "correct" means, correctness of a claim is defined for the possible worlds that the agent is considering while making its decision (which by local tradition we confusingly collectively call "counterfactuals", even though one of them is generated by the actual decision and isn't contrary to any fact).

In the post Chris_Leong draws attention to the point that since the Oracle knows which possible world is actual, there is nothing forcing its prediction to be correct on the other possible worlds that the agent foolishly considers, not knowing that they are contrary to fact. And my point in this thread is that despite the uncertainty it seems like we have to magically stipulate correctness of the Oracle on all possible worlds in the same way that we already magically stipulate the possibility of making different decisions in different possible worlds, and this analogy might cast some light on the nature of this magic.

That's an interesting point. I suppose it might be viable to acknowledge that the problem taken literally doesn't require the prediction to be correct outside of the factual, but nonetheless claim that we should resolve the vagueness inherent in the question about what exactly the counterfactual is by constructing it to meet this condition. I wouldn't necessarily be strongly against this - my issue is confusion about what an Oracle's prediction necessarily entails.

Regarding your notion about things being magically stipulated, I suppose there's some possible resemblance there with the ideas I proposed before in Counterfactuals As A Matter of Social Convention, although The Nature of Counterfactuals describes where my views have shifted to since then.

Hmm.  So does this only apply to CDT agents, who foolishly believe that their decision is not subject to predictions?

No, I suspect it's a correct ingredient of counterfactuals, one I didn't see discussed before, not an error restricted to a particular decision theory. There is no contradiction in considering each of the counterfactuals as having a given possible decision made by the agent and satisfying the Oracle's prediction, as the agent doesn't know that it won't make this exact decision. And if it does make this exact decision, the prediction is going to be correct, just like the possible decision indexing the counterfactual is going to be the decision actually taken. Most decision theories allow explicitly considering different possible decisions, and adding correctness of the Oracle's prediction into the mix doesn't seem fundamentally different in any way, it's similarly sketchy.

Thanks for patience with this. I am still missing some fundamental assumption or framing about why this is non-obvious (IMO, either the Oracle is wrong, or the choice is illusory).  I'll continue to examine the discussions and examples in hopes that it will click.

I presume Vladimir and I are discussing this from within the determinist paradigm, in which "either the Oracle is wrong, or the choice is illusory" doesn't apply (although I propose a similar idea in Why 1-boxing doesn't imply backwards causation).

"IMO, either the Oracle is wrong, or the choice is illusory"

This is similar to determinism vs. free will, and suggests the following example. The Oracle proclaims: "The world will follow the laws of physics!". But in the counterfactual where an agent takes a decision that won't actually be taken, the fact of taking that counterfactual decision contradicts the agent's cognition following the laws of physics. Yet we want to think about the world within the counterfactual as if the laws of physics are followed.