I have written extensively on Newcomb's problem, so I assumed that I must have written up a clear explanation for this at some point. However, this doesn't seem to be the case, so I suppose I'm doing it now. The TLDR is that only one counterfactual is factual and the rest are constructed. Since they're constructed, there's no real requirement for them to have the same past as the factual and, in fact, if we value consistency, then the most natural way to construct them will involve tweaking the past.

The Student and the Exam

I've covered this example in a past post, so feel free to skip to the next section if you know it.

Suppose a student has a test on Friday. They are considering whether to study or go to the beach. Since the universe is deterministic, they conclude that the outcome is already fixed and has been fixed since the start of time. Therefore, they figure that they may as well not bother to study. Is there anything wrong with this reasoning?

My answer is that there are two views we can take of the universe.

Raw Reality: From this perspective, we are only looking at the universe as it is in its raw form; that is the territory component of the map and territory. This means that the outcome is fixed, but that the choice of whether or not to study is fixed as well. Or to be clear, the outcome is fixed in part due to the student's decision being fixed; and not independently of it. From within this view, we can't construct anything other than a Trivial Decision Theory problem as there's only a single choice we can take.

(Update: Eliezer seems to adopt a similar view here).

Augmented Reality: This perspective is created by constructing counterfactuals. It is only from within this perspective that we can talk about the student having a choice. The fact that the student necessarily obtains a particular outcome places no limitation on the counterfactuals - which are by definition not factual and were expressly created for the purpose of considering the universe travelling down a different path.

Newcomb's Problem

One of the most confusing aspects of Newcomb's Problem is that the one-boxing solution seems to depend on backwards causation. It seems to rely on the reasoning that if I made a different choice, this would cause Omega to make a different prediction. I'll assume determinism, since if we had a free will independent of determinism, Omega wouldn't be able to be a perfect predictor. (Similarly, quantum physics doesn't make much of a difference as it simply changes the universe from deterministic to probabilistically deterministic).

I've argued that backwards causation isn't necessarily absurd, but nonetheless, I still want to demonstrate that 1-boxing doesn't require backwards causation so that we can completely avoid this controversy. Further, it is possible to undermine the concept of causality itself, but I'll ignore this, as if you do this, then there won't be any problem for me to solve.

We'll do this by making a similar move to the one we made in The Student and The Exam. However, instead of suggesting that counterfactuals can have different futures from the factual, we'll suggest that they can have different pasts.

The first point I'll note is that the mere fact of something being a particular way in the factual doesn't mean that it has to be that way in the counterfactual. If we didn't make any changes, then it wouldn't actually be a counterfactual, and beyond this there'd be no point in constructing it.

So we're allowed to make changes, but why are we specifically allowed to edit the past? Well, when we edit the decision an agent makes and project that decision only forwards in time, the result is an inconsistent counterfactual. For example: the agent is the kind of agent that will go left, up until the moment of the decision when they magically decide to go right instead. Editing the past to make a counterfactual consistent is an entirely natural thing to do. After all, we might very well doubt the value in considering a counterfactual that isn't even consistent with our laws of physics! The vague intuition behind counterfactuals is for them to be something that "could have happened" - I suspect that I'm not alone in wanting to protest that an inconsistent counterfactual couldn't have happened!
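To make the contrast concrete, here is a rough toy model in Python. It is only a sketch under assumptions of my own: the agent's action and Omega's prediction are both fixed by a single earlier fact that I'm calling `disposition`, the payoffs are the usual Newcomb amounts, and every name below is made up for illustration rather than being part of any real theory of counterfactuals.

```python
# Toy deterministic Newcomb world. All names and payoffs here are
# illustrative assumptions, not a complete theory of counterfactuals.

def run_world(disposition, override_action=None):
    """Run the world forward from a past in which the agent has `disposition`.

    If `override_action` is given, the decision is edited at the last moment
    without touching the past (a forward-only edit).
    """
    prediction = disposition                 # Omega reads the disposition in the past
    action = override_action or disposition  # normally the disposition fixes the action
    big_box = 1_000_000 if prediction == "1-box" else 0
    payoff = big_box + (1_000 if action == "2-box" else 0)
    return {
        "prediction": prediction,
        "action": action,
        "payoff": payoff,
        # Consistent means the action is the one this past actually produces.
        "consistent": action == disposition,
    }

# The factual: a 1-boxing agent who 1-boxes.
factual = run_world("1-box")

# Forward-only edit: same past, but the agent "magically" 2-boxes.
forward_only = run_world("1-box", override_action="2-box")

# Past-edited counterfactual: a 2-boxing agent all along.
past_edited = run_world("2-box")

for name, world in [("factual", factual),
                    ("forward_only", forward_only),
                    ("past_edited", past_edited)]:
    print(name, world)
```

In this sketch the forward-only edit is the only world that pairs a 1-box prediction with a 2-box action, and it is also the only one flagged as inconsistent. The past-edited world runs forwards in the ordinary way; it simply starts from a different past.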

I'll acknowledge that the previous two arguments involve gesturing at an unclear notion of what counterfactuals are. This is inevitable as I don't yet have a complete theory of counterfactuals and even if I did, including it would greatly complicate this post. Nonetheless, I think they should be sufficient to persuade most people that constructing counterfactuals with different pasts is completely reasonable.

Nonetheless, some people may wonder why the process I've described doesn't involve backwards causation. After all, it involved editing a decision and then backpropagating. The key point is that this backwards propagation occurs during the process for constructing the counterfactual and that it is not necessarily a fact about the internal structure of the counterfactual.

That is, within the actual counterfactual causality operates normally. If you are using an entropic arrow for determining the direction of time, it will in most cases be pointing the same way in both the counterfactual and the factual. If you think of causality as the existence of laws which order the universe temporally (the mere fact of X being the case forcing some Y to also be the case later in time), then the counterfactual will also have these laws. Our model of the counterfactual is just like our model of the factual with a few tweaks!

But beyond this, even if you think causality is about the relationship between the counterfactuals, you can't claim backwards causation by merely referring to the process used to generate the counterfactuals rather than the actual counterfactuals themselves. And the mere fact that the counterfactuals have different pasts isn't sufficient to count as backwards causation. Claiming this would collapse two distinct concepts into one. And if we did define backwards causation in this way, this whole issue would become something of a nothing-burger as we could just concede its existence and then say, "So what?".


The claim "if I made a different choice, then the past would be different" is misleading. From the view of raw reality, it's important to understand that you can only make the choice you made and the past can only be as it was. From the view of augmented reality, the claim merely becomes "if we look at a version of me that would have made a different choice, then it will be located in a counterfactual with a different past".


  • The Prediction Problem: My first attempt at explaining Newcomb's problem - it did okay, but I'm kind of embarrassed about it now. I see the Prediction Problem as a useful intuition pump for why you should take into account meta-theoretical uncertainty, but I don't see it as having anything to say about which decision theory is best in and of itself.
  • Deconfusing Logical Counterfactuals: Another somewhat outdated post of mine since it defends an erasure model of counterfactuals that I no longer endorse. Nonetheless, I still think that it contains some interesting ideas about Newcomb-like problems.
3 comments

I'm confused... What you call the "Pure Reality" view seems to work just fine, no? (I think you had a different name for it, pure counterfactuals or something.) What do you need counterfactuals/Augmented Reality for? Presumably making decisions thanks to "having a choice" in this framework, right? In the pure reality framework the "student and the test" example one would dispassionately calculate what kind of a student algorithm passes the test, without talking about making a decision to study or not to study. Same with the Newcomb's, of course, one just looks at what kind of agents end up with a given payoff. So... why pick an AR view over the PR view, what's the benefit?

Excellent question. Maybe I haven't framed this well enough.

We need a way of talking about the fact that both your outcome and your action are fixed by the past.

We also need a way of talking about the fact that we can augment the world with counterfactuals (Of course, since we don't have complete knowledge of the world, we typically won't know which is the factual and which are the counterfactuals).

And that these are two distinct ways of looking at the world.

I'll try to think about a cleaner way of framing this, but do you have any suggestions?

(For the record, the term I used before was Raw Counterfactuals - meaning consistent counterfactuals - and that's a different concept than looking at the world in a particular way).

(Something that might help is that if we are looking at multiple possible pure realities, then we've introduced counterfactuals as only one is true and "possible" is determined by the map rather than the territory)

I think the best way to explain this is to characterise the two views as slightly different functions, both of which return sets. Of course, the exact type representation isn't the point. Instead, the types are just there to illustrate the difference between two slightly different concepts.

possible_world_pure() returns {x} where x is either <study & pass> or <beach & fail>, but we don't know which one it will be

possible_world_augmented() returns {<study & pass>, <beach & fail>}

Once we've defined possible worlds, it naturally provides us a definition of possible actions and possible outcomes that matches what we expect. So for example:

size(possible_world_pure()) = size(possible_action_pure()) = size(possible_outcome_pure()) = 1

size(possible_world_augmented()) = size(possible_action_augmented()) = size(possible_outcome_augmented()) = 2

And if we have a decide function that iterates over all the counterfactuals in the set and returns the one with the highest utility, we need to call it on possible_world_augmented() rather than possible_world_pure().
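A minimal Python sketch of these functions for the student example might look like the following. The `World` tuple, the `UTILITY` table and the `factual` parameter are assumptions I've added to make it runnable; in particular, passing `factual` explicitly stands in for "we don't know which one it will be".

```python
from typing import NamedTuple

class World(NamedTuple):
    action: str
    outcome: str

STUDY = World("study", "pass")
BEACH = World("beach", "fail")

# Assumed utilities, chosen only so that passing beats failing.
UTILITY = {"pass": 1, "fail": 0}

def possible_world_pure(factual):
    """Raw reality: a one-element set holding whichever world is factual."""
    return {factual}

def possible_world_augmented():
    """Augmented reality: the factual plus the constructed counterfactual."""
    return {STUDY, BEACH}

def possible_actions(worlds):
    return {w.action for w in worlds}

def possible_outcomes(worlds):
    return {w.outcome for w in worlds}

def decide(worlds):
    """Return the action of the world with the highest utility."""
    best = max(worlds, key=lambda w: UTILITY[w.outcome])
    return best.action

pure = possible_world_pure(BEACH)       # suppose the student in fact goes to the beach
augmented = possible_world_augmented()

print(len(pure), len(possible_actions(pure)), len(possible_outcomes(pure)))
print(len(augmented), len(possible_actions(augmented)), len(possible_outcomes(augmented)))
print(decide(augmented))
```

Calling `decide` on the pure set still runs, but it can only ever hand back the one action that was already there, which is the Trivial Decision Theory problem again.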

Note that they aren't always this similar. For example, for Transparent Newcomb they are:

possible_world_pure() returns {<1-box, million>}

possible_world_augmented() returns {<1-box, million>, <2-box, thousand>}

The point is that if we remain conscious of the type differences then we can avoid certain errors.

For example, possible_outcome_pure() = {"PASS"} doesn't mean that possible_outcome_augmented() = {"PASS"}. It's the latter which would imply it doesn't matter what the student does, not the former.
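One way to stay conscious of the type difference is to give the two sets different wrapper types, so that a decision procedure can refuse the wrong one. This is only a sketch for Transparent Newcomb; the class names `PureWorlds` and `AugmentedWorlds` and the (action, payoff) tuples are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PureWorlds:
    worlds: frozenset  # exactly one element: the factual

@dataclass(frozen=True)
class AugmentedWorlds:
    worlds: frozenset  # the factual plus constructed counterfactuals

# Transparent Newcomb, with each world written as (action, payoff).
pure = PureWorlds(frozenset({("1-box", 1_000_000)}))
augmented = AugmentedWorlds(frozenset({("1-box", 1_000_000), ("2-box", 1_000)}))

def decide(options):
    """Pick the best action, but only from an augmented set of worlds."""
    if not isinstance(options, AugmentedWorlds):
        raise TypeError("decide() needs the augmented set; the pure set offers no choice")
    action, _payoff = max(options.worlds, key=lambda w: w[1])
    return action

print(decide(augmented))
```

With this arrangement, passing `pure` to `decide` raises a `TypeError` instead of quietly returning the only available action.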