I recently became unstuck on counterfactuals. I now believe that counterfactuals are confusing, in large part, because they entail preserving our values through an ontological shift[1].

In our naive ontology[2], when we are faced with a decision, we conceive of ourselves as having free will in the sense of there being multiple choices that we could actually take[3]. These choices are conceived of as actual, and when we think about the notion of the "best possible choice" we see ourselves as comparing actual possible ways that the world could be. However, when we start investigating the nature of the universe, we realise that it is essentially deterministic[4] and hence that our naive ontology doesn't make sense. This forces us to ask what it means to make the "best possible choice" in a deterministic ontology where we can't literally take a choice other than the one that we make.

This means that we have to try to find something in our new ontology that roughly maps to our old one. For example, CDT pretends that at the point of the decision we magically make a decision other than the one we actually end up making. Say we were intending to turn left, but at the instant of the decision we're magically altered to turn right instead. This works well for many scenarios, but notably fails for Newcomb-like problems. Here updateless counterfactuals are arguably a better fit since they allow us to model the existence of a perfect predictor. However, there is still too much uncertainty about how they should be constructed for this answer to be satisfactory.

Given that we're trying to map a notion[5] onto another ontology that doesn't natively support it, it isn't that surprising that there isn't an obvious, nice, neat way of doing it.

Newcomb's problem exposes the tension between the following two factors:

a) wanting to ensure that the past is the same in each counterfactual to ensure that they are comparable with each other, in the sense of it being fair to compare the counterfactuals in order to evaluate the decision
b) maintaining certain consistency conditions that are present in the problem statement[6].

The tension is as follows: If we hold the past constant between counterfactuals, then we've violated the consistency conditions, as we have a point in time when the agent takes a decision despite the fact that this is inconsistent with the previous moment plus the laws of physics. However, if we backpropagate the impacts of the agent's choice through time to create a consistent counterfactual, it's unclear whether the two counterfactuals are comparable, since the agent is facing different scenarios in terms of how much money is in the box.
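To make the tension concrete, here is a toy sketch in Python. It is only an illustration under assumptions of my own choosing: a perfect predictor, the standard $1,000,000 / $1,000 payoffs, and the hypothetical names `World`, `fixed_past_counterfactuals` and `consistent_counterfactuals`, none of which come from any existing decision-theory library. Condition (c), that each moment follows from the previous one under the laws of physics, is not modelled here; only the match between prediction and action is checked.

```python
from dataclasses import dataclass

MILLION = 1_000_000
THOUSAND = 1_000
ACTIONS = ("one-box", "two-box")


@dataclass(frozen=True)
class World:
    """A toy Newcomb world: what the predictor predicted and what the agent did."""
    prediction: str
    action: str

    @property
    def opaque_box(self) -> int:
        # The opaque box holds the million iff one-boxing was predicted.
        return MILLION if self.prediction == "one-box" else 0

    @property
    def payoff(self) -> int:
        # Two-boxers also collect the transparent thousand.
        return self.opaque_box + (THOUSAND if self.action == "two-box" else 0)

    @property
    def consistent(self) -> bool:
        # With a perfect predictor, the action must match the prediction.
        return self.prediction == self.action


def fixed_past_counterfactuals(actual: World) -> dict[str, World]:
    """Hold the past (the prediction) constant and vary only the action."""
    return {a: World(actual.prediction, a) for a in ACTIONS}


def consistent_counterfactuals() -> dict[str, World]:
    """Propagate each action back so that the prediction matches it."""
    return {a: World(a, a) for a in ACTIONS}


def best_action(worlds: dict[str, World]) -> str:
    """Pick the action whose counterfactual world has the highest payoff."""
    return max(worlds, key=lambda a: worlds[a].payoff)


if __name__ == "__main__":
    actual = World(prediction="one-box", action="one-box")
    for name, worlds in [
        ("fixed past", fixed_past_counterfactuals(actual)),
        ("consistent", consistent_counterfactuals()),
    ]:
        for action, world in worlds.items():
            print(name, action, world.payoff, "consistent:", world.consistent)
        print(name, "recommends", best_action(worlds))
```

In this sketch the fixed-past construction keeps the two worlds comparable (same box contents) but one of them is inconsistent, and it recommends two-boxing. The consistent construction keeps both worlds consistent, but the agent faces different box contents in each, and it recommends one-boxing. The code doesn't settle which construction is right; it just labels the two moves.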

It turns out that Newcomb's Problem is more complicated than I previously realised. In the past, I thought I had conclusively demonstrated that we should one-box by refuting the claim that one-boxing only made sense if you accepted backwards causation. However, this now seems insufficient, as I haven't explained why we should maintain the consistency conditions over comparability after making the ontological shift.

In the past, I might have said that these consistency conditions are what define the problem and that if we dropped them it would no longer be Newcomb's Problem. However, that seems to take us towards a model of counterfactuals being determined by social convention, which only seems useful as a descriptive model, not a prescriptive model. My current approach now tends to put more focus on the evolutionary[7] process that created the intuitions and instincts underlying these incompatible demands as I believe that this will help us figure out the best way to stitch them together.

In any case, understanding that there is an ontological shift here seems like an important part of the puzzle. I can't exactly say what the consequences are of this yet and maybe this post is just stating the obvious, but my general intuition is that the more we explicitly label the mental moves we are making, the less likely we are to trip ourselves up. In particular, I suspect that it'll allow us to make our arguments less handwavey than they otherwise would have been. I'll try to address the different ways that we could handle this shift in the future.

Thanks to Justis for providing feedback.




  1. ^

    If we knew how to handle ontological shifts, we'd be able to handle this automatically. However, it is likely easier to go the other way, where we figure out how to handle counterfactuals as part of our investigations into how to handle ontological shifts.

  2. ^

    I mean the way in which we naturally interface with the world when we aren't thinking philosophically or scientifically.

  3. ^

    As opposed to "being able to take this choice" being just a convenient model of the world.

  4. ^

    Quantum mechanics only shifts us from the state of the world being deterministic, to the probability distribution being deterministic. It doesn't provide scope for free will, so it doesn't avoid the ontological shift.

  5. ^

    Our naive notion of having different actual choices as opposed to this merely being a useful model.

  6. ^

    Specifically, the consistency conditions are that: a) the action taken should match the prediction of the oracle, b) the box should contain the million if and only if the oracle predicted the agent would one-box, and c) each moment of time should follow from the previous one given the laws of physics.

  7. ^

    Evolutionary primarily in the sense of natural selection shaping our intuitions, but without ruling out societal influences.


I'm guessing a good way to think about free will under determinism is with logical time that's different from physical time. The points/models that advance in logical time are descriptions of environment with different amount of detail, so that you advance in logical time by filling in more details, and sometimes it's your decisions that are filled in (at all of your instances and predictions-of simultaneously). This is different from physical time, where you fill in details in a particular way determined by laws of physics.

The ingredient of this point of view that's usually missing is that concrete models of environment (individual points of states of knowledge) should be allowed to be partial, specifying only some of the data about the environment. Then, actual development of models in response to decisions is easier to see; it's not inherently a kind of illusion born of lack of omniscience. This is in contrast to the usual expectation that the only thing with partial details is the states of knowledge about complete models of environment (with all possible details already filled in), so that partiality is built on top of lack of partiality.

The filling-in of partial models with logical time probably needs to be value-laden. Most counterfactuals are fictional, and the legible details of decision relevant fiction should preserve its moral significance. So it's veering in the direction of "social convention", though in a normative way, in the sense that value is not up for grabs. On the other hand, it's a possible way of understanding CEV as a better UDT, instead of as a separate additional construction with its own desiderata (the simulations of possible civilizations from CEV reappear in decision theory as counterfactuals developing in logical time).

Determinism doesn't seem like a central example of ontological shift, and bargaining seems like the concept of dealing with more general ontological shifts. You bargain with your variant in a different ontological context for doing valuable things. This starts with extrapolation of value to that context, so that it's not beyond the Goodhart boundary and you grow confident in legible proxy goals that talk about that territory. It also seems to be a better framing for updatelessness, as bargaining among possible future epistemic states, acausal trade among them, or at least those that join the coalition of abiding by the decision of the epistemic past. This way, considering varying possible future moral states (~partial probutility functions) is more natural. The motivation to do that is so that the assumption of unchanging preference is not baked into the decision theory, and it gets a chance of modeling mild optimization.

AFAIK the best known way of reconciling physical causality with "free will" like choice is constructor theory, which someone pointed out was similar to my critical agential approach.

I commented directly on your post.

Note the preceding:

Let's first, within a critical agential ontology, disprove some very basic forms of determinism.

I'm assuming use of a metaphysics in which you, the agent, can make choices. Without this metaphysics there isn't an obvious motivation for a theory of decisions. As in, you could score some actions, but then there isn't a sense in which you "can" choose one according to any criterion.

Maybe this metaphysics leads to contradictions. In the rest of the post I argue that it doesn't contradict belief in physical causality including as applied to the self.

As in, you could score some actions, but then there isn't a sense in which you "can" choose one according to any criterion.


I've noticed that issue as well. Counterfactuals are more a convenient model/story than something to be taken literally. You've grounded decision by taking counterfactuals to exist a priori. I ground them by noting that our desire to construct counterfactuals is ultimately based on evolved instincts and/or behaviours so these stories aren't just arbitrary stories but a way in which we can leverage the lessons that have been instilled in us by evolution. I'm curious, given this explanation, why do we still need choices to be actual?

Do you think of counterfactuals as a speedup on evolution? Could this be operationalized by designing AIs that quantilize on some animal population, therefore not being far from the population distribution, but still surviving/reproducing better than average?

Speedup on evolution?

Maybe? Might work okayish, but doubt the best solution is that speculative.