The Nature of Counterfactuals

AI ALIGNMENT FORUM
AF

The Nature of Counterfactuals — AI Alignment Forum

I'm finally beginning to feel that I have a clear idea of the true nature of counterfactuals. In this post I'll argue that counterfactuals are just intrinsically a part of how we make sense of the world. However, it would be inaccurate to present them as purely a human invention, as we were shaped by evolution in such a way as to ground these conceptions in reality.

Unless you're David Lewis, you're probably going to be rather dubious of the claim that all possibilities exist (i.e. that counterfactuals are ontologically real). Instead, you'll probably be willing to concede that they're something we construct; that they're in the map rather than in the territory.

Things in the map are tools; they are constructed because they are useful. In other words, they are constructed for a purpose or a number of purposes. So what is the purpose (or the purposes) of counterfactuals?

I first raised this question in Counterfactuals are an Answer, Not a Question and I struggled with it for around a year. Eventually, I realised that a big part of the challenge is just how abstract the question is. So I replaced it with something more concrete: "Why don't agents construct crazy counterfactuals?" One example would be expecting the world to explode if I made this post. Another would be filling in the future with randomly generated events. Why shouldn't I do either of these?

I'll make a modest claim: it's not about aesthetics. We don't construct counterfactuals because we want them to be pretty or funny or entertaining. We want them to be useful. The reason why we don't just construct counterfactuals in a silly or arbitrary manner is that we believe, in some vague sense, that doing so would lead to sub-optimal outcomes, or at least to sub-optimal outcomes in expectation.

I suspect most people will agree that the answer must be something along these lines, but I've hardly defined it very precisely. So let's attempt to clarify. To keep this discussion as general as possible, note that we could have stated similar sentiments in terms of achieving good outcomes, avoiding bad outcomes, or achieving better outcomes. But regardless of how we word it, we're comparing worlds and deciding that one is better than another. It's not about just considering one world and comparing it to a standard, because we can't produce such a standard without constructing a world non-identical to the first.

Essentially, we conceive of certain worlds being possible, then we consider the expected value or the median outcome or some other metric over these worlds and finally we suggest that according to this metric the agents constructing a sane theory of counterfactuals tend to do better than the agents with crazy theories.
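As a rough illustration of this procedure, here is a minimal Python sketch. Everything in it is hypothetical: the agent names, the outcome numbers, and the `rank_agents` helper are made up for the example. It only shows the shape of the evaluation: fix a set of worlds conceived of as possible, pick a metric, and rank agents by it.

```python
from statistics import mean, median

# Hypothetical outcomes (higher is better) that each agent obtains in each of
# the five worlds we have conceived of as possible. All numbers are made up.
OUTCOMES = {
    "sane_agent": [5, 6, 7, 6, 5],
    "crazy_agent": [0, 1, 9, 0, 1],
}


def rank_agents(metric):
    """Return agent names ordered from best to worst under `metric`."""
    return sorted(OUTCOMES, key=lambda name: metric(OUTCOMES[name]), reverse=True)


# Two of the metrics mentioned above: expected value (with uniform weights
# assumed) and the median outcome.
ranking_by_mean = rank_agents(mean)
ranking_by_median = rank_agents(median)
```

With these made-up numbers the sane agent comes out ahead under both metrics, but the ranking is only defined relative to the chosen worlds and the chosen metric.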

This naturally leads to another question: what worlds should we conceive of as being possible? Again, we can make this concrete by asking what would happen if we were to choose a crazy set of possible worlds - say a world just like this one and then a world with unicorns and fountains of gold - and no other worlds. Well again, the reason why we wouldn't do this is that we'd expect an agent building its decision theory based on these possible worlds to perform poorly.

What do I mean by poorly? Well, again it seems like we're conceiving of certain worlds as possible, imagining how agents constructing their decision theory based on different notions of possibility perform in these worlds and utilising some kind of metric to evaluate performance.

So we're back where we were before. That is, we're going around in circles. Suppose an agent believes that we should consider a set W of worlds as possible and constructs a decision theory based on this. Then this agent will evaluate agents who adopt W in order to develop their decision theory as making an optimal decision, and it will evaluate agents who adopt a different set of worlds that leads to a different decision theory as making a sub-optimal decision, except in the rare cases where this doesn't make a difference. In other words, such an agent will reaffirm what it already believes about what worlds are possible.
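The self-reaffirming structure can be made concrete with a toy sketch in Python. This is an illustration under strong assumptions, not a proposed formalism: the world labels, the `PAYOFF` table, the uniform weighting over worlds, and the helper names `best_action` and `score` are all hypothetical.

```python
# Hypothetical payoff of each action in each world. All numbers are made up.
PAYOFF = {
    "carry_umbrella": {"rain": 1.0, "sun": 0.6, "gold_fountains": 0.0},
    "dig_for_gold": {"rain": 0.0, "sun": 0.1, "gold_fountains": 1.0},
}


def average_payoff(action, worlds):
    """Average payoff of `action` over `worlds` (uniform weights assumed)."""
    return sum(PAYOFF[action][w] for w in worlds) / len(worlds)


def best_action(worlds):
    """The action an agent picks if it treats exactly `worlds` as possible."""
    return max(PAYOFF, key=lambda action: average_payoff(action, worlds))


def score(judge_worlds, candidate_worlds):
    """How well the judge rates the candidate's choice, measured over the
    worlds the judge itself treats as possible."""
    return average_payoff(best_action(candidate_worlds), judge_worlds)


W_SANE = ("rain", "sun")
W_CRAZY = ("sun", "gold_fountains")

# Each judge rates the agent that shares its own set of worlds more highly.
sane_prefers_sane = score(W_SANE, W_SANE) > score(W_SANE, W_CRAZY)
crazy_prefers_crazy = score(W_CRAZY, W_CRAZY) > score(W_CRAZY, W_SANE)
```

In this toy setup both comparisons come out true: the judge using `W_SANE` rates the `W_SANE` agent higher, and the judge using `W_CRAZY` rates the `W_CRAZY` agent higher. Neither evaluation steps outside the evaluator's own notion of possibility, which is the circularity described above.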

You might think that the circularity is a problem, but circular epistemology turns out to be viable (see Eliezer's Where Recursive Justification Hits Bottom). And while circular reasoning is less than ideal, if the alternative is eventually hitting a point where we can provide no justification at all, then circular justification might not seem so bad after all.

Kant theorised that certain aspects of phenomena were the result of intrinsic ways in which we interpret the world and that it is impossible for us to step outside this perspective. He called this Transcendental Idealism and suggested that it provided a form of a priori synthetic knowledge which provided the basic assumptions we needed to begin reasoning about the world (such as causation).

My approach is slightly different as I'm using circular epistemology rather than a priori synthetic knowledge to provide a starting place for reason. By having our starting claims amenable to updating based on evidence, I avoid a particular problem in the Kantian approach that is best highlighted by Einstein's Theory of Relativity. Namely, Kant claimed that space and time existed a priori, but experimental results were able to convince us otherwise, which should not be possible with an a priori result.

However, I agree with him that certain basic concepts are frames that we impose on the world due to our cognitive structure (in my case I'm focusing on the notion of possibility). I'm not picturing this as a straitjacket that is completely impossible to escape; indeed these assumptions may be subsumed by something similar as they were in the case of relativity. The point is more that to even begin reasoning we have to begin within a cognitive frame.

Imagine trying to do physics without being able to say things like, "Imagine we have a 1kg frictionless ball...", mathematics without being able to entertain the truth of a proposition that may be false or divide a problem into cases, and philosophy without being allowed to do thought experiments. Counterfactuals are such a basic concept that it makes sense to believe that they - or something very much like them - are a primitive.

Another aspect that adds to the merits of this theory is that it is simple enough to be plausible (this seems to me like the kind of thing that should have a simple answer), yet also complicated enough to explain why it has been surprisingly difficult to make progress on.

After writing this post I found myself in a strange position. I felt certain I had dramatically improved my conceptual understanding of counterfactuals, yet at the same time I found myself struggling to understand where to go from here in order to produce a concrete theory of counterfactuals, and even having trouble articulating how it helps in this regard at all.

A big part of the challenge for me is that I have almost no idea of how we should handle circular epistemology in the general case. There are far too many different strategies you could attempt to produce something consistent. I hope to have more clarity on this in the future.

Note: I posted some additional speculation in a shortform post. I decided to separate it out as I don't feel it's as high quality as the core of the post.

Links:

The lack of performance metrics for CDT versus EDT, etc. - Caspar Oesterheld - This article suggests that there might be no performance metric for comparing decision theories, as the problem may be "decision theory complete", which I see as very similar to the claim that decision theories are circularly justified.
