What makes counterfactuals comparable?

[-]VojtaKovarik6y30

My impression is that logical counterfactuals, and counterfactuals, and comparability is - at the moment - too confused, and most disagreements here are "merely verbal" ones. Most of your questions (seem to me to) point in the direction of different people using different definitions. I feel slightly worried about going too deep into discussions along the lines of "Vojta reacts to Chris' claims about what other LW people argue against hypothetical 1-boxing CDT researchers from classical academia that they haven't met" :D.

My take on how to do counterfactuals correctly is that this is not a property of the world, but of your mental models:

Definition (comparability according to Vojta): Two scenarios are comparable (given model $M$ and observation sequence $o$ ) if they are both possible in $M$ and and consistent with $o$ .

According to this view, counterfactuals only make sense if your model contains uncertainty...

(Aside on logical counterfactuals: Note that there is difference between the model that I use and the hypothetical models I would be able to infer were I to use all my knowledge. Indeed, I can happily reason about 6th digit of $π$ being 7, since I don't know what it is, despite knowing the formula for calculating $π$ . I would only get into trouble if I were to do the calculations (and process their implications for the real world). Updating your models with new logical information seems like an important problem, but one I think is independent from counterfactual reasoning.)

...however, there remains the fact humans do counterfactual reasoning all the time, even about impossible things ("What if I decided to not write this comment?", "What if the Sun revolved around the Earth?"). I think this is consistent with the above definition, from three reasons. First, the models that humans use are complicated, fragmented, incomplete, and wrong. So much so that positing logical impossibilities (the Sun going around the Earth thing) doesn't make the model inconsistent (because it is so fragmented and incomplete). Second, when doing counterfactuals, we might take it for granted that you are to replace the actual observation history $o$ by some alternative $o^{'}$ . So you then apply the above definition to $M$ and $o^{'}$ (e.g., me not starting to write this comment). When $o^{'}$ is compatible with the model $M$ we use, everything is logically consistent (in $M$ ). For example, it might actually be impossible for me to not have started writing this comment, but it was perfectly consistent with my (wrong) model. Finally, when some counterfactual would be inconsistent with our model, we might take it for granted that we are supposed to relax $M$ in some manner. Moreover, people might often implicitly assume same/similar relaxation. For example, suppose I know that the month of May has 31 days. The natural relaxation is to be uncertain about month lengths while still remembering it was something between 28 and 31. I might this say that 30 was a perfectly reasonable length, while being indignant upon being asked to consider May that is 370 days long.

As for the implications for your question: The phrasing of 1) seems to suggest a model that has uncertainty about your decision procedure. Thus picking both 10 and 5 seems possible (and consistent with observation history of seeing the two boxes), and thus comparable. Note that this would seem fishier if you additionally posited that you are a utility maximizer (but, I argue, most people would implicitly relax this assumption if you asked them to consider the 5 counterfactual). Regarding 2) I think that "a typical AF reader" uses a model in which "a typical CDT adherent" can deliberate, come to the one-boxing conclusion, and find 1M in the box, making the options comparable for "typical AF readers". I think that "a typical CDT adherent" uses a model in which "CDT adherents" find the box empty while one-boxers find it full, thus making the options incomparable. The third question I didn't understand.

Disclaimer: I haven't been keeping up to date on discussions regarding these matters, so it might be that what I write has some obvious and known holes in it...

[-]Chris_Leong6y10

Hey Vojta, thanks so much for your thoughts.

I feel slightly worried about going too deep into discussions along the lines of "Vojta reacts to Chris' claims about what other LW people argue against hypothetical 1-boxing CDT researchers from classical academia that they haven't met" :D.

Fair enough. Especially since this post isn't so much about the way people currently frame their arguments but attempt to persuade people to reframe the discussion around comparability.

My take on how to do counterfactuals correctly is that this is not a property of the world, but of your mental models

I feel similarly. I've explained my reasons for believing this in the Co-operation Game, Counterfactuals are an Answer, not a Question and Counterfactuals as a matter of Social Convention.

According to this view, counterfactuals only make sense if your model contains uncertainty...

I would frame this slightly differently and say that this is the paradigmatic case which forms the basis of our initial definition. I think the example of numbers can be constructive here. The first numbers to be defined are the counting numbers: 1, 2, 3, 4... It is then convenient to add fractions, then zero, then negative numbers and eventually we extend to the complex numbers. In each case we've slightly shifted the definition of what a number is and this choice is solely determined by convention. Of course, convention isn't arbitrary, but determined by what is natural.

Similarly, the cases where there is actual uncertainty provides the initial domain over which we define counterfactuals. And we can then try to extend this as you are doing above. I see this as a very promising approach.

A lot of what you are saying there aligns with my most recent research direction (Counterfactuals as a matter of Social Convention), although it's unfortunately stalled with coronavirus and my focus being mostly on attempting to write up my ideas from the AI safety program. There seem to be a bunch of properties that make a situation more or less likely to be accepted by humans as a valid counterfactual. I think it would be viable to identify the main factors, with the actual weighting being decided by each human. This would acknowledge both the subjective, constructed nature of counterfactuals, but also the objective elements with real implications that doesn't make this a completely arbitrary choice. I would be keen to discuss further/bounce ideas of each other if you'd be up for it.

Finally, when some counterfactual would be inconsistent with our model, we might take it for granted that we are supposed to relax M in some manner

This sounds very similar to the erasure approach I was previously promoting, but have shifted away from. Basically, I when I started thinking about it, I realised that only allowing counterfactuals to be constructed by erasing information didn't match how humans actually use counterfactuals.

Second, when doing counterfactuals, we might take it for granted that you are to replace the actual observation history o by some alternative o′

This is much more relevant to how I think now.

I think that "a typical AF reader" uses a model in which "a typical CDT adherent" can deliberate, come to the one-boxing conclusion, and find 1M in the box, making the options comparable for "typical AF readers". I think that "a typical CDT adherent" uses a model in which "CDT adherents" find the box empty while one-boxers find it full, thus making the options incomparable

I think that's an accurate framing of where they are coming from.

The third question I didn't understand.

What was unclear? I made one typo where I said an EDT agent would smoke when I meant they wouldn't smoke. Is it clearer now?

[-]VojtaKovarik6y10

An evidential decision theorist would smoke in the Smoking Lesion problem so they don't get cancer.

Is this possibly a typo, and should it instead say that EDT would not smoke? (I never seem to remember the details of Smoking Lesion, but this seems inconsistent with the "so they don't get cancer".)

[-]Chris_Leong6y10

Yeah, sorry, that's a typo, fixed now.

AI ALIGNMENT FORUM
AF

AI ALIGNMENT FORUM
AF

4

What makes counterfactuals comparable?

4