# Notes on "Can you control the past"

20th Oct 2022


The following is a (lightly edited version of a) series of notes I sent Joe Carlsmith about his essay, "Can you control the past?" It's addressed to Joe, but it seems worth publishing here while I'm on the topic of decision theory. I’ve included some of his comments, and my replies, below.

I only recently skimmed "Can you control the past?" and have a couple of notes that you may or may not be interested in. (I'm not under the impression that this matters a ton, and am writing this recreationally.)

First: this is overall a great review of decision theories. Better than most I've seen. Nice.

Now, onto some more substantive points.

## Who am I?

I think a bunch of your sense of oddness about the "magic" that "you can write on whiteboards light-years away" is stemming from a faulty framing you have. In particular, the part where the word "you" points to a single physical instantiation of your algorithm in the universe. I'd say: insofar as your algorithm is multiply instantiated throughout the universe, there is no additional fact about which one is really you.

For analogy, consider tossing a coin in a quantum-mechanical universe, and covering it with your hand. The coin is superpositioned between heads and tails, and once you look at it, you'll decohere into Joe-who-saw-heads and Joe-who-saw-tails, both of whom stem from Joe-who-hasn't-looked-yet. So, before you look, are you Joe-who-saw-heads or Joe-who-saw-tails?

Wrong question! These two entities have not yet diverged; the pasts of those two separate entities coincide. The word "you", at the time before you split, refers to ~one configuration. The time-evolution splits the amplitude on that configuration between ~two distinct future configurations, and once they've split (by making different observations), each will be able to say "me" in a way that refers to them and not the other, but before the split there is no distinction to be made, no extra physical fact, and no real question as to whether pre-split Joe "is" Joe-who-will-see-heads versus Joe-who-will-see-tails.

(It's also maybe informative to imagine what happens if the quantum coin is biased. I'd say, even when the coin is 99.99999% biased towards heads, it's still the case that there isn't a real question about whether Joe-who-has-not-looked-at-the-coin is Joe-who-will-see-heads versus Joe-who-will-see-tails. There is a question of to what degree Joe-who-has-not-looked becomes Joe-who-saw-heads versus Joe-who-saw-tails, but that's a different sort of question.)

One of my most-confident guesses about anthropics is that being multiply-instantiated in other ways is analogous. For instance, if there are two identical physical copies of you (in physical rooms that are identical enough that you're going to make the same observations for the length of the hypothetical, etc.), then my guess is that there isn't a real question about which one is you. They are both you. You are the pattern, not the meat.

This person may become multiple people in the future, insofar as they see different things in different places-that-embed-them. But before the differing observations come in, they're both you. You can tell because the situation is symmetric: once you know all the physical facts, there's no additional bit telling you which one is "you".

From this perspective, the "magic" is much less mysterious: whenever you are multiply-instantiated, your actions are also multiply-instantiated. If you're multiply-instantiated in two places separated by a 10-light-year gap, then when you act, the two meat-bodies move in the same way on each side of the gap. This is all much less surprising once you acknowledge that "you" refers to everything that instantiates you(-who-have-seen-what-you-have-seen). Which, notably, is a viewpoint more-or-less forced upon us by quantum mechanics anyway.

Also, a subtlety: literal multiple-instantiation of your entire mind (in a place with sufficiently similar physics) is what you need to get "You can draw a demon kitten eating a windmill. You can scream, and dance, and wave your arms around, however you damn well please. Feel the wind on your face, cowboy: this is liberty. And yet, he will do the same." But it's much easier to find other creatures that make the same choice in a limited decision problem, but that won't draw the same demon kitten.

In particular, the thing you need for rational cooperation in a one-shot prisoner's dilemma, is multiple instantiation of your decision algorithm, which is notably smaller than your entire mind. Imagining multiple-instantiation of your entire mind is a fine intuition-pump, but the sort of multiple-instantiation humans find in real life is just of the decision-making fragment (which is enough).

Corollary: To a first approximation, the answer to "Can you control the past?" is "Well, you can be multiply instantiated at different points in time, and control the regions afterwards of the places you’re instantiated, and it’s possible for some of those to be beforewards of other places you’re instantiated. But you can’t control anything beforewards of your earliest instantiation."

To a second approximation, the above is true not only of you (in all your detailed glory, having learned everything you've learned and seen everything you've seen), but of your decision algorithm — a much smaller fragment of you, that is instantiated much more often, and thus can readily affect regions beforewards of the earliest instantiation of you-in-all-your-glory. This is what’s going on in the version of Newcomb’s problem, for instance, where Omega doesn’t simulate you in all your glory, but does reason accurately about the result of your decision algorithm (thereby instantiating it in the relevant sense).

More generally, I think it's worth distinguishing you from your decision algorithm. You can let your full self bleed into your decision-making fragment, by feeling the wind on your face and using specifics of your recent train-of-thought to determine what you draw. Or you can prevent your full self from bleeding into your decision-making fragment, by boiling the problem before you down into a simple and abstract decision problem.

Consider Omega's little sister Omicron, who can't figure out what you'll draw, but has no problem figuring out whether you'll one-box. You-who-have-felt-the-wind-on-your-face are not instantiated in the past, but your decision algorithm on a simple problem could well be. It's the latter that controls things that are beforewards of you (but afterwards of Omicron).
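One way to picture the Omicron case is a toy sketch in which the decision algorithm is a small pure function and the full person is a longer computation wrapped around it. Everything here is an assumption made for illustration: the names `decision_algorithm`, `omicron_fill_box`, and `full_person`, the integer stand-in for "the wind on your face", and the box amounts (the usual Newcomb figures, which the notes themselves don't specify).

```python
# Hypothetical illustration: all names and dollar amounts are assumptions.

def decision_algorithm(problem: str) -> str:
    """The small fragment: depends only on the boiled-down problem."""
    if problem == "newcomb":
        return "one-box"
    return "no-op"


def omicron_fill_box() -> int:
    """Omicron evaluates the fragment (not the whole person) ahead of time."""
    predicted = decision_algorithm("newcomb")
    return 1_000_000 if predicted == "one-box" else 0


def full_person(wind_on_face: int, opaque_box: int) -> tuple[str, int]:
    """The longer computation wrapped around the same fragment.

    `wind_on_face` stands in for everything Omicron cannot predict. It shapes
    the drawing but is kept out of the boxing decision.
    """
    drawing = f"demon kitten #{wind_on_face}"
    choice = decision_algorithm("newcomb")
    payout = opaque_box if choice == "one-box" else opaque_box + 1_000
    return drawing, payout


opaque_box = omicron_fill_box()                 # earlier instantiation
drawing, payout = full_person(731, opaque_box)  # later instantiation
print(drawing, payout)
```

In this sketch the box contents are fixed by the earlier call to `decision_algorithm`, which never sees `wind_on_face`; the full person only shares the fragment with it.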

I personally don't think I (Nate-in-all-his-glory) can personally control the past. I think that my decision-procedure can control the future laid out before each and every one of its instantiations.

Is the box in Newcomb's problem full because I one-box? Well, it's full because The Algorithm one-boxes, and I'm a full-ass person wrapped around The Algorithm, but I'm not the instance of The Algorithm that Omicron was looking at, so it seems a bit weird to blame it on me. Like how when you use a calculator to check whether 7 divides 1331 and use that knowledge to decide how to make a bet, and then later I use a different calculator to see whether 1331 is prime in a way that includes (as an intermediate step) checking whether 7 divides it, it's a bit weird to say that my longer calculation was the cause of your bet.
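The calculator analogy can be made concrete with a small sketch, assuming trial division as the (hypothetical) way the longer primality check is done. The function names are made up for illustration; the point is only that the short calculation shows up as one step inside the longer one.

```python
def divides(d: int, n: int) -> bool:
    """The short calculation: does d divide n?"""
    return n % d == 0


def is_prime(n: int) -> bool:
    """The longer calculation: trial division, reusing `divides` at each step."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if divides(d, n):  # for n = 1331 this asks about d = 7 along the way
            return False
        d += 1
    return True


your_result = divides(7, 1331)  # what your bet depended on
my_result = is_prime(1331)      # my later, longer computation
print(your_result, my_result)
```

Both computations agree about whether 7 divides 1331 because they run the same sub-calculation, not because the later one reached back and influenced the earlier one.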

I'm a longer calculation than The Algorithm. It wasn't me who controlled the past, it was The Algorithm Omega looked at, and that I follow.

If you ever manage to get two copies of me (the cowboy who feels the wind on his face) at different times, then in that case I'll say that I (who am both copies) control the earlier-copy's future and the later-copy's past (necessarily in ways that the later copy has not yet observed, for otherwise we are not true copies). Till then, it is merely the past instances of my decision algorithm that control my past, not me.

(Which doesn't mean that I can choose something other than what my decision algorithm selects in any given case, thereby throwing off the yoke; that's crazytalk; if you think you can throw off the yoke of your own decision algorithm then you've failed to correctly identify the fragment of you that makes decisions.)

## LDT doesn’t pass up guaranteed payoffs

Logical decision theorists firmly deny that they pass up guaranteed payoffs. (I can't quite tell from a skim whether you understand this; apologies if I missed the parts where you acknowledge this.)

As you probably know, in a twin PD problem, a CDT agent might protest that by cooperating you pass up a guaranteed payoff, because (they say) defecting is a dominant strategy. A logical decision theorist counters that the CDT agent has made an error, by imagining that "I defect while my twin cooperates" is a possibility, when in fact it is not.

In particular, when the CDT agent closes their eyes and imagines defecting, they (wrongly) imagine that the action of their twin remains fixed. Among the actual possibilities (cooperate, cooperate) and (defect, defect), the former clearly dominates. The disagreement is not about whether to take dominated strategies, but about what possibilities to admit in the matrix from which we calculate what is dominated and what is not.
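A minimal sketch of that disagreement, assuming a standard prisoner's dilemma payoff ordering (the specific numbers are hypothetical). The only difference between the two functions is which cells of the matrix they admit as possibilities.

```python
# Hypothetical payoffs for "me", keyed by (my_action, twin_action).
PAYOFF = {
    ("C", "C"): 3,
    ("C", "D"): 0,
    ("D", "C"): 5,
    ("D", "D"): 1,
}
ACTIONS = ("C", "D")


def cdt_choice(twin_action_held_fixed: str) -> str:
    """Vary my action while imagining the twin's action stays put."""
    return max(ACTIONS, key=lambda a: PAYOFF[(a, twin_action_held_fixed)])


def ldt_choice() -> str:
    """Admit only the outcomes where the twin does what I do."""
    return max(ACTIONS, key=lambda a: PAYOFF[(a, a)])


print(cdt_choice("C"), cdt_choice("D"), ldt_choice())
```

Both functions take the best option among the possibilities they consider; neither knowingly picks a dominated one. They differ on whether `("D", "C")` and `("C", "D")` belong in the matrix at all.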

Now consider Parfit's hitchhiker. An LDT agent withdraws the $10k and gives it to the selfish man. Will MacAskill objects, "you're passing up a guaranteed payoff of $10k, now that you're certain you're in the city!". The LDT agent says "you have made an error, by imagining ‘I fail to pay while being in the city’ is a possibility, when in fact it is not. In particular, when you close your eyes and imagine not paying, you (wrongly) imagine that your location remains fixed, and wind up imagining an impossibility."

Objecting “it's crazy to imagine your location changing if you fail to pay” is a fair criticism. Objecting that logical decision theorists pass up guaranteed payoffs is not.

The whole question at hand is how to evaluate the counterfactuals. Causal decision theorists say "according to my counterfactuals, if you pay you lose $10k, thus passing up a guaranteed payoff", whereas logical decision theorists say "your counterfactuals are broken, if I don't pay then I die, life is worth more than $10k to me, I am taking the action with the highest payoff". You're welcome to argue that logical decision theorists calculate their counterfactuals wrong, if you think that, but saying we pass up guaranteed payoffs is either confused or disingenuous.
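Here is a rough sketch of the two counterfactuals side by side, assuming an accurate driver. The value placed on dying (`VALUE_OF_DYING`) is a made-up number; all that matters for the illustration is that it is worse than losing $10k.

```python
COST_OF_PAYING = 10_000
VALUE_OF_DYING = -1_000_000  # assumption: any value below -10_000 works


def cdt_counterfactual(pay: bool) -> int:
    """Holds 'I am in the city' fixed while varying the action."""
    return -COST_OF_PAYING if pay else 0


def ldt_counterfactual(pay: bool) -> int:
    """Assumes an accurate driver: non-payers never get the ride."""
    return -COST_OF_PAYING if pay else VALUE_OF_DYING


def choose(counterfactual) -> bool:
    """Return True (pay) or False (don't pay), whichever scores higher."""
    return max((True, False), key=counterfactual)


print(choose(cdt_counterfactual), choose(ldt_counterfactual))
```

The same `choose` rule is applied in both cases. Each agent takes the highest payoff according to its own counterfactuals, so the dispute is over which counterfactual function is right, not over whether to maximize.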

## Parfit’s hitchhiker and contradicting the problem statement

There's a cute theorem I've proven (or, well, I've jotted down what looks to me like a proof somewhere, but haven't machine-checked it or anything), which says that if you want to disagree with logical decision theorists, then you have to disagree in cases where the predictor is literally perfect. The idea is that we can break any decision problem down by cases (like "insofar as the predictor is accurate, ..." and "insofar as the predictor is inaccurate, ...") and that all the competing decision theories (CDT, EDT, LDT) agree about how to aggregate cases. So if you want to disagree, you have to disagree in one of the separated cases. (And, spoilers, it's not going to be the case where the predictor is on the fritz.)

I see this theorem as the counter to the decidedly human response "but in real life, predictors are never perfect". "OK!", I respond, "But decomposing a decision problem by cases is always valid, so what do you suggest we do under the assumption that the predictor is accurate?"

Even if perfect predictors don't exist in real life, your behavior in the more complicated probabilistic setting should be assembled out of a mixture of ways you'd behave in simpler cases. Or, at least, so all the standard leading decision theories prescribe. So, pray tell, what do you do insofar as the predictor reasoned accurately?
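To picture the case split (this is not the theorem itself, just a toy instance), here is a sketch using the hitchhiker setup. The modelling choices are assumptions: the "inaccurate" case is taken to mean the driver's prediction has nothing to do with what you do, so you are in the city either way, and the utility of dying is a made-up number.

```python
DEATH = -1_000_000   # assumed utility of dying in the desert
PAY_COST = -10_000

# Payoff of each action within each separated case (assumed model).
ACCURATE = {"pay": PAY_COST, "stiff": DEATH}
INACCURATE = {"pay": PAY_COST, "stiff": 0}


def aggregate(action: str, p_accurate: float) -> float:
    """Mix the per-case payoffs, weighted by how likely each case is."""
    return p_accurate * ACCURATE[action] + (1 - p_accurate) * INACCURATE[action]


def best_action(p_accurate: float) -> str:
    return max(("pay", "stiff"), key=lambda a: aggregate(a, p_accurate))


print(best_action(0.99), best_action(0.001))
```

The aggregation step is the uncontroversial part. Any disagreement has to show up in what goes into the `ACCURATE` table, which is where the "stiff" entry reads death rather than $0.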

I think this is a good intuition pump for the thing where logical decision theorists are like "if I imagine stiffing the driver, then I imagine dying in the desert." Insofar as the predictor is accurate, imagining being in the city after stiffing the driver is just as bonkers as imagining defecting while your twin cooperates.

One way I like to think about it is, this decision problem is set up in a fashion that purports to reveal the agent's choice to them before they make it. What, then, happens in the case where the agent acts inconsistently with this revelation? The scenario is ill-defined.

Like, consider the decision problem "You may have either a cookie or a bonk on the head, and you're going to choose the bonk on the head. Which do you choose?" The cookie might seem more appealing than the bonk, but observe that taking the cookie refutes the problem statement. It's at least a little weird to confidently assert that, in that case, you get a cookie. What you really get is a contradiction. And sure, ex falso quodlibet, but it seems a bit strange to anchor on the cookie.

It's not the fault of the agent that this problem statement is refutable by some act of the agent! The problem is ill-defined without someone telling us what actually happens if we refute the problem statement. If you try to take the cookie, you don’t actually wind up with a cookie; you yeet yourself clean out of the hypothetical. To figure out whether to take the cookie, you need to know where you'd land.
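A small sketch of why the landing spot matters, with made-up utilities and hypothetical names (`outcome`, `choose`, `if_refuted`). The problem statement asserts a choice; if the agent acts otherwise, the result is whatever the problem says happens on refutation, and is undefined if it says nothing.

```python
from typing import Optional

# Assumed utilities, for illustration only.
UTILITY = {"cookie": 1, "bonk": -1, "two cookies": 2, "die in desert": -1_000_000}


def outcome(action: str, stated_choice: str, if_refuted: Optional[str]) -> str:
    """Outcome of `action` in a problem asserting you will pick `stated_choice`."""
    if action == stated_choice:
        return action
    if if_refuted is None:
        raise ValueError("problem statement refuted; outcome is undefined")
    return if_refuted


def choose(stated_choice: str, if_refuted: Optional[str]) -> str:
    return max(
        ("cookie", "bonk"),
        key=lambda a: UTILITY[outcome(a, stated_choice, if_refuted)],
    )


print(choose("bonk", "two cookies"), choose("bonk", "die in desert"))
```

With `if_refuted=None` the call raises instead of quietly handing over a cookie, which is the sense in which the bare problem is ill-defined.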

Parfit's hitchhiker, at the point where you're standing at the ATM, is much like this. The alleged problem statement is "you may either lose $0 or $10,000, and you're going to choose to lose $10,000". At which point we're like "Hold on a sec, the problem statement makes an assertion about my choice, which I can refute. What happens if I refute the problem statement?" At which point the question-poser is like "haha oops, yeah, if you refute the problem statement then you die alone in the desert". At which point, yeah, when the logical decision theorist closes their eyes and imagines stiffing the driver, then (under the assumption that the driver is accurate) they're like "oh dang, this would refute my observations; what happens in that case again? right, I'd die alone in the desert, which is worse than losing $10,000", and then they pay.

(I also note that this counterfactual they visualize is correct. Insofar as the predictor is accurate, if they wouldn't pay, then they would die alone in the desert instead. That is, in real life, what happens to non-payers who face accurate predictors. The "$0" was a red herring; that case is contradictory and cannot actually be attained.)

(In the problem where you may have either a cookie or a bonk, and you're going to take the bonk, but if you render the problem inconsistent then you get two cookies, by all means, take the cookie. But in the problem where you may have either a cookie or a bonk, and you're going to take the bonk, but if you render the problem inconsistent then you die alone in the desert, then take the dang bonk.)

This sort of thing definitely runs counter to some human intuitions, presumably because, in real life, we rarely observe consequences of actions we haven't made yet. (Well, except for in a variety of social settings, where we have patches such as "honor" and "reputation" that, notably, give the correct answer in this case, but I digress.)

This is where I think my cute theorem makes it easier to see what's going on: insofar as the predictor is perfect, it doesn't make sense to visualize being in the city after stiffing the driver. When you're standing in front of the ATM, and you screw your eyes shut and imagine what happens if you just run off instead of withdrawing the money, then in the case where the predictor reasoned correctly, your visualizer should be like ERROR ERROR HOW DID WE GET TO THE CITY?, and then fall back to visualizing you dying alone in the desert.

Is it weird that your counterfactual-visualizer paints pictures of you being in the desert, even though you remember being driven to the city? Yep. But it's not the agent's fault that they were shown a consequence of their choice before making their choice; they're not the one who put the potential for contradiction into the decision problem. Avoiding contradiction isn’t their problem. One of their available choices is contradictory with observation (at least under the assumption that the predictor is accurate), and they need to handle the contradiction somehow, and the problem says right there on the tin that if you would cause a contradiction then you die alone in the desert instead.

(Humans, of course, implement the correct decision in this case via a sense of honor or suchlike. Which is astute! "I will pay, because I said I would pay and I am a man of my word" can be seen as a shadow of the correct line of reasoning, cast onto monkey brains that were otherwise ill-suited for it. I endorse the practice of recruiting your intuitions about honor to perform correct counterfactual reasoning.)

(And these counterfactuals are true, to be clear. You can't go find people who were accurately predicted, driven to the city, and then stiffed the driver. There are none to be found.)

Do you see how useful this cute little theorem is? I love it. Instead of worrying about "but what if the driver was simply a fool, and I can save $10k?", we get to decompose the decision problem down into cases, one where the driver was incorrect, and one where they were correct. We all agree that insofar as they're incorrect you have to stiff them, and we all agree about how to aggregate cases, so the remaining question is what you do insofar as they're accurate. And insofar as they're accurate, the contradiction is laid bare. And the "stand in front of the ATM, but visualize yourself dying in the desert" thing feels quite justified, at least to me, as a response to a full-on contradiction.

Just remember that it's not your job to render the universe consistent, and that contradictions can't actually happen. Insofar as the predictor is accurate, imagining yourself surviving and then stiffing the driver makes just as much sense as imagining yourself defecting against your cooperating clone.

Finally, a minor note: I think the twin clone prisoner's dilemma is sufficient to kill CDT. But if you want to kill it extra dead, you might be interested in the fact that you can turn CDT into a money pump whenever you have a predictor that's more accurate than chance, using some cleverness and the fact that you can expand CDT's action space by also offering it contracts that pay out in counterfactuals that are less possible than CDT pretends they are.

1 comment

The issue with you-in-all-detail vs. your-decision-algorithm is that a decision algorithm can have different levels of updatelessness: it's unclear what the decision algorithm already knows vs. what a policy it chooses takes as input. So we pick some intermediate level that is updateless enough to allow acausal coordination among relevant entities (agents/predictors), and updateful enough to make a decision without running out of time/memory while being implemented in its instances. But that level/scope is different for different collections of entities being coordinated.

So I think a boundary shouldn't be drawn around "a decision algorithm", but around whatever common knowledge of each other the entities being acausally coordinated happen to have (where they don't need to have common knowledge of everything). When packaged as a decision algorithm, the common knowledge becomes an adjudicator, which these entities can allow influence over their actions. To the extent the influence they allow an adjudicator is common knowledge among them, it also becomes knowledge of the adjudicator, available for its decision making reasoning.

Importantly for the reframing, an adjudicator is not a decision algorithm belonging to either agent individually; it's instead a shared decision algorithm. It's a single decision algorithm purposefully built out of the agents' common knowledge of each other, rather than a collection of their decision algorithms that luckily happen to have common knowledge of each other. It's much easier for there to be some common knowledge than for there to be common knowledge of individually predefined decision algorithms that each agent follows.