Thanks to Caspar Oesterheld for the discussion on which this post is largely based.

In a previous post I presented an objection to The Evidentialist's Wager. Reading it is probably necessary to understand the following.

A counter-argument to my objection

In the post I broadly claim the following:

Imagine we have absolutely no idea whether the acausally correlated agents in the universe are mostly positively correlated to us/Good Twins (our one-boxing is evidence of them increasing our utility function) or mostly negatively correlated to us/Evil Twins (our one-boxing is evidence of them decreasing our utility function). That is, our credence on the two possibilities is 50%[1]. Then, when facing a Newcomb decision, the stakes for EDT and CDT are equal: any evidence of additional utility gains provided by EDT balances out in the expected value calculation, given our complete uncertainty about how to interpret that evidence (whether one-boxing is evidence of a universal increase or decrease in our utility).
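To make this concrete, here is a minimal Python sketch of that expected value calculation. All the numbers (the Newcomb payoffs, the assumed number of correlated agents N, and the per-agent stake) are illustrative assumptions, not taken from the paper; the point is just that the evidential term vanishes at a 50-50 credence and dominates away from it.

```python
# Toy expected-value comparison for the 50-50 case. All numbers here are
# illustrative assumptions, not taken from the original paper.

ONE_BOX_PERSONAL = 10   # doses of the cure you get by one-boxing (reliable predictor)
TWO_BOX_PERSONAL = 1    # what two-boxing gets you when the predictor foresaw it

# Evidential stakes: under EDT, your choice is also evidence about what your
# correlated agents do. Good Twins' one-boxing adds to your utility; Evil
# Twins' one-boxing subtracts from it (it furthers the opposite values).
N = 1000                # assumed number of correlated agents
PER_AGENT_STAKE = 10    # assumed utility gained/lost per correlated agent

def edt_value(one_box, p_good):
    """Expected utility of an act under EDT, given credence p_good that the
    correlated agents are Good Twins rather than Evil Twins."""
    personal = ONE_BOX_PERSONAL if one_box else TWO_BOX_PERSONAL
    sign = 1 if one_box else -1
    # The evidential term: your act is evidence that correlated agents act
    # alike, which helps you in proportion to p_good and hurts you in
    # proportion to (1 - p_good).
    correlated = sign * N * PER_AGENT_STAKE * (p_good - (1 - p_good))
    return personal + correlated

for p_good in (0.5, 0.6):
    advantage = edt_value(True, p_good) - edt_value(False, p_good)
    print(f"p(Good) = {p_good}: EDT advantage of one-boxing = {advantage}")
# At p(Good) = 0.5 the evidential term cancels out and only the personal
# payoffs remain; at 0.6 the correlated stakes dominate the calculation.
```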

Shortly after writing the post, I discovered the following counter-argument: in that situation, EDT urges you to research further whether positively or negatively correlated agents are more numerous, so as to break the symmetry and then act accordingly. That is, it is plausible (it has non-zero probability) that dedicating more resources to studying this issue ends up breaking the symmetry and moving your credences away from 50-50 in one of the two possible directions. If that happens, the Wager applies and the stakes for EDT are higher (urging you either to one-box or to two-box, depending on how the symmetry was broken). So even in the described situation the stakes of EDT are higher, and if this is not immediately obvious it is only because EDT recommends neither one-boxing nor two-boxing, but a third option: researching further into the nature of the multiverse.

But the above argument has a flaw, related to other problems with the concept of negative correlation, which is generally ill-defined (as I pointed out in the first post, the definition of negatively correlated agents is not fully clear, but that is not an issue for what follows).

Considering meta-reasoning

Suppose the correlation between us and the Good Twins is perfectly positive (identical copies), and the correlation with the Evil Twins is perfectly negative (identical copies with a flipped utility function). Then they will also be in the 50-50 situation, recognize the EDT urge to research further, and so research further. Of course, the two groups will obtain different results from their research (if it's carried out correctly). That is, imagine the real distribution between Good Twins and Evil Twins is 60-40. Then the Good Twins will receive evidence that they are the majority, and the Evil Twins that they are the minority. And here the acausal correlation breaks: the Evil Twins will no longer be applying their decision theory in the same way as we are, because they have obtained different evidence.

Naively (as in the above counter-argument), the Evil Twins might conclude: "Aha, so I should two-box (even if that's bad for me and all other Evil Twins), because then all Good Twins (of which there are more) will also two-box, so that's better for my utility function (provided EDT is correct, of course)". But they'll shortly notice that their correlation to the Good Twins has been broken, and so their actions no longer provide evidence about the Good Twins' actions. So they should consider only their correlated agents (all Evil Twins), and act accordingly (one-box). And of course, the Good Twins will also one-box (they would have even if the correlation to Evil Twins had somehow magically been preserved, because they're the majority).
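Here is a toy trace of that reasoning, as I understand it, under the perfect-correlation assumption and an assumed (and unknown to the agents) 60-40 split; `research` and `decide` are hypothetical helpers for illustration only.

```python
# A toy trace of the reasoning above, assuming perfect correlation and an
# (unknown to the agents) 60-40 split between Good and Evil Twins.

TRUE_GOOD_SHARE = 0.6   # assumed true fraction of Good Twins

def research(is_good_twin):
    """Correct research tells each group whether it is the majority."""
    my_share = TRUE_GOOD_SHARE if is_good_twin else 1 - TRUE_GOOD_SHARE
    return "majority" if my_share > 0.5 else "minority"

def decide(is_good_twin):
    result = research(is_good_twin)
    # Receiving a result that your anti-correlated twins did not receive is
    # exactly what breaks the correlation to them: from here on, your act is
    # only evidence about agents who saw the same result you did, and those
    # agents share your utility function. One-boxing is then the EDT choice
    # whether the research said "majority" or "minority".
    return {
        "group": "Good" if is_good_twin else "Evil",
        "research result": result,
        "still correlated with": "same-result twins only",
        "action": "one-box",
    }

print(decide(is_good_twin=True))
print(decide(is_good_twin=False))
# Both groups end up one-boxing, but the Evil Twins do so only because the
# research already handed them full evidence that the Good Twins will one-box.
```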

That is, in breaking the symmetry we have also broken the correlation to all Evil Twins, since both us and Evil Twins were studying the same metric, but with opposite consequences for our actions. And so, we can't just "do the research and then one-box or two-box accordingly", because doing the research itself is an action that provides evidence (more on this below).

Here's what just happened to our argument. Originally we were only considering an idealized scenario, with a binary decision to take: you face a Newcomb problem, and your only two possible actions are to one-box or two-box. And sure, in this scenario, given 50-50 prior on the Twins, EDT and CDT will hold the same stakes. But when you take into account further actions which are obviously always available in any real-world situation[2] (some of which are purely computational, such as just sitting there and thinking about the problem for some minutes with your physical brain), then it is plausible that this perfect symmetry breaks (even if by the slightest margin), and so the EDT high stakes return. But then, you notice that, upon going up to this meta-reasoning/partaking in this research, your correlated agents have also done so, and as a consequence some of them are no longer correlated. And going up to this meta-consideration once again, they will also have noticed this, and will act accordingly (so you already have the evidence that they will one-box, whatever you do). Notice this means (apparently paradoxically) that all agents, after carrying out this reasoning, will one-box. But they do so (or at least the Evil Twins) strictly because they notice they are no longer correlated to the others. That is, after carrying out this reasoning, the Evil Twins already have the full evidence that Good Twins will one-box. And so, two-boxing would only screw them over, and they one-box as well.

In other words, the Evil Twins would have liked to stop the research/reasoning from the start (or at least before arriving at conclusions that inform actions), because stopping would have provided evidence that the Good Twins also stopped (or, more precisely, it would have denied the Evil Twins full evidence that the Good Twins will one-box). But of course, by the time they decided to do research (without yet knowing they would be the minority), this already provided evidence that the Good Twins were doing the same. And by the time they received negative results, this provided evidence that the Good Twins had received positive results[3].

But then, should you research?

Following the above argument, every agent with a credence of 50-50 knows that, if it partakes in research, it will either conclude it is in the majority (and that the majority will know this and one-box), obtaining evidence that its utility will be higher than expected, or conclude it is in the minority (and that the majority will know this and one-box), obtaining evidence that its utility will be lower than expected. So it is both hopeful and fearful of partaking in research. I feel like these two considerations (or more concretely, expected utility calculations) will again perfectly balance out (now at the meta-level, or possibly at ever higher meta-levels), and so an agent with 50-50 credences will really just go with its best-guess decision theory (without EDT presenting higher stakes). Or at least, from its perspective this will have as much (inter-theoretic) expected utility as doing the research[4].
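A minimal numerical check of this balancing-out intuition, reusing the illustrative numbers from the earlier sketch (a 60-40 split, 1000 correlated agents, a stake of 10 per agent); the equality is just the law of total expectation in miniature.

```python
# At exactly 50-50, the hope and the fear about what research will reveal
# cancel: by the law of total expectation, deciding to research cannot change
# your expected utility, only spread it across the two possible findings.

p_find_majority = 0.5        # at 50-50 either finding is equally likely

utility_if_majority = 2000   # e.g. 600 twins further your values, 400 oppose them
utility_if_minority = -2000  # the mirror image of the above

expected_with_research = (p_find_majority * utility_if_majority
                          + (1 - p_find_majority) * utility_if_minority)
expected_without_research = 0  # the symmetric prior already averages both cases

print(expected_with_research, expected_without_research)  # 0.0 0
```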

One-boxing as a fixed point

But wait! We can include another practical consideration. An agent having exactly 50-50 credence might be as unlikely as the universe containing an exactly 50-50 split of Twins (at least, for agents good enough at keeping track of precise probabilities). And indeed, even an agent with a 50-50 credence will very surely at some point receive some (maybe collateral and apparently unrelated) evidence that updates its credence on this issue and breaks the symmetry. By the above argument, by the time the symmetry has been broken (even in this unintended manner) the correlation to Evil Twins will have been broken too (they will have received the opposite update, and will act accordingly). So from there on, the agent has evidence that all of its correlated agents are Good Twins, and will always apply the Wager and one-box (regardless of whether it is in the original majority or minority).[5]

That is, this seems to indicate that (in the perfect correlation case) even the slightest evidence in any one of the two directions will prove forever (for the agent) that all correlated agents are Good Twins.

Some practical questions

On a related note (if we drop the assumption that all correlations are perfect), might it be that, the more research an agent carries out, the more certain it can be that its remaining correlates are (almost) all Good Twins? Imagine you get a small piece of evidence that you are in the majority. If the research process is correct, most agents still correlated to you (that is, those who have received the same evidence as you) are actually in the majority. But there might be some agents in the minority who, even following the same correct research process, obtain evidence of being in the majority because of some contingent bad luck, and so are still correlated to you. This would seem to be less probable the stronger the evidence received.
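Here is a rough Bayesian sketch of that intuition. The model is my own assumption (research = observing n independent signals, each of which points to your own group with probability equal to that group's true share), but it illustrates how the posterior probability of really being in the majority, given favourable evidence, grows with the strength of the evidence.

```python
# A rough Bayesian sketch: the stronger the (favourable) evidence, the fewer
# unlucky minority agents remain among those who saw the same evidence as you.
# The signal model and the 60-40 split are assumptions for illustration.

from math import comb

def p_signal_count(k, n, share):
    """Probability of k out of n independent signals favouring your group,
    given that your group's true share of agents is `share`."""
    return comb(n, k) * share**k * (1 - share)**(n - k)

def p_majority_given_favourable(n, true_split=0.6, prior_majority=0.5):
    """P(your group really is the majority | most of the n signals favoured it)."""
    def p_favourable(share):
        return sum(p_signal_count(k, n, share) for k in range(n // 2 + 1, n + 1))
    num = prior_majority * p_favourable(true_split)
    den = num + (1 - prior_majority) * p_favourable(1 - true_split)
    return num / den

for n in (1, 5, 25, 101):
    print(n, round(p_majority_given_favourable(n), 4))
# The posterior climbs toward 1 as the evidence grows: fewer and fewer of the
# agents who saw the same favourable evidence are unlucky minority members.
```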

Might there be some situation in which an agent wants to ensure all of its correlates are Good Twins, and so should partake in more research before taking any other action? Maybe the fear of being the one with contingently bad luck (and so being correlated only to your Evil Twins) will always balance out the further security of being correct (and so correlated only to your Good Twins)[6], so that the amount of research ends up not mattering (which would be counter-intuitive)?

The authors' actual counter-argument

Although I found the above ideas interesting, the easiest way out of my objection is simply noticing that our credences should not be 50-50, but considerably more favorable to Good Twins, which is Caspar's (and apparently the other authors') position.

Indeed, there are some solid basic arguments favoring Good Twins:

  • Anthropically, our existence provides evidence for them being favored.
  • It seems plausible that evolutionary pressures select for utility functions broadly like ours, although by fragility of value we might need a very precise correlation (which might still obtain, just less often).
  • On a related note, even if very different evolutionary processes yield very different utility functions, it might be that there's a physical correlation (because of "brain" architecture or how physical contexts arise) between the decision theory (or other mechanisms) of an agent and its values.

In my first post, I mostly assessed 50-50 as plausible because of the potential craziness of digital minds (many of which could be negatively correlated to us). But I'd still need an argument to defend that more negatively than positively correlated digital minds will exist in the future, and I don't have one. In fact, there are some obvious reasons to expect most digital minds in some futures to be positively correlated (we solve Alignment). By contrast, I don't see a clear reason to expect many negatively correlated minds in almost any future. This could happen in scenarios with extortion/malevolent actors, but I find those even less likely, since they probably require the existence of another, approximately as intelligent actor positively correlated to us. This is not only conjunctive, but probably requires us solving Alignment while still facing an extorting/malevolent AGI, which seems improbable.

For further considerations on negative correlations and the probable values of superrational agents, see Caspar's Multiverse-wide Cooperation via Correlated Decision Making sections 2.6.2 and 3.4 (thanks to Sylvester Kollin for this recommendation!).

An unrelated after-thought: choosing the correct decision theory

In the original article, the authors convincingly argue for the reasonableness of hedging under decision-theoretical uncertainty. But some worries remain about the coherence of this whole approach, and especially the concept of there being a "correct" decision theory, and us being able to somehow amass evidence (or carry out computations) to improve our guess as to which is the correct one.

The authors address, given uncertainty about decision theories, how to carry out intertheoretical value comparisons. But they don't address how to compare the theories themselves, as theories of instrumental rationality (which should be value independent).

Indeed, suppose you have non-zero credence in both EDT and CDT. What would it mean (for you, subjectively) for one of them to be the "correct" decision theory? Arguably, for it to better maximize your goals (with respect to other theories). But of course, to compare such maximizations, you already need a decision theory (which tells you what "maximizing your goals" even is).

That is, you should just choose the decision theory $T$ such that the action "I follow $T$" maximizes your utility function. But different theories assess differently what it means for an action to maximize your utility function: for CDT it is a matter of how the action causally affects world states, while for EDT it is a matter of what evidence taking the action provides about world states.
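A small sketch of this circularity in the standard Newcomb problem (the payoffs are the usual ones, the predictor's accuracy is an assumed parameter): CDT's and EDT's expected-value operators give opposite verdicts on the very same acts, so "which theory better maximizes my utility" has no theory-neutral answer here.

```python
# CDT and EDT supply different expected-value operators, so each gives a
# different meaning to "the act that maximizes your utility".
# Payoffs are the standard Newcomb ones; ACCURACY is an assumed parameter.

ACCURACY = 0.99   # assumed reliability of the predictor

# payoffs[(one_box, predicted_one_box)]
payoffs = {
    (True,  True):  1_000_000,   # opaque box full, you take only it
    (True,  False): 0,           # opaque box empty
    (False, True):  1_001_000,   # both boxes, opaque box full
    (False, False): 1_000,       # both boxes, opaque box empty
}

def edt_value(one_box):
    # EDT: condition the prediction on your act.
    p_predicted_one_box = ACCURACY if one_box else 1 - ACCURACY
    return (p_predicted_one_box * payoffs[(one_box, True)]
            + (1 - p_predicted_one_box) * payoffs[(one_box, False)])

def cdt_value(one_box, p_predicted_one_box):
    # CDT: the prediction is causally fixed, whatever you do now.
    return (p_predicted_one_box * payoffs[(one_box, True)]
            + (1 - p_predicted_one_box) * payoffs[(one_box, False)])

print("EDT prefers one-boxing:", edt_value(True) > edt_value(False))
print("CDT prefers two-boxing:", cdt_value(False, 0.5) > cdt_value(True, 0.5))
# CDT's preference holds for any fixed p_predicted_one_box: the two operators
# disagree about what "maximizing your utility" even computes.
```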

Might it be that choosing the correct decision theory can only come down to a matter of intuition or aesthetics, or even that it should be regarded as a preference, just like your utility function? Intuitively, it would seem that this kind of decision should somehow be justified on practical grounds.

Annex: EDT being counter-intuitive?

As evidenced above, EDT agents might sometimes prefer not to receive undesirable evidence. For instance, say a piece of evidence A proves that there is suffering in the world (rather than none) and that Alice can make some effort to prevent some (but not all) of it. Then, even if that is the real state of the world, it would naively seem that Alice would rather not receive this piece of evidence (that is, that her expected utility would be maximized that way, since under EDT her utility is not computed from some physical quantity anchored in the external world, but from the evidence she receives).

Of course, this is just the mistake of assessing EDT from outside the agent's perspective, when it is explicitly construed as a theory of subjective decision making.

That is, if Alice has no way to know that this evidence exists, then she is not making any mistake (if she can't know it exists, she can't prevent the suffering either). If on the contrary she has a way to know this (and has healthy enough epistemic practices to notice it), then she knows that A might be true, so her expected utility is an average between A and ¬A, and finding out whether A holds will, in expectation, maximize her utility (because if it turns out that A is true, she will be able to lower the amount of suffering).
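A small value-of-information check of Alice's case, with assumed illustrative utilities: finding out whether A holds cannot lower her expected utility and here strictly raises it, provided she acts on what she learns.

```python
# A small value-of-information check for Alice's case. All numbers are
# illustrative assumptions. A = "there is suffering that Alice could partly
# prevent"; the act is whether she makes the effort to prevent it.

p_A = 0.5   # Alice's credence that A holds

# utility[(world, makes_the_effort)]
utility = {
    ("A",     True):  -40,   # suffering exists and she prevents some of it
    ("A",     False): -100,  # suffering exists and she does nothing
    ("not A", True):  -5,    # no suffering; the effort is simply wasted
    ("not A", False): 0,
}

def expected(act, p_a):
    return p_a * utility[("A", act)] + (1 - p_a) * utility[("not A", act)]

# Without finding out: she must pick the single act that is best on average.
without_info = max(expected(act, p_A) for act in (True, False))

# Finding out: she learns which world she is in and acts optimally in each.
with_info = (p_A * max(utility[("A", a)] for a in (True, False))
             + (1 - p_A) * max(utility[("not A", a)] for a in (True, False)))

print(without_info, with_info)   # -22.5 -20.0: finding out is worth +2.5 here,
# and for an agent who acts on what she learns it can never be negative.
```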

Of course, the boundaries of "having a way to know" and "healthy enough epistemic practices" are fuzzy, and lead to considerations like whether failing to correctly assess a certain piece of evidence counts as ethically incorrect or even impermissible. And so an evidential framework (completely robust in theory) could seem to be prone to some undesirable consequences in imperfect practice, like wishful thinking (even if real-world utilitarians are used to avoiding such failure modes in their most obvious appearances).

  1. ^

    Or, more correctly for expected value calculation purposes: if these amounts are $G$ (Good Twins) and $E$ (Evil Twins), and $C$ is our credence function on events, then for every natural number $n$ (or real number, to more generally take into account the different utilities contributed by different agents), $C(G = n) = C(E = n)$.

  2. ^

    We might imagine a situation in which an intelligence genuinely only has internal computations available before deciding, and furthermore is allowed to think for exactly as long as it needs to arrive at the 50-50 equal-stakes consideration, but not at the further meta-reasoning about the need to partake in research (that is exactly what happened to me with the last post!). In that logically bounded case, the agent will indeed posit equal stakes for EDT and CDT, and so will just act according to its best-guess theory. But of course, all of these ideas apply more cleanly to far less bounded agents, for which the probability of this is negligible (unless another agent has for some reason adversarially calculated and implemented that exact time limit on purpose).

  3. ^

    We might imagine some situations in which the research results aren't negatively correlated in that way. Maybe for some physical or theoretical reason evidence of the existence of Evil Twins is way easier to find than evidence of the existence of Good Twins. But of course, the only way in which Evil Twins would be able to exploit this fact is if they knew, and this would provide evidence that Good Twins also know (and so can adjust their estimates accordingly).

    We might also ask whether imperfect correlation can get around this issue. Concretely, we would need the other agents to be correlated enough to carry out this whole top-level reasoning, but not correlated enough to carry out research in the same way as us. Not only does this seem implausible, but again we can only exploit this fact if we have some reason to think their research will drive them in a particular (broad) direction or to particular conclusions and actions. And if we know this, they do too. Although, might it be that for some reason Evil Twins are much easier to predict/model than Good Twins, so that one group can predict the other but not conversely? Again, this would break a certain symmetry, and so we'd need the other agents to be correlated enough to carry out this whole top-level reasoning, but not correlated enough to be equally predictable. Which seems even less likely. (Or maybe we are just degenerating the situation into one group predicting another, in which case the one-way evidence has already arrived regardless of our actions, and no acausal trade occurs.)

  4. ^

    But then, they will choose to just go with their best guess, because this wastes fewer of their resources, right? Well, not exactly, because if EDT is true, doing the research would also waste the resources of their Evil Twins, which allegedly maximizes their utility (the Evil Twins would have fewer resources with which to minimize it). The strength of this consideration of course depends on the agent's credence in EDT, and I feel like it should wash away as well, leaving the two options (going with the best-guess decision theory or doing research on the existence of Twins) literally equally valuable (or maybe doing research would be as valuable as following EDT and one-boxing). But this feels weird.

  5. ^

    Maybe I could also say "even an agent in the 50-50 state will contemplate this argument, and so put high probability on it changing opinion, and so one-box straight away, and so actually all agents apply the Wager and the 50-50 agent has no one to trade with, and knows that". But this argument is circular: I'm just restating "it is a priori very unlikely that 50-50 is right, so agents will have a strong prior against that", but it could still be that, even including these considerations, an agent is completely or almost certain of 50-50 being true.

  6. ^

    Maybe this only happens in the literally zero-sum game when both utility functions are literally opposite.

Comments

Why are people downvoting this??

Idk either, but in any event I basically wrote this just to share with Caspar and Sylvester

>Anthropically, our existence provides evidence for them being favored.

There are some complications here. It depends a bit on how you make anthropic updates (if you do them at all). But it turns out that the version of updating that "works" with EDT basically doesn't make the update that you're in the majority. See my draft on decision making with anthropic updates.

>Annex: EDT being counter-intuitive?

I mean, in regular probability calculus, this is all unproblematic, right? Because of the Tower Rule, a.k.a. the law of total expectation, or similarly conservation of expected evidence. There are also issues of updatelessness, though, which you touch on at various places in the post. E.g., see Almond's "lack of knowledge is [evidential] power" or scenarios like the Transparent Newcomb problem, wherein EDT wants to prevent itself from seeing the content of the boxes.

>It seems plausible that evolutionary pressures select for utility functions broadly as ours

Well, at least in some ways similar as ours, right? On questions like whether rooms are better painted red or green, I assume there isn't much reason to expect convergence. But on questions of whether happiness is better than suffering, I think one should expect evolved agents to mostly give the right answers.

>to compare such maximizations, you already need a decision theory (which tells you what "maximizing your goals" even is).

Incidentally I published a blog post about this only a few weeks ago (which will probably not contain any ideas that are new to you).

>Might there be some situation in which an agent wants to ensure all of its correlates are Good Twins

I don't think this is possible.

Thank you for your comment! I hadn't had the time to read de se choice but am looking forward to it! Thank you also for the other recommendations.

conservation of expected evidence

Yep! That was also my intuition behind "all meta-updates (hopes and fears) balancing out" above.

I don't think this is possible.

If you mean it's not possible to ensure all your correlates are Good, I don't see how doing more research about the question doesn't get you ever closer to that (even if you never reach the ideal limit of literally all correlates being Good).

If you mean no one would want to do that, it might seem like you'd be happy to be uncorrelated from your Evil Twins. But this might again be a naïve view that breaks upon considering meta-reasoning.

In The Evidentialist's Wager there is no non-zero-probability world where you get more than 10 doses of the cure, so why bother discussing zero-probability (impossible) worlds?

I'm not sure I understand your comment. It is true that in their framing of the Moral Newcomb problem you can get at most 10 cures (because the predictor is perfectly reliable). But what you care about (the utility you maximize) is not only how many cures you personally receive, but how many such cures people similar to you (in other parts of the universe) receive (because allegedly you care about maximizing happiness or people not dying, and obtaining your 10 cures is only instrumental to that). And of course that utility is not necessarily bounded by the 10 cures you personally receive, and can be much bigger if your action provides evidence that many such cures are being obtained across the universe. The authors explain this on page 4:

This means that the simple state-consequence matrix above does not in fact capture everything that is relevant to the decision problem: we have to refine the state space so that it also describes whether or not correlated agents face boxes with cures in both. By taking one box, you gain evidence not only that you will obtain more doses of the cure, but also that these other agents will achieve good outcomes too. Therefore, the existence of correlated agents has the effect of increasing the stakes for EDT. 

Thanks, I guess I don't really understand what the authors are trying to do here. 

I guess it's too late for this comment (no worries if you don't feel like replying!), but are you basically saying that CDT doesn't make sense because it considers impossible/zero-probability worlds (such as the one where you get 11 doses)?

If so: I agree! The paper on the evidentialist's wager assumes that you should/want to hedge between CDT and EDT, given that the issue is contentious.

Does that make sense / relate at all to your question?

Not "CDT does not make sense", but any argument that fights a hypothetical such as "predictor knows what you will do" is silly. EDT does that sometimes. I don't understand FDT (not sure anyone does, since people keep arguing what it predicts), so maybe it fares better. Two-boxing in a perfect predictor setup is a classic example. You can change the problem, but it will not be the same problem. 11 doses outcome is not a possibility in the Moral Newcomb's. I've been shouting in the void for a decade that all you need to do is enumerate the worlds, assign probabilities, calculate expected utility. You throw away silliness like "dominant strategies", they are not applicable in twin PD, Newcomb's, Smoking Lesion, Pafit's Hitchhiker etc. "Decision" is not a primitive concept, but an emergent one. The correct question to ask is "given an agent's actual actions (not thoughts, not decisions), what is the EV, and what kind of actions maximize it?" I wrote a detailed post about it, but it wooshed. People constantly and unwittingly try to smuggle libertarian free will in their logic.