Mixed-Strategy Ratifiability Implies CDT=EDT

abramdemski

I provide conditions under which CDT=EDT in Bayes-net causal models.

[Epistemic status: Thanks to a discussion with Benja, I'm much less optimistic about the general thrust of this. In particular, the framework here isn't expressive enough to include XOR Blackmail. While I knew the framework would rule out some cases of interest, I was expecting to be happy to bite the bullet and say that we can't make a good story about where causality comes from in cases where EDT agents can't learn it by experimenting (IE, epsilon-exploration). But, on the contrary, XOR blackmail seems like a case where there is an intuitive notion of causality which might come from somewhere other than wishful thinking, and which differs significantly from the notion of causality which EDT agents can learn in frameworks similar to the one put forward in this post. I still have no idea where that notion of causality comes from, or whether to trust it, but XOR blackmail does seem to be a counterexample-in-spirit to the thrust of this post. More details here.]

(Cross-posted to lesserwrong.)

Previously, I discussed conditions under which LICDT=LIEDT. That case was fairly difficult to analyze, although it looks fairly difficult to get LICDT and LIEDT to differ. It's much easier to analyze the case of CDT and EDT ignoring logical uncertainty.

As I argued in that post, it seems to me that a lot of informal reasoning about the differences between CDT and EDT doesn't actually give the same problem representation to both decision theories. One can easily imagine handing a causal model to CDT and a joint probability distribution to EDT, without checking that the probability distribution could possibly be consistent with the causal model. Representing problems in Bayes nets seems like a good choice for comparing the behavior of CDT and EDT. CDT takes the network to encode causal information, while EDT ignores that and just uses the probability distribution encoded by the network.

It's easy to see that CDT=EDT if all the causal parents of an agent's decision are observed. CDT makes decisions by first cutting the links to parents, and then conditioning on alternative actions. EDT conditions on the alternatives without cutting links. So, EDT differs from CDT insofar as actions provide evidence about causal parents. If all parents are known, then it's not possible for CDT and EDT to differ.
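A minimal numerical sketch of this screening-off claim, with hypothetical numbers: a hidden parent $H$ influences both the action $A$ and the utility $U$. EDT conditions on $A$, so it updates on $H$. CDT cuts the $H \to A$ link and keeps $H$ at its prior. Once $H$ is observed, the two expectations coincide. The names `edt_value` and `cdt_value` are illustrative only.

```python
from fractions import Fraction as F

# Hypothetical toy network: hidden parent H -> A, H -> U, and A -> U.
P_H = {0: F(1, 2), 1: F(1, 2)}                # prior over the parent H
P_A_GIVEN_H = {0: {0: F(9, 10), 1: F(1, 10)},  # P(A=a | H=h): the action is
               1: {0: F(1, 10), 1: F(9, 10)}}  # correlated with H

def utility(a, h):
    # H is worth 10 utility; the action itself is worth 1.
    return 10 * h + a

def edt_value(a, observed_h=None):
    """E[U | A=a], also conditioning on H=observed_h if H is observed."""
    hs = list(P_H) if observed_h is None else [observed_h]
    weights = {h: P_H[h] * P_A_GIVEN_H[h][a] for h in hs}  # P(h, a), unnormalized
    total = sum(weights.values())
    return sum(w * utility(a, h) for h, w in weights.items()) / total

def cdt_value(a, observed_h=None):
    """E[U | do(A=a)]: the H -> A link is cut, so A carries no evidence about H."""
    hs = list(P_H) if observed_h is None else [observed_h]
    total = sum(P_H[h] for h in hs)
    return sum(P_H[h] * utility(a, h) for h in hs) / total

print(edt_value(1) - edt_value(0), cdt_value(1) - cdt_value(0))  # 9 1
```

With $H$ unobserved, EDT sees a gap of 9 between the actions, because action 1 is evidence that $H=1$. CDT sees only the action's own contribution of 1. Calling either function with `observed_h` set gives identical values for both theories.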

So, any argument for CDT over EDT or vice versa must rely on the possibility of unobserved parents.

The most obvious parents of any decision node are the observations themselves. These are, of course, observed. But it's possible that there are other significant causal parents which can't be observed so easily. For example, to recover the usual results in the classical thought experiments, it's common to add a node representing the output of the agent's abstract algorithm, which is a parent of the agent's decision and of any simulations of the agent. This abstract-algorithm node captures the correlation which allows EDT to cooperate in the prisoner's dilemma and one-box in Newcomb's problem, for example.

Here, I argue that sufficient introspection still implies that CDT=EDT. Essentially, the agent may not have direct access to all its causal parents, but if it has enough self-knowledge (unlike the setup in Smoking Lesion Steelman), the same screening-off phenomenon occurs. This is somewhat like saying that the output of the abstract algorithm node is known. Under this condition, EDT and CDT both two-box in Newcomb and defect in the prisoner's dilemma.

Mixed-Strategy Ratifiability

Suppose that CDT and EDT agents are given the same decision problem in the form of a Bayesian network. Actions are represented by a variable node $A$ in the network, with values $a_{i}$. Agents select mixed strategies somehow, under the constraint that their choice is maximal with respect to the expectations which they compute for their actions; IE:

(1) (EDT maximization constraint.) The EDT agent must choose a mixed strategy in which $P(a_{i}) > \epsilon$ only if $a_{i} \in \arg\max_{n} E(U \mid A = a_{n})$; IE, the action is among those which maximize expected utility.

(2) (CDT maximization constraint.) The CDT agent is under the same restriction, but with respect to the causal expectation.

(Exploration constraint.) I further restrict all action probabilities to be at least $\epsilon$, to ensure that the conditional expectations are well-defined.
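The maximization and exploration constraints together can be checked mechanically. Here is a sketch, assuming actions are dictionary keys and probabilities are exact fractions, so that ties in expected utility are detected exactly. The function name `satisfies_constraints` is hypothetical. The same check serves EDT or CDT, depending on which expectations are passed in.

```python
from fractions import Fraction as F

def satisfies_constraints(strategy, expected_utility, eps):
    """Check a mixed strategy against the maximization and exploration constraints.

    strategy: dict action -> probability (exact Fractions assumed).
    expected_utility: dict action -> E(U | A=a) for EDT, or the causal
    expectation for CDT; the check itself is identical in both cases.
    """
    if sum(strategy.values()) != 1:
        return False
    best = max(expected_utility.values())
    for action, p in strategy.items():
        if p < eps:  # exploration constraint: every action gets at least eps
            return False
        if p > eps and expected_utility[action] != best:  # maximization constraint
            return False
    return True

eps = F(1, 10)
print(satisfies_constraints({"a": F(9, 10), "b": F(1, 10)}, {"a": 1, "b": 0}, eps))  # True
```

Under these two constraints, a non-maximal action gets exactly $\epsilon$, and the remaining probability is split among the maximizers.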

(Ratifiability constraint.) I'll also assume ratifiability of mixed strategies: the belief state from which CDT and EDT make their decision is one in which they know which mixed strategy they select. Put another way, the decision is required to be stable under knowledge of the decision. I discuss ratifiability more here.

We can imagine the agent getting this kind of self-knowledge in several ways. Perhaps it knows its own source code and can reason about what it would do in situations like this. Perhaps it knows "how these things go" from experience. Or perhaps the decision rule which picks out the mixed strategies explicitly looks for a choice consistent with mixed-strategy ratifiability.

In the Bayes net, this is represented by a node $D$ (the "decision" node) standing for the selection of a mixed strategy. $D$ is the direct parent of $A$ (our action node), and the value of $D$ gives the probability distribution over $A$.

(Mixed-strategy implementability.) I also assume that $A$ has no other direct parents, representing the assumption that the choice of mixed strategy is the only thing determining the action. This is like the assumption that the environment doesn't contain anything which correlates itself with our random number generator to mess with our experimentation, which I discussed in the LICDT=LIEDT conditions post. It's allowable for things to be correlated with our randomness, but if so, they must be downstream of it. Hence, it's also a form of my "law of logical causality" from earlier.

Theorem 1. Under the above assumptions, the consistent choices of mixed strategy are the same for CDT and EDT.

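Proof. Under mixed-strategy ratifiability, the agent knows the value of $D$, and $D$ is the only parent of $A$. So $D$ screens $A$ off from any unobserved parents of $D$: conditioning on $A = a_{i}$ provides no further evidence about any variable which is not downstream of $A$. The CDT and EDT expected utility calculations therefore become the same. Besides that, all the rest of the constraints are already the same for CDT and EDT. So, the consistent choices of mixed strategies will be the same. $□$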
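The screening-off step can be checked on a small example with hypothetical numbers. An unobserved variable $H$ influences both which mixed strategy $D$ is selected and the utility, and $A$ depends only on $D$. The variable names and probabilities below are illustrative assumptions.

```python
from fractions import Fraction as F

# Hypothetical network: H -> D -> A, H -> U, A -> U; H is an unobserved parent of D.
P_H = {0: F(1, 2), 1: F(1, 2)}
P_D_GIVEN_H = {0: {"low": F(3, 4), "high": F(1, 4)},
               1: {"low": F(1, 4), "high": F(3, 4)}}
STRATEGY = {"low": F(1, 10), "high": F(9, 10)}  # each value of D fixes P(A=1)

def utility(a, h):
    return 10 * h + a

def p_a_given_d(a, d):
    return STRATEGY[d] if a == 1 else 1 - STRATEGY[d]

def expected_utility(a, causal, known_d=None):
    """EDT (causal=False): E[U | A=a, D=known_d].
    CDT (causal=True): E[U | do(A=a), D=known_d]; the factor P(A=a | D) is dropped.
    known_d=None means the agent is uncertain about its own mixed strategy."""
    num = den = F(0)
    for h, ph in P_H.items():
        for d, pd in P_D_GIVEN_H[h].items():
            if known_d is not None and d != known_d:
                continue
            w = ph * pd * (1 if causal else p_a_given_d(a, d))
            num += w * utility(a, h)
            den += w
    return num / den
```

Without self-knowledge, EDT values actions 0 and 1 at 3 and 8, while CDT values them at 5 and 6. EDT is pulled around by what the action says about $H$. With either value of $D$ known, the EDT and CDT values agree action by action.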

It's natural to think of these possible choices as equilibria in the game-theoretic sense. My constraints on the decision procedures for EDT and CDT don't force any particular choice of mixed strategy in cases where several options have maximal utility; but, the condition that that choice must be self-consistent forces it into a few possibilities.
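Here is a toy illustration of how self-consistency narrows the options. It uses a hypothetical two-action problem in the spirit of Death in Damascus. An adversary who knows the agent's mixed strategy $d$ waits at location $a$ with probability $d(a)$, and avoiding it is worth 1, so $E(U \mid A=a) = 1 - d(a)$ once $D$ is known. Because $D$ is known, EDT and CDT compute the same expectations here. The sketch searches a grid of strategies for ones satisfying the constraints above. The grid step and $\epsilon = 1/100$ are arbitrary assumptions, and exact fractions avoid spurious floating-point tie-breaking.

```python
from fractions import Fraction as F

EPS = F(1, 100)  # exploration floor (hypothetical value)

def expected_utilities(p):
    """Hypothetical toy: the known strategy D puts probability p on action 0.
    An adversary who knows D waits at location a with probability d(a), and
    avoiding it is worth 1, so E(U | A=a) = 1 - d(a)."""
    return {0: 1 - p, 1: p}

def ratifiable(p):
    """True if strategy (p, 1 - p) satisfies the exploration and maximization
    constraints when evaluated from the belief state in which it is known."""
    strategy = {0: p, 1: 1 - p}
    eu = expected_utilities(p)
    best = max(eu.values())
    return all(q >= EPS and (q == EPS or eu[a] == best)
               for a, q in strategy.items())

grid = [F(k, 100) for k in range(101)]
print([p for p in grid if ratifiable(p)])  # [Fraction(1, 2)]
```

Any lopsided strategy makes the less-favoured location safer, which contradicts putting more than $\epsilon$ on the other action. Only the 50/50 strategy survives.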

The important observation for my purposes is that this argument for CDT=EDT doesn't require any introspection beyond knowing which mixed strategy you're going to choose in the situation you're in. Perhaps this still seems like a lot to assume. I would contend that it's easier than you may think. As we saw in the logical inductor post, it just seems to happen naturally for LIDT agents. It would also seem to happen for agents who can reason about themselves, or simply know themselves well enough due to experience.

Furthermore, the ratifiability constraint is something which seems necessary to get certain problems right for independent reasons, as has been discussed in the CDT literature. So, if we didn't get it naturally, we would want to build it in.

The way I've defined CDT and EDT may seem a bit unnatural, since I've constrained them based on max-expectation choice of actions, but stated that they are choosing mixed strategies. Shouldn't I be selecting from the possible probability distributions on actions, based on the expected utility of those? This would invalidate my conclusion, since the CDT expectation of different choices of $D$ can differ from the EDT expectation. But it is impossible to enforce ratifiability while also ensuring that conditioning on different choices of $D$ is well-defined: under ratifiability, the agent assigns probability one to the mixed strategy it actually selects, so every alternative value of $D$ has probability zero. So, I think this way of doing it is the natural way when a ratifiability constraint is in play.

Approximate Ratifiability

More concerning, perhaps, is the way my argument takes under-specified decision procedures (only giving constraints under which a decision procedure is fit to be called CDT or EDT) and concludes a thing about what happens in the under-specified cases (effectively, any necessary tie-breaking between actions with equal expected utility must choose action probabilities consistent with the agent's beliefs about the probabilities of its actions). Wouldn't the argument just be invalid if we started with fully-specified versions of CDT and EDT, which already use some particular tie-breaking procedure? Shouldn't we, then, take this as an argument against ratifiability as opposed to an argument for CDT=EDT?

Certainly the conclusion doesn't follow without the assumption of ratifiability. I can address the concern to some extent, however, by making a version of the argument for fixed (but continuous) decision procedures under an approximate ratifiability condition. This will also get rid of the (perhaps annoying) exploration constraint.

(Continuous EDT) The EDT agent chooses mixed strategies in some fixed way, via a continuous function of the belief state (regarded as a function from worlds to probabilities). This function (the "selection function") is required to agree with $\arg\max_{n} E(U \mid A = a_{n})$ when the expectations are well-defined and the differences in utility between options are greater than some $\epsilon > 0$.

(Continuous CDT) The same, but taking CDT-style expectations.

(Approximate Ratifiability) Let the true mixed strategy which will be chosen by the agent's decision rule be $d^{*}$. For any other value $d$ of $D$ such that $|\ln d(a_{i}) - \ln d^{*}(a_{i})| > \epsilon$ for some $a_{i} \in A$, $P(D = d) = 0$.

(We still assume mixed-strategy implementability, too.)

Approximate ratifiability doesn't perfectly block evidence from flowing backward from the action to the parents of the decision, as perfect ratifiability did. It does bound the amount of evidence, though: since every alternative $d$ with positive probability must be very close to $d^{*}$, the likelihood ratio cannot be large. As we make $\epsilon$ arbitrarily small, the difference between the action utilities assigned by CDT and EDT is bounded by some $\delta$ which also becomes arbitrarily small. Hence, the EDT and CDT selection functions must agree on more and more.
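Here is one explicit choice of $\delta$, as a sketch under added assumptions: a discrete network, utilities bounded by $|U| \le U_{\max}$, an action $a_i$ with $d^{*}(a_i) > 0$, and $E(U \mid do(A=a_i))$ as notation for the causal expectation. Let $V$ be the set of non-descendants of $A$, which includes $D$, and write $d_v$ for the value of $D$ in an assignment $v$. Because $D$ is the only parent of $A$, $P(v, a_i) = P(v)\, d_v(a_i)$. Approximate ratifiability puts every $d$ with $P(D=d) > 0$ within a factor $e^{\pm\epsilon}$ of $d^{*}$ on each action, so

$$\frac{P(V = v \mid A = a_i)}{P(V = v)} = \frac{d_v(a_i)}{\sum_{d} P(D = d)\, d(a_i)} \in \left[e^{-2\epsilon},\, e^{2\epsilon}\right].$$

Once $v$ is fixed, all parents of $A$ are fixed, so conditioning and intervening agree: $E(U \mid A=a_i, V=v) = E(U \mid do(A=a_i), V=v)$. Writing $g(v)$ for this common value,

$$\left| E(U \mid A = a_i) - E(U \mid do(A = a_i)) \right| = \left| \sum_{v} \big(P(v \mid a_i) - P(v)\big)\, g(v) \right| \le \left(e^{2\epsilon} - 1\right) U_{\max},$$

which goes to zero with $\epsilon$.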

By Brouwer's fixed-point theorem, there will be equilibria for the CDT and EDT selection functions. Although there's no guarantee these equilibria are close to each other the way I've spelled things out, we could construct selection functions for both CDT and EDT which get within epsilon of any of the equilibria from theorem 1.
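The fixed-point claim can be illustrated in the simplest possible case with a hypothetical two-action toy problem. An adversary who knows the agent's strategy waits at location $a$ with probability $d(a)$, and avoiding it is worth 1. So if the agent puts probability $p$ on action 0, $E(U \mid A=0) = 1 - p$ and $E(U \mid A=1) = p$. For simplicity, the sketch assumes the agent's belief about its own strategy is exact. The selection function below is one hypothetical continuous rule meeting the (Continuous EDT) requirement. It matches the argmax whenever the utility gap exceeds $\epsilon$ and interpolates linearly otherwise. In one dimension Brouwer's theorem reduces to the intermediate value theorem, so bisection finds an equilibrium.

```python
def selection(delta, eps=0.1):
    """Hypothetical continuous selection function for two actions: returns the
    probability of action 0 given delta = EU(action 0) - EU(action 1). It equals
    the argmax choice (1 or 0) whenever |delta| > eps and interpolates linearly
    in between, so it is continuous."""
    return min(1.0, max(0.0, 0.5 + delta / (2 * eps)))

def expected_utility_gap(p):
    # Toy problem: E(U | A=0) = 1 - p and E(U | A=1) = p.
    return (1 - p) - p

def find_fixed_point(tol=1e-12):
    """Bisection on g(p) = selection(gap(p)) - p. Since g(0) > 0 > g(1),
    the intermediate value theorem guarantees a root in [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if selection(expected_utility_gap(mid)) - mid > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(find_fixed_point())
```

The result is within about $10^{-12}$ of $0.5$. In this toy, that is the same 50/50 strategy which the exact ratifiability conditions single out.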

Consequences for Counterfactuals

The arguments above are fairly rudimentary. The point I'm trying to drive at is more radical: there is basically one notion of counterfactual available. It is the one which both CDT and EDT arrive at, if they have very much introspection. It isn't particularly good for the kinds of decision-theory problems we'd like to solve: it tends to two-box in realistic Newcomb's problems (where the predictor is imperfect), defect in prisoner's dilemma, et cetera. My conclusion is that these are not problems to try and solve by counterfactual reasoning. They are problems to solve with updateless reasoning, bargaining, cooperative oracles, predictable exploration, and so on.

I don't think any of this is very new in terms of the arguments between CDT and EDT in the literature. Philosophers seem to have a fairly good understanding of how CDT equals EDT when introspection is possible; see SEP on objections to CDT. The proofs above are just versions of the tickle defense for EDT. However, I think the AI alignment community may not be so aware of the extent to which EDT and CDT coincide. Philosophers continue to distinguish between EDT and CDT, while knowing that they wouldn't differ for ideal introspective agents, on the grounds that decision theories should provide notions of rationality even under failure of introspection. It's worth asking whether advanced AIs may still have some fundamental introspection barriers which lead to different results for CDT and EDT. From where we stand now, looking at positive introspection results over the years, from probabilistic truth to reflective oracles to logical induction, I think the answer is no.

It's possible that a solution to AI alignment will be some kind of tool AI, designed to be highly intelligent in a restricted domain but incapable of thinking about other agents, including itself, on a strategic level. Perhaps there is a useful distinction between CDT and EDT in that case. Yet, such an AI hardly seems to need a decision theory at all, much less the kind of reflective decision theory which MIRI tends to think about.

The meagre reasons in the post above hardly seem to suffice to support this broad view, however. Perhaps my Smoking Lesion Steelman series gives some intuition for it (I, II, III). Perhaps I'll be able to make more of a case as time goes on.

Edited to add: A note on exploration in my setup.

In the perfect self-knowledge case, I assume epsilon exploration. Usually, exploration is added to ensure that an agent gains knowledge. Here, the agent already knows the structure of the problem. Instead, exploration is added to keep its reasoning well-defined, since conditioning on an action requires that action to have positive probability. But notice, also, that only EDT needs this. CDT is well-defined regardless; it can use its knowledge of the structure of the problem to reason.

In the imperfect-knowledge case, I remove the assumption by enforcing continuity. I might have done a similar thing in the perfect-knowledge case by filling in expectations of probability zero actions as limits of actions with very small probabilities. This is a little bit like saying that EDT can mimic CDT's ability to reason about probability-zero actions by adding a little uncertainty artificially. But, how does it know in what way to add this uncertainty?
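A sketch of why the way uncertainty is added matters, using hypothetical numbers. A hidden variable $H$ with $P(H=1) = 1/2$ affects utility, with $U(A=1, H=0) = 0$ and $U(A=1, H=1) = 10$, and action 1 would otherwise have probability zero. Suppose the artificial uncertainty puts probability $\epsilon$ on action 1 regardless of $H$. Then EDT's conditional expectation is 5 for every $\epsilon$, matching the causal expectation. Suppose instead it puts $\epsilon$ on action 1 when $H=0$ and $\epsilon^2$ when $H=1$. Then the expectation is $10\epsilon/(1+\epsilon)$, which tends to 0. The first perturbation is independent of everything upstream of the action, as mixed-strategy implementability requires; the second is not. The function name is hypothetical.

```python
from fractions import Fraction as F

P_H1 = F(1, 2)                      # hypothetical P(H=1)
U_ACTION1 = {0: F(0), 1: F(10)}     # hypothetical U(A=1, H=h)

def edt_value_of_rare_action(p_given_h0, p_given_h1):
    """E[U | A=1] when action 1 is taken with a small probability that may
    depend on the hidden state H."""
    w0 = (1 - P_H1) * p_given_h0    # P(H=0, A=1)
    w1 = P_H1 * p_given_h1          # P(H=1, A=1)
    return (w0 * U_ACTION1[0] + w1 * U_ACTION1[1]) / (w0 + w1)

eps = F(1, 10**6)
uniform = edt_value_of_rare_action(eps, eps)        # perturbation independent of H
skewed = edt_value_of_rare_action(eps, eps * eps)   # perturbation correlated with H
print(uniform, float(skewed))
```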

Really, I think there should be a slightly nicer way to do the whole setup which uses an assumption about EDT maintaining some uncertainty about its action, rather than an assumption that its strategy is always mixed. It's true that the only way it can rationally maintain action uncertainty bounded away from zero is for the actions to be mixed; but, I don't actually need anything about bounding it away from zero. Probabilities could converge to zero, but always remain positive. Perhaps there's still a version of the theorem for that case. That is to say, perhaps exploration does not play a critical role here.

AI ALIGNMENT FORUM
AF
