Introduction and motivation

The starting point for this post is a comment I wrote on Paul's post on EDT vs CDT:

One argument for CDT over EDT that you didn’t mention in this post: Suppose you live in a deterministic universe and know your own source code. Suppose you are deciding between taking a 5 dollar bill and a 10 dollar bill. Suppose your world model says you take the 5 dollar bill with 100% probability. Now conditioning on taking the 10 dollar bill gives you complete garbage, since you are conditioning on a probability 0 event. If you use EDT, then depending on other details of your decision procedure, this could lead to you always taking the 5 dollar bill. So then your world model would be accurate. (This is the “5 and 10” problem often discussed at MIRI; I don’t know if it has been written up anywhere)

CDT never generates undefined expected utility estimates like EDT does. It takes the 10 dollar bill in this problem. However, if it always takes the 10 dollar bill, then its counterfactual for taking the 5 dollar bill is strange because it is one in which a physical law is violated. The violation of a physical law could have important consequences other than which action the agent takes.

Both decision theories have trouble with this problem, but at least CDT always produces a defined answer.

Here’s another way of thinking about this problem. A fully Bayesian version of EDT must construct all possible worlds and then condition on taking a certain action. But each of these possible worlds contains a running copy of the EDT algorithm. So, absent some defined method for taking a fixed point, this leads to an infinite loop, and you can’t actually have a fully Bayesian version of EDT.

(What if you use reflective oracles to allow EDT to select some fixed point? We could specify that the reflective oracle returns arbitrary results when asked to condition on a probability 0 event (I think this is what the most natural way to emulate conditional queries on a reflective oracle results in, but I haven’t checked). Now there are multiple possible reflective oracles (i.e. fixed points); it’s possible to always take the 10 dollar bill and think bad things will happen conditional on taking the 5 dollar bill, and it’s also possible to always take the 5 dollar bill and think bad things will happen conditional on taking the 10 dollar bill.)

A fully Bayesian version of CDT must construct all possible counterfactuals. Each of these counterfactuals contains a running copy of CDT, so one might think the same problem applies. But in each of these counterfactuals, the output of the CDT algorithm is “thrown away”, since the agent’s action is controlled by a magic counterfactual intervention rather than its algorithm. So, if the CDT algorithm is sandboxed, the CDT’s world model can simply ignore the running CDT algorithm, as it has no effect. Thus, at least in single-agent problems (with no predictors etc), a fully Bayesian version of CDT is possible in principle, though obviously not in practice.

Some time during or after writing this comment, I noticed something: the equilibrium where the EDT agent thinks it always takes the 5 dollar bill, and therefore gets garbage (possibly low) estimates when considering taking the 10 dollar bill, and therefore never takes the 10 dollar bill, is extremely unstable. As soon as the agent assigns any probability at all to taking the 10 dollar bill, their conditional expected utility estimates are perfect. Can we use this fact to design a variant of EDT that always takes the 10 dollar bill?

Yes, yes we can.

Definitions and main theorem statements

Reflective conditional oracle distributions will be defined similar to in previous work such as reflective oracles and reflective oracle distributions. I recommend understanding reflective oracles before reading this post (understanding reflective oracle distributions is helpful but unnecessary).

Let be some finite subset of the set of probabilistic Turing machines that may query a conditional oracle (defined in the next paragraph). They must each always halt and return either 0 or 1.

Definition 1: A conditional oracle is a function .

Intuitively, asks whether is greater or less than , where is taken from some distribution over conditional oracles.

We make the assumption that machines in only ever query the oracle with an element of . All further definitions and lemmas/theorems are parameterized on some satisfying this assumption.

Definition 2: A conditional oracle distribution (COD) is a distribution over conditional oracles, of type .

Definition 3: A COD is fully-mixed if it assigns nonzero probability to each possible conditional oracle.

Since the number of conditional oracles is finite, there are fully-mixed CODs.

Definition 4: A COD weakly leads to a COD iff, for all :

(Here, the notation refers to the probability of event when .)

Intuitively, weakly leads to iff accurately answers conditional queries that are about . If for some and also for some (possibly different) and is fully-mixed, these conditions are equivalent to:

As notation, let map a COD to the set of CODs it weakly leads to. This and related functions will be interpreted as set-valued when referring to their graphs.

Unfortunately, these conditional queries are not sensible when or ; the oracle is allowed to return any answer. We will define a stricter notion of "leading to" that will require these answers to be sensible.

Let be a set-valued function whose graph is the closure of (the graph of restricted to where is fully-mixed). It will turn out that for fully-mixed .

Equivalently, iff there are infinite COD sequences and such that (a) each is fully mixed, (b) each weakly leads to , (c) limits to , and (d) limits to .

Intuitively, is equivalent to for fully-mixed and generalizes to non-fully-mixed through taking limits that involve only fully-mixed values.

Define . It will turn out that for fully-mixed .

(Why take the convex hull? This is to make a Kakutani map. It might be the case that is always convex but I haven't proven this.)

Definition 5: A COD leads to a COD iff .

Due to Theorem 1, it will turn out that a fully-mixed COD leads to iff it strictly leads to . Leading to is a stronger condition than weak leading to, as will be proven.

At this point we are ready to define a reflection condition:

Definition 6: A COD is reflective iff it leads to itself.

Intuitively, a COD is reflective iff it accurately answers queries that are about itself.

This post's main results are the following (in addition to the proof that EDT beats 5 and 10):

Theorem 1: For any fully-mixed COD , .

Theorem 2: If leads to , then it weakly leads to .

Theorem 3: There is a reflective COD.

These are proven at the end of the post.

Defining EDT

EDT can be defined using a reflective COD; this decision theory will be called COEDT. Let the decision problem be described by a Turing machine with an embedded agent, where this Turing machine including its agent may randomize and call a conditional oracle, and which returns either 0 or 1 to represent the agent's utility (intermediate utilities can be emulated by randomizing). For example, for the 5 and 10 problem, we may define:

where is the source code of the agent .

COEDT is a function from the universe program (which already contains an embedded COEDT agent) to action. The following COEDT variant handles cases where there are only 2 actions:

It uses the conditional oracle to determine if it has a higher chance of winning conditional on taking action 1 or action 0, and takes the action that it is more likely to win conditional on taking.

What does do on ? We can take a fixed point:

Let the set of machines considered be

Theorem 4: For any reflective COD on , .

Proof:

Informally, this is true because the agent must always take action 1 when queries are about any fully-mixed COD.

, so it is a convex combination of CODs in . Let that convex combination be .

Each , so we have COD sequences limiting to (with each being fully mixed) and limiting to such that each .

For each , since and both return 1 for some conditional oracles, and because weakly leads to , we have: