[Epistemic Status: this series of two posts gives some arguments which, in my eyes, make it difficult to maintain a position other than CDT=EDT, but not impossible. As I explain at the end of the second post, it is still quite tenable to suppose that CDT and EDT end up taking different actions.]

Previously, I argued that fair comparisons of CDT and EDT (in which the same problem representation is given to both decision theories) will conclude that CDT=EDT, under what I see as reasonable assumptions. Recently, Paul Christiano wrote a post arguing that, all things considered, the evidence strongly favors EDT. Jessica Taylor pointed out that Paul didn't address the problem of conditioning on probability zero events, but she came up with a novel way of addressing that problem by taking the limit of small probabilities: COEDT.

Here, I provide further arguments that rationality constraints point in the direction of COEDT-like solutions.

Note that I argue for the conclusion that CDT=EDT, which is somewhat different from arguing directly for EDT; my line of reasoning suggests some additional structure which could be missed by advocating EDT in isolation (or CDT in isolation). Paul's post described CDT as a very special case of EDT, in which our action is independent of other things we care about. This is true, but, we can also accurately describe EDT is a very special case of CDT where all probabilistic relationships which remain after conditioning on what we know turn out to also be causal relationships. I more often think in the second way, because CDT can have all sorts of counterfactuals based on how causation works. EDT claims that these are only correct when they agree with the conditional probabilities.

(ETA: When I say "CDT", I'm pointing at some kind of steel-man of CDT which uses logical counterfactuals rather than physical counterfactuals. TDT is a CDT in this sense, whereas UDT could be either CDT or EDT.)

This post will be full of conjectural sketches, and mainly serves to convey my intuitions about how COEDT could fit into the larger picture.

Hyperreal Probability

Initially, thinking about COEDT, I was concerned that although something important had been accomplished, the construction via limits didn't seem fundamental enough that it should belong in our basic notion of rationality. Then, I recalled how hyperreal numbers (which can be thought of as sequences of real numbers) are a natural generalization of decision theory. This crops up in several different forms in different areas of Bayesian foundations, but most critically for the current discussion, in the question of how to condition on probability zero events. Quoting an earlier post of mine:

In What Conditional Probabilities Could Not Be, Alan Hajek argues that conditional probability cannot possibly be defined by Bayes’ famous formula, due primarily to its inadequacy when conditioning on events of probability zero. He also takes issue with other proposed definitions, arguing that conditional probability should instead be taken as primitive.

The most popular way of doing this are Popper’s axioms of conditional probability. In Learning the Impossible (Vann McGee, 1994), it’s shown that conditional probability functions following Popper’s axioms and nonstandard-real probability functions with conditionals defined according to Bayes’ theorem are inter-translatable. Hajek doesn’t like the infinitesimal approach because of the resulting non-uniqueness of representation; but, for those who don’t see this as a problem but who put some stock in Hajek’s other arguments, this would be another point in favor of infinitesimal probability.

In other words, there is an axiomatization of probability -- Popper's axioms -- which takes conditional probability to be fundamental rather than derived. This approach is relatively unknown outside philosophy, but often advocated by philosophers as a superior notion of probability, largely because it allows one to condition on probability zero events. Popper's axioms are in some sense equivalent to allowing hyperreal probabilities, which also means (with a little mathematical hand-waving; I haven't worked this out in detail) we can think of them as a limit of a sequence of strictly nonzero probability distributions.

All of this agrees nicely with Jessica's approach.

I take this to strongly suggest that reasonable approaches to conditioning on probability zero events in EDT will share the limit-like aspect of Jessica's approach, even if it isn't obvious that they do. (Popper's axioms are "limit-like", but this was probably not obvious to Popper.) The major contribution of COEDT beyond this is to provide a particular way of constructing such limits.

(Having the idea "counterfactuals should look like conditionals in hyperreal probability distributions" is not enough to solve decision theory problems alone, since it is far from obvious how we should construct hyperreal probability distributions over logic to get reasonable logical counterfactuals.)

Hyperreal Bayes Nets & CDT=EDT

(The following argument is the only justification of the title of the post which will appear in Part 1. I'll have a different argument for the claim in the title in Part 2.)

The CDT=EDT argument can now be adapted to hyperreal structures. My original argument required:

1. Probabilities & Causal Structure are Compatible: The decision problem is given as a Bayes net, including an action node (for the actual action taken by the agent) and a decision node (for the mixed strategy the agent decides on). The CDT agent interprets this as a causal net, whereas the EDT agent ignores the causal information and treats it as a probability distribution.

2. Exploration: all action probabilities are bounded away from zero in the decision; that is, the decision node is restricted to mixed strategies in which each action gets some minimal probability.

3. Mixed-Strategy Ratifiability: The agents know the state of the decision node. (This can be relaxed to approximate self-knowledge under some additional assumptions.)

4. Mixed-Strategy Implementability: The action node doesn't have any parents other than the decision node.

I justified assumption #2 as an extension of the desire to give EDT a fair trial: EDT is only clearly-defined in cases with epsilon exploration, so I argued that CDT and EDT should be compared with epsilon-exploration. However, if you prefer CDT because EDT isn't well-defined when conditioning on probability zero actions, this isn't much of an argument.

We can now address this by requiring conditionals on probability zero events to be limits of sequences of conditionals in which the event has greater than zero probability. Or (I think equivalently), we think of the probability distribution as being the real part of a hyperreal probability distribution.

Having done this, we can apply the same CDT=EDT result to Bayes nets with hyperreal conditional probability tables. This shows that CDT still equals EDT without restricting to mixed strategies, so long as conditionals on zero-probability actions are defined via limits.

This still leaves the other questionable assumptions behind the CDT=EDT theorem.

#1 (compatible probability & causality): I framed this assumption as the main condition for a fair fight between CDT and EDT: if the causal structure is not compatible with the probability distribution, then you are basically handing different problems to CDT and EDT and then complaining that one gets worse results than the other. However, the case is not so clear as I made it out to be. In cases where CDT/EDT are in specific decision problems which they understand well, the causal structure and probabilistic structure must be compatible. However, boundedly rational agents will have inconsistent beliefs, and it may be that beliefs about causal structure are sometimes inconsistent with other beliefs. An advocate of CDT or EDT might say that the differentiating cases are on exactly such inconsistent examples.

Although I agree that it's important to consider how agents deal with inconsistent beliefs (that's logical uncertainty!), I don't currently think it makes sense to judge them on inconsistent decision problems. So, I'll set aside such problems.

Notice, however, that one might contest whether there's necessarily a reasonable causal structure at all, and deny #1 that way.

#3 (ratifiability): The ratifiability assumption is a kind of equilibrium concept; the agent's mixed strategy has to be in equilibrium with knowledge of that very mixed strategy. I argued that it is as much a part of understanding the situation the agent is in as anything else, and that it is usually approximately achievable (IE, doesn't cause terrible self-reference problems or imply logical omniscience). However, I didn't prove that a ratifiable equilibrium always exists! Non-existence would trivialize the result, making it into an argument from false premises to a false conclusion.

Jessica's COEDT results address this concern, showing that this level of self-knowledge is indeed feasible.

#4 (implementability): I think of this as the shakiest assumption; it is easy to set up decision problems which violate it. However, I tend to think such setups get the causal structure wrong. Other parents of the action should instead be thought of as children of the action. Furthermore, if an agent is learning about the structure of a situation by repeated exposure to that situation, implementability seems necessary for the agent to come to understand the situation it is in: parents of the action will look like children if you try to perform experiments to see what happens when you do different things.

I won't provide any direct arguments for the implementability constraint in the rest of this post, but I'll be discussing other connections between learning and counterfactual reasoning.

Are We Really Eliminating Exploration?

Ways of Taking Counterfactuals are Somewhat Interchangeable

When thinking about decision theory, we tend to focus on putting the agent in a particular well-defined problem. However, realistically, an agent has a large amount of uncertainty about the structure of the situation it is in. So, a big part of getting things right is learning what situation you're in.

Any reasonable way of defining counterfactuals for actions, be it CDT or COEDT or something else, is going to be able to describe essentially any combination of consequences for the different actions. So, for an agent who doesn't know what situation it is in, any system of counterfactuals is possible no matter how counterfactuals are defined. In some sense, this means that getting counterfactuals right will be mainly up to the learning. Choosing between different kinds of counterfactual reasoning is a bit like choosing different priors -- you would hope it gets washed out by learning.

Exploration is Always Necessary for Learning Guarantees

COEDT eliminates the need for exploration in 5-and-10, which intuitively means cases where it should be really, really obvious what to do. It isn't clear to what extent COEDT helps with other issues. I'm skeptical that COEDT alone will allow us to get the right counterfactuals for game-theoretic reasoning. But, it is really clear that COEDT doesn't change the fundamental trade-off between learning guarantees (via exploration) and Bayesian optimality (without exploration).

This is illustrated by the following problem:

Scary Door Problem. According to your prior, there is some chance that doors of a certain red color conceal monsters who will destroy the universe if disturbed. Your prior holds that this is not very strongly correlated to any facts you could observe without opening such a door. So, there is no way to know whether such doors conceal universe-destroying monsters without trying them. If you knew such doors were free of universe-destroying monsters, there are various reasons why you might sometimes want to open them.

The scary door problem illustrates the basic trade-off between asymptotic optimality and subjective optimality. Epsilon exploration would guarantee that you occasionally open scary doors. If such doors conceal monsters, you destroy the universe. However, if you refuse to open scary doors, then it may be that you never learn to perform optimally in the world you're in.

What COEDT does is show that the scary door and 5-and-10 really are different sorts of problem. If there weren't approaches like COEDT which eliminate the need for exploration in 5-and-10, we would be forced to conclude that they're the same: no matter how easy the problem looks, you have to explore in order to learn the right counterfactuals.

So, COEDT shows that not all counterfactual reasoning has to reduce to learning. There are problems you can get right by reasoning alone. You don't always have to explore; you can refuse to open scary doors, while still reliably picking up $10.

I mentioned that choosing between different notions of counterfactual is kind of like choosing between different priors -- you might hope it gets washed out by learning. The scary door problem illustrates why we might not want the learning to be powerful enough to wash out the prior. This means getting the prior right is quite important.

You Still Explore in Logical Time

If you follow the logical time analogy, it seems like you can't ever really construct logical counterfactuals without exploration in some sense: if you reason about a counterfactual, the counterfactual scenario exists somewhere in your logical past, since it is a real mathematical object. Hence, you must take the alternate action sometimes in order to reason about it at all.

So, how does a COEDT agent manage not to explore?

COEDT can be thought of as "learning" from an infinite sequence of agents who explore less and less. None of those agents are COEDT agents, but they get closer and closer. If each of these agents exists at a finite logical time, COEDT exists at an infinite logical time, greater than any of the agents COEDT learns from. So, COEDT doesn't need to explore because COEDT doesn't try to learn from agents maximally similar to itself; it is OK with a systematic difference between itself and the reference class it logically learns from.

This systematic difference may allow us to drive a wedge between the agent and its reference class to demonstrate problematic behavior. I won't try to construct such a case today.

In the COEDT post, Jessica says:

I consider COEDT to be major progress in decision theory. Before COEDT, there were (as far as I know) 3 different ways to solve 5 and 10, all based on counterfactuals:

• Causal counterfactuals (as in CDT), where counterfactuals are worlds where physical magic happens to force the agent's action to be something specific.

• Model-theoretic counterfactuals (as in modal UDT), where counterfactuals are models in which false statements are true, e.g. where PA is inconsistent.

• Probabilistic conditionals (as in reinforcement learning and logical inductor based decision theories such as LIEDT/LICDT and asymptotic decision theory), where counterfactuals are possible worlds assigned a small but nonzero probability by the agent in which the agent takes a different action through "exploration"; note that ADT-style optimism is a type of exploration.

COEDT is a new way to solve 5 and 10. My best intuitive understanding is that, whereas ordinary EDT (using ordinary reflective oracles) seeks any equilibrium between beliefs and policy, COEDT specifically seeks a not-extremely-unstable equilibrium (though not necessarily one that is stable in the sense of dynamical systems), where the equilibrium is "justified" by the fact that there are arbitrarily close almost-equilibria. This is similar to trembling hand perfect equilibrium. To the extent that COEDT has counterfactuals, they are these worlds where the oracle distribution is not actually reflective but is very close to the actual oracle distribution, and in which the agent takes a suboptimal action with very small probability.

Based on my picture, I think COEDT belongs in the modal UDT class. Both proposals can be seen as a special sort of exploration where we explore if we are in a nonstandard model. Modal UDT explores if PA is inconsistent. COEDT explores if a randomly sampled positive real in the unit interval happens to be less than some nonstandard epsilon. :)

(Note that describing them in this way is a little misleading, since it makes them sound uncomputable. Modal UDT in particular is quite computable, if the decision problem has the right form and if we are happy to assume that PA is consistent.)

I'll be curious to see how well this analogy holds up. Will COEDT have fundamentally new behavior in some sense?

More thoughts to follow in Part 2.

[-]jessicata7y50

COEDT can be thought of as "learning" from an infinite sequence of agents who explore less and less.

Interestingly, the issue COEDT has with sequential decision problems looks suspiciously similar to folk theorems in iterated game theory (which also imply that completely-aligned agents can get a very bad outcome because they will each maximally punish anyone who doesn't play the grim trigger strategy). There might be some kind of folk theorem for COEDT, though there's a complication in that, conditioning on yourself taking a probability-0 action, you get both worlds where you are being punished and ones where you are punishing someone else, which might mean counterfactual punishments can't be maximal for everyone at once (yay?).

COEDT ensures that conditionals on each action exist at all, but it doesn't ensure that agents behave even remotely sanely in these conditionals, as it's still conditioning on a very rare event, and the relevant rationality conditions permit agents to behave insanely with very small probability. What would be really nice is to get some set of conditional beliefs under which:

no one takes any strictly dominated actions with nonzero probability (i.e. an action such that all possible worlds where the agent takes this action are worse than all possible worlds where the agent doesn't)
conditional on any subset of the agents taking non-strictly-dominated actions, no agent takes any strictly dominated action with nonzero probability

(I suspect this is easier for common-payoff problems; for non-common-payoff problems, agents might take strictly dominated actions as a form of extortion)

COEDT doesn't get this but perhaps a similar construction (maybe using the hyperreals?) does.

[-]jessicata7y20

This assumption seems sketchy to me. In particular, what if you make 2 copies of a deterministic agent, move them physically far from each other, give them the same information, and ask each to select an action? Clearly, if a rational agent is uncertain about either agent's action, then they will believe the two agents' actions to be (perfectly) correlated. The two actions can't each be children of each other...

[-]abramdemski7y10

I maybe should have clarified that when I say CDT I'm referring to a steel-man CDT which would use some notion of logical causality. I don't think the physical counterfactuals are a live hypothesis in our circles, but several people advocate reasoning which looks like logical causality.

Implementability asserts that you should think of yourself as logico-causally controlling your clone when it is a perfect copy.

AI ALIGNMENT FORUM
AF