Predictors exist: CDT going bonkers... forever

Stuart_Armstrong

14th Jan 2020

I've been wanting to get a better example of CDT (causal decision theory) misbehaving, where the behaviour is more clearly suboptimal than it is in the Newcomb problem (which many people don't seem to accept as CDT being suboptimal), and simpler to grasp than Death in Damascus.

The "predictors exist" problem

So consider this simple example: the player is playing against Omega, who will predict their actions^[1]. The player can take three actions: "zero", "one", or "leave".

If they ever choose "leave", the experiment is over and they leave. If they choose "zero" or "one", then Omega will predict their action, and compare this to their actual action. If the two match, then the player loses $1$ utility and the game repeats; if the action and the prediction differ, then the player gains $3$ utility and the experiment ends.

Assume that Omega is actually a perfect or quasi-perfect predictor, with a good model of the player. After a few tries, an FDT or EDT agent would realise that they couldn't trick Omega, and would quickly end the game.
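As a rough worked version of that reasoning, assume (this is not stated in the setup) that leaving is worth $0$, and that the agent summarises Omega by a single accuracy $q$, the probability that the prediction matches its action. Conditioning on playing then gives:

$$\mathbb{E}[U \mid \text{play}] = 3(1 - q) + (-1)\,q = 3 - 4q, \qquad 3 - 4q < 0 \iff q > \tfrac{3}{4}.$$

Under those assumptions, an agent that conditions on its own action stops playing once its estimate of $q$ passes $3/4$, which a perfect predictor forces after only a few lost rounds.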

But the CDT player would be incapable of reaching this reasoning. Whatever distribution they compute over Omega's prediction, they will always estimate that they (the CDT player) have at least a $50\%$ chance of choosing the other option^[2], for an expected utility gain of at least $0.5(3) + 0.5(-1) = 1$.
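The bound can be checked numerically. The following is a minimal sketch, not a full CDT implementation: `cdt_best_response` is a hypothetical helper, `p_zero` stands for the player's credence that Omega predicted "zero", and the utility of `0` for leaving is an assumption, since the post does not give one.

```python
from typing import Tuple

WIN, LOSE = 3.0, -1.0   # payoffs from the game description
LEAVE = 0.0             # assumption: leaving is worth 0 (not stated in the post)


def cdt_best_response(p_zero: float) -> Tuple[str, float]:
    """Return the action CDT picks and its subjective expected utility.

    p_zero is the credence that Omega predicted "zero". CDT holds this
    credence fixed while comparing actions.
    """
    options = {
        # Saying "one" wins exactly when the prediction was "zero".
        "one": p_zero * WIN + (1 - p_zero) * LOSE,
        # Saying "zero" wins exactly when the prediction was "one".
        "zero": (1 - p_zero) * WIN + p_zero * LOSE,
        "leave": LEAVE,
    }
    # max() keeps the first maximal key, so ties at p_zero = 0.5 go to "one".
    best = max(options, key=options.get)
    return best, options[best]


for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    action, eu = cdt_best_response(p)
    print(f"P(prediction = zero) = {p:.2f}: choose {action!r}, subjective EU = {eu:+.2f}")
```

For every credence the best option is to play, with subjective expected utility $4 \max(p, 1 - p) - 1 \geq 1$, so "leave" is never chosen.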

Basically, the CDT agent can never learn that Omega is a good predictor of themselves^[3]. And so they will continue playing, and continue losing... for ever.
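A small simulation makes the contrast concrete. This is a sketch under stated assumptions: Omega is modelled as a perfect predictor that simply runs the player's deterministic policy, leaving is worth 0, and the game is capped at `MAX_ROUNDS` purely so the loop terminates. The names `cdt_policy`, `track_record_policy` and `play` are hypothetical, and `track_record_policy` is only a crude stand-in for the EDT/FDT reasoning.

```python
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]   # one (action, prediction) pair per round
WIN, LOSE = 3, -1                 # payoffs from the game description
MAX_ROUNDS = 20                   # illustrative cap; the real game has none


def cdt_policy(history: History) -> str:
    """Estimate P(prediction = "zero") from past predictions, pick the other option."""
    zeros = sum(1 for _, pred in history if pred == "zero")
    p_zero = (zeros + 1) / (len(history) + 2)   # Laplace-smoothed frequency
    # Subjective EU of playing is always >= 1 > 0, so this never returns "leave".
    return "one" if p_zero >= 0.5 else "zero"


def track_record_policy(history: History) -> str:
    """Condition on how often Omega matched this agent's own actions."""
    matches = sum(1 for act, pred in history if act == pred)
    misses = len(history) - matches
    # Sign of 3*(1 - q) - q with q = (matches + 1) / (n + 2), kept in integers.
    eu_numerator = WIN * (misses + 1) + LOSE * (matches + 1)
    return "leave" if eu_numerator <= 0 else "zero"


def play(policy: Callable[[History], str], max_rounds: int = MAX_ROUNDS) -> Tuple[int, int]:
    """Return (total utility, rounds played) against a perfect predictor."""
    history: History = []
    total = 0
    for _ in range(max_rounds):
        action = policy(history)
        if action == "leave":
            break
        prediction = policy(history)   # perfect predictor: same policy, same input
        history.append((action, prediction))
        if action == prediction:
            total += LOSE              # match: lose 1 and the game repeats
        else:
            total += WIN               # mismatch: gain 3 and the game ends
            break
    return total, len(history)


print("CDT sketch:         ", play(cdt_policy))
print("Track-record sketch:", play(track_record_policy))
```

In this sketch the CDT policy loses one unit of utility in every round up to the cap, while the track-record policy leaves after two losses.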

[1] Omega will make this prediction not necessarily before the player takes their action, not even necessarily without seeing this action, but still makes the prediction independently of this knowledge. And that's enough for CDT.

[2] For example, suppose the CDT agent estimates the prediction will be "zero" with probability $p$, and "one" with probability $1 - p$. Then if $p \geq 1/2$, they can say "one", and have a probability $p \geq 1/2$ of winning, in their own view. If $p < 1/2$, they can say "zero", and have a subjective probability $1 - p > 1/2$ of winning.

[3] The CDT agent has no problem believing that Omega is a perfect predictor of other agents, however.

Dagon (6y)

[note: this is bugging me more than it should. I really don't get why this is worth so much repetition of examples that don't show anything new.]

I'll admit I'm one of those who doesn't see CDT as hopeless. It takes a LOT of hypothetical setup to show cases where it fails, and neither Newcomb nor this seems to be as much about decision theory as about free will.

Part of this is my failing. I keep thinking CDT is "classical decision theory", and it means "make the best conditional predictions you can, and then maximize your expected value". This is very robust, but describes all serious decision theories. The actual discussion is about "causal decision theory", and there are plenty of failure cases, where the agent has a flawed model of causality.

But for some reason, we can't just say "incorrect causal models make bad predictions" and move on. We keep bringing up really contrived cases where a naive agent, which we label CDT, makes bad conditional predictions, and it's not clear why they're so stupid as to not notice. I don't know ANYONE who claims an agent should make and act on incorrect predictions.

For your Newcomb-like example (and really, any Omega causality violation), I assert that a CDT agent could notice outcomes and apply Bayes' theorem to the chance that they can trick Omega just as well as any other DT. Assuming that Omega is cheating and changing the result after my choice is sufficient to get the right answer.

Cases of mind-reading and the like are similarly susceptible to better causality models - recognizing that the causality is due to the agent's intent, not their actions, makes CDT recognize that to the extent it can control the intent, it should.

Your summary includes "the CDT agent can never learn this", and that seems the crux. To me, not learning something means that _EITHER_ the CDT agent is a strawman that we shouldn't spend so much time on, _OR_ this is something that cannot be true, and it's probably good if agents can't learn it. If you tell me that a Euclidean agent knows pi and can accurately make wagers on the circumference of a circle knowing only its diameter, but it's flawed because a magic being puts it on a curved surface and it never re-considers that belief, I'm going to shrug and say "okay... but here in flatland that doesn't happen". It doesn't matter how many thought experiments you come up with to show counterfactual cases where C/D is different for a circle, you're completely talking past my objection that Euclidean decision theory is simple and workable for actual use.

To summarize my confusion, does CDT require that the agent unconditionally believe in perfect free will independent of history (and, ironically, with no causality for the exercise of will)? If so, that should be the main topic of dispute - the frequency of actual cases where it makes bad predictions, not that it makes bad decisions in ludicrously-unlikely-and-perhaps-impossible situations.

Daniel Kokotajlo (6y)

> To summarize my confusion, does CDT require that the agent unconditionally believe in perfect free will independent of history (and, ironically, with no causality for the exercise of will)? If so, that should be the main topic of dispute - the frequency of actual cases where it makes bad predictions, not that it makes bad decisions in ludicrously-unlikely-and-perhaps-impossible situations.

Sorta, yes. CDT requires that you choose actions not by thinking "conditional on my doing A, what happens?" but rather by some other method (there are different variants) such as "For each causal graph that I think could represent the world, what happens when I intervene (in Pearl's sense) on the node that is my action, to set it to A?" or "Holding fixed the probability of all variables not causally downstream of my action, what happens if I do A?"

In the first version, notice that you are choosing actions by imagining a Pearl-style intervention into the world--but this is not something that actually happens; the world doesn't actually contain such interventions.

In the second version, well, notice that you are choosing actions by imagining possible scenarios that aren't actually possible--or at least, you are assigning the wrong probabilities to them. ("holding fixed the probability of all variables not causally downstream of my action...")

So one way to interpret CDT is that it believes in crazy stuff like hardcore incompatibilist free will. But the more charitable way to interpret it is that it doesn't believe in that stuff, it just acts as if it does, because it thinks that's the rational way to act. (And they have plenty of arguments for why CDT is the rational way to act, e.g. the intuition pump "If the box is already either full or empty and you can't change that no matter what you do, then no matter what you do you'll get more money by two-boxing, so...")

Stuart_Armstrong (6y)

Daniel Kokotajlo (6y)

Well said.

I had a similar idea a while ago and am working it up into a paper ("CDT Agents are Exploitable"). Caspar Oesterheld and Vince Conitzer are also doing something like this. And then there is Ahmed's Betting on the Past case.

In their version, the Predictor offers bets to the agent, at least one of which the agent will accept (for the reasons you outline) and thus they get money-pumped. In my version, there is no Predictor, but instead there are several very similar CDT agents, and a clever human bookie can extract money from them by exploiting their inability to coordinate.

Long story short, I would bet that an actual AGI which was otherwise smarter than me but which doggedly persisted in doing its best to approximate CDT would fail spectacularly one way or another, "hacked" by some clever bookie somewhere (possibly in its hypothesis space only!). Unfortunately, arguably the same is true for all decision theories I've seen so far, but for different reasons...

Caspar Oesterheld (6y)

> Caspar Oesterheld and Vince Conitzer are also doing something like this

That paper can be found at https://users.cs.duke.edu/~ocaspar/CDTMoneyPump.pdf . And yes, it is structurally essentially the same as the problem in the post.

Stuart_Armstrong (6y)

Cool!

I notice that you assumed there were no independent randomising devices available. But why would the CDT agent ever opt to use a randomising device? Why would it see that as having value?

Caspar Oesterheld (5y)

Apologies, I only saw your comment just now! Yes, I agree, CDT never strictly prefers randomizing. So there are agents who abide by CDT and never randomize. As our scenarios show, these agents are exploitable. However, there could also be CDT agents who, when indifferent between some set of actions (and when randomization is not associated with any cost), do randomize (and choose the probability according to some additional theory -- for example, you could have the decision procedure: "follow CDT, but when indifferent between multiple actions, choose a distribution over these actions that is ratifiable"). The updated version of our paper -- which has now been published Open Access in The Philosophical Quarterly -- actually contains some extra discussion of this in Section IV.1, starting with the paragraph "Nonetheless, what happens if we grant the buyer in Adversarial Offer access to a randomisation device...".
