Thanks for this!
What is the "hydrogen maximization problem"?
Why do you think that having to be empirically updateless is unfortunate?
Re best of three: if sufficiently precise and 100% free randomization is allowed, I think the optimal policy is ratifiable by CDT. (Though not paying is also ratifiable, and more robustly so.)
Or if your knowledge of the environment does helpful randomization for you (if you're not >99% sure your two copies will take the same action), CDT'll at least press the button. But yeah, interesting problem.
Is the correct policy an equilibrium? Suppose the payoff was $5, not $1000. If you all press with probability P, you get: (1-P)^3 of 0, 3P(1-P)^2 of -1, 3P^2(1-P) of 3, and P^3 of 2. The optimal P is 0.8873, for a payoff of 2.162.
Now suppose you know your two copies are pressing the button with P=0.8873. You press with probability Q. You get (1-P)^2(1-Q) of 0, 2P(1-P)(1-Q) + (1-P)^2Q of -1, 2P(1-P)Q + P^2(1-Q) of 3, and P^2Q of 2. Optimal Q is 0. If you never press the button, you get 2*0.8873*(1-0.8873) of -1 and 0.8873^2 of 3, which is 2.262.
So if you know your copies are playing the optimal policy for three, you shouldn't press the button :D
I think if others play with probability P, every value of Q is equally good.
you get 2*0.8873*(1-0.8873) of -1 and 0.8873^2 of 3, which is 2.262.
Not sure if this is a typo, but I get 2*0.8873*(1-0.8873)(-1) + 0.8873^2*(3) = 2.162
Which is the same as if you play Q=P. Which supports the claim that every value of Q is equally good.
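For anyone who wants to check the arithmetic, here's a quick numerical sketch of the thread above (variable and function names are mine). It finds the optimal symmetric press-probability and then checks that, at that P, your own payoff is flat in Q:

```python
import numpy as np

# Symmetric policy: all three copies press with probability p.
# Outcome payoffs: 0 presses -> 0, 1 press -> -1, 2 presses -> 3, 3 presses -> 2.
def symmetric_value(p):
    return (3 * p * (1 - p) ** 2 * -1
            + 3 * p ** 2 * (1 - p) * 3
            + p ** 3 * 2)

grid = np.linspace(0, 1, 1_000_001)
p_star = grid[np.argmax(symmetric_value(grid))]
print(round(float(p_star), 4), round(float(symmetric_value(p_star)), 3))  # 0.8873 2.162

# Your payoff when the two copies press with p and you press with q.
# It is linear in q, so a zero slope at p_star means every q ties.
def my_value(p, q):
    one_press = 2 * p * (1 - p) * (1 - q) + (1 - p) ** 2 * q
    two_press = 2 * p * (1 - p) * q + p ** 2 * (1 - q)
    three_press = p ** 2 * q
    return -one_press + 3 * two_press + 2 * three_press

slope = my_value(p_star, 1) - my_value(p_star, 0)
print(abs(slope) < 1e-3)  # True: at p_star, all values of Q are equally good
```

So the 2.262 really was a typo: at P = 0.8873 never-pressing, always-pressing, and everything in between all give 2.162.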
I did say "suppose you are deterministic". That said, can you spell out how CDT ratifies the optimal policy if randomization is allowed?
I believe it follows from this proof: https://www.alignmentforum.org/posts/5bd75cc58225bf06703751b2/in-memoryless-cartesian-environments-every-udt-policy-is-a
This post is an attempt to better operationalize FDT (functional decision theory). It answers the following questions:
Defining the logical do-operator
Consider Parfit's hitchhiker:
An FDT agent is supposed to reason as follows:
The bolded phrases are invoking logical counterfactuals. Because I have drawn a "logical causal graph", I will call the operation which generates these counterfactuals a "logical do-operator" by analogy to the do-operator of CDT.
In ordinary CDT, it is impossible to observe a variable that is downstream of your action, because such a variable is in your future. Therefore, the following two definitions of the CDT do-operator are equivalent:
For physical causality these are equivalent because the downstream nodes are in the future, so we can't have observed them and there is nothing to forget.
However, in our logical causal graph, we can observe the node "I am in town" even though it is downstream of our action node ("Does my algorithm pay"). So for FDT these two definitions are not equivalent and we need to pick one.
If we want to pay in Parfit's hitchhiker, we must choose definition 2. This allows us to "forget" our observation that we are already in town.
There's one more interesting choice we could consider for the logical do-operator -- we could have it forget future nodes but not cut incoming connections. This would make it an EDT variant with "logicausal un-updating" rather than a logicausal version of CDT.
We can see all of our options in the following 2x2 table:
Option 3 is just EDT. Options 1 and 3 are missing the "forget downstream nodes" step, so they don't pay in Parfit's hitchhiker without commitment-style updatelessness.
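As a toy illustration of why the two definitions come apart in Parfit's hitchhiker, here's a minimal sketch (the specific utilities, the perfect-predictor assumption, and all names are mine, not from the post). "Forgetting" the downstream node means recomputing it from the intervened policy; "keeping" it means holding the observation fixed:

```python
# Toy Parfit's hitchhiker with a perfect predictor: the node "in_town"
# is (logically) downstream of the policy output "pays".
def utility(pays, in_town):
    if not in_town:
        return -1_000_000                    # left in the desert
    return 10_000 - (1_000 if pays else 0)   # rescued, minus payment if you pay

def predictor(pays):
    return pays  # perfect predictor: picks you up iff your algorithm would pay

# Definition 2 ("forget downstream nodes"): recompute in_town from the
# intervened policy, ignoring the observation that we're already in town.
def value_forgetting(pays):
    return utility(pays, predictor(pays))

# Definition 1 ("keep the observation"): hold in_town fixed at True,
# since we already woke up in town.
def value_keeping(pays):
    return utility(pays, in_town=True)

print(max([True, False], key=value_forgetting))  # True: pays
print(max([True, False], key=value_keeping))     # False: refuses to pay
```

The "keeping" agent reasons that it's in town either way, so paying just burns $1000; only the "forgetting" agent pays.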
If we want FDT to "automatically" pay in Parfit's hitchhiker, we must choose between options 2 and 4. I personally think it's unclear which of these to prefer. The main disagreements are:
Where does this leave us? So far we have deduced one property of the logical do-operator -- in order to pay in Parfit's hitchhiker without commitments, it must forget the values of downstream nodes. But we have not yet answered the following questions:
The next section will try to answer these questions.
Logical causality
There is an intuition that some logical facts cause others, and that unlike "correlation" or "relatedness" this relationship is both directional and intuitively "causal". I think this intuition is pointing to something real and can be defined using conditional independence (the same way that ordinary causal graphs are defined).[1]
First, let me list a few examples to ground the discussion:
Causal graphs are a particular way of encoding a joint distribution over a set of variables. Even if we don't care about "causal intervention", causal graphs are useful because they tell us when variables will be conditionally independent of each other.
My guess is that these conditional dependence rules are deeply related to why causality exists as a concept, and are therefore a sufficient grounding for logical causality.[2]
This naturally leads to a few proposals for how to define logical causal graphs, which I'll go through one by one.
Causality as derived from a world model
Bounded agents need to choose actions despite logical uncertainty. Therefore it is reasonable to demand that our agent includes a logical world model which can produce joint distributions over logical facts. You might hope that we could stop here: Once we have a joint distribution over some variables, we can check all the conditional dependencies and approximately pin down the causal graph.
However, there is a problem: To infer causality from joint distributions, we need to not already know the values of those variables. So to properly define logical causality, we will also need a way to "forget" the values of logical facts, or to rewind to a time before we knew them. This is an additional piece of structure which a reasonable world model might not already support.
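As a toy illustration of the "pin down the graph from conditional dependencies" step, here's a sketch using sampled empirical data as a stand-in for a logical world model's joint distribution (the Gaussian chain and all names are my own illustrative choices). In a chain X → Y → Z, X and Z are dependent but become independent once Y is conditioned on:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Chain X -> Y -> Z: X and Z are correlated, but Y screens them off.
x = rng.normal(size=n)
y = x + rng.normal(size=n)
z = y + rng.normal(size=n)

def partial_corr(a, b, given):
    # Correlation of residuals after linearly regressing out `given`.
    ra = a - np.polyval(np.polyfit(given, a, 1), given)
    rb = b - np.polyval(np.polyfit(given, b, 1), given)
    return np.corrcoef(ra, rb)[0, 1]

print(np.corrcoef(x, z)[0, 1] > 0.3)      # True: X and Z are dependent
print(abs(partial_corr(x, z, y)) < 0.02)  # True: X and Z independent given Y
```

The point of the analogy: these checks only work because we're treating the samples as unknown-in-advance. If we already knew every value, there would be no joint distribution left to interrogate, which is exactly the "forgetting" problem above.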
I don't have a specific proposal for how to do this, but there might be janky ways to do it in practice once you specify the epistemic part of your agent. You might literally rewind to a past state or have some way of erasing logical arguments.
Logical inductors
An obvious extension of the previous idea is to use a logical inductor as the logical world model, rather than leaving it unspecified. In this case, we might use the logical inductor's timestep for our notion of "rewinding to an earlier state where a value was unknown".
Algorithmic mutual information of heuristic arguments
Here's a proposal very different from the previous two:
The great advantage of this proposal is that it lets us easily define the logicausal graph even when we already know all the facts involved, without needing to construct an epistemic state with artificial ignorance. However, I'd want to check that the graphs it gives us match my intuitive notion of logical causality.
How does FDT interact with anthropic updating?
This section might only make sense to readers already familiar with EDT double-counting.
An unfortunate property of EDT (evidential decision theory) is that it is not compatible with the SIA anthropic update. EDT with SIA "double counts" the update, leading to 4:1 betting odds in Sleeping Beauty. EDT double-counting can be resolved by forgoing the anthropic update (with a variant of minimum-reference-class SSA called "L-zombie anthropics"). However, this fix leads to other strange consequences and is IMO philosophically suspicious.
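The 4:1 figure falls out of a short expected-value calculation: SIA puts 2/3 credence on tails at an awakening, and EDT then credits a single decision with both tails-awakenings, compounding the factor of two. Here's a sketch of that arithmetic (bet conventions and function names are mine):

```python
from fractions import Fraction

# Per-awakening bet in Sleeping Beauty: win `w` if tails, lose `l` if heads.
# On tails there are two awakenings, and both copies decide alike.
def edt_sia_ev(w, l):
    p_tails = Fraction(2, 3)                       # SIA credence at an awakening
    return p_tails * 2 * w - (1 - p_tails) * l     # EDT counts the tails bet twice

def ex_ante_ev(w, l):
    # Policy value per coin flip: tails (prob 1/2) pays the bet out twice.
    return Fraction(1, 2) * 2 * w - Fraction(1, 2) * l

print(edt_sia_ev(1, 4))  # 0: EDT+SIA is indifferent at 4:1 odds
print(ex_ante_ev(1, 2))  # 0: the ex-ante policy view is indifferent at 2:1
```

In between (e.g. w=1, l=3), EDT+SIA happily takes bets that are losing from the ex-ante policy perspective; that gap is the double-count.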
The reason for double-counting is that EDT lets a decision "take credit" for the actions of many agents. FDT also allows this, so we should expect the same problem. Some basic analysis suggests that we do in fact have the same problem by default.
It's remotely possible that the FDT approach will have some elegant solution to this problem but I currently don't see it. So for now I will assume we use the same patch required for EDT:
An unfortunate property of this patch is that it forces us to draw a distinction between "internally" evaluating logical arguments and using an external calculator. This boundary is arbitrary and ugly. I don't know if a better way exists.
Putting it together: An attempt at operationalizing FDT
Here's my best guess formulation of FDT:
A proper formalization would eliminate step 1a and fully specify the construction without appealing to intuition. However, what I've written here is closer to how I would actually analyze a decision problem.
Appendix: Why bother with logical causality?
Why should we bother constructing a decision theory along these lines? Is there any reason to use logical causality instead of highly-updateless EDT?
My view is that decision theories should formalize our true reasons for believing (upon reflection) that a particular decision is correct. We should demand that our decision theory gives the right answer "for the right reason".
I am personally very uncertain whether my intuitions are more evidential or logicausal. In this section, I'll discuss some decision problems which I think are particularly relevant for choosing between decision theories.
Logical XOR blackmail
Suppose you are in some sense "born into" logical XOR blackmail. Very early in your life, before you encountered the idea of updatelessness, you found a proof that "you will die soon XOR you will light $10 on fire". You find, however, that lighting $10 on fire is not logicausally upstream of dying. Do you light the $10 on fire?
I think there are a few resolutions to this:
Parfit's hitchhiker
Parfit's hitchhiker was already analyzed in the first section of this post. You might either:
Logical counterfactual mugging
The version of FDT I described in this post does not pay unless commitment-style logical updatelessness is added on separately.
This does not distinguish it from EDT or CDT. However, it might reduce the appeal of my FDT construction relative to a conceivable world where it fully replaces commitment-style logical updatelessness.
Smoking lesion
I think smoking lesion is extremely confusing and I won't attempt to sort it out here. I don't have a clear position on any of the following questions:
Smoking lesion is obviously an important case for anyone considering EDT as an option.
Various unappealing cases for CDT-with-physical-causality
It is IMO very unfortunate that both FDT and EDT are incompatible with empirical updates. This is the source of a lot of trouble. So why not adopt CDT, which is generally compatible with SIA updates?
Here are the cases which make me skeptical of physically causal CDT:
Obviously these arguments are correlated: If you aren't bothered by one, you're less likely to be bothered by the others. I'm sure there are physical-CDT proponents who are willing to bite or dispute all these bullets.
Acknowledgements
Thanks to Thomas Kwa, Jeremy Gillen, and Lukas Finnveden for recent discussion, to Nate Soares for related discussions long ago, and to Chi Nguyen for feedback.
You might object that causal graphs are not uniquely pinned down by conditional independence observations. From a CDT-ish philosophical perspective, these statistically equivalent causal graphs are not the same because they make distinct claims about CDT counterfactuals.
However, I think that conditional independence rules are the only actual content of causality that I care about and will be sufficient to make logical causality a usable concept. Paul articulates a similar view in this post.
The biggest obstacle to this claim is that conditional dependence relations don't fully specify causal graphs -- there can be multiple graphs corresponding to the same joint distribution but which behave differently under causal intervention.
However, actual causal graphs are very rich. So if X in fact does not cause Y, we can probably find a conditional dependence relation which proves that (see this post).
Even more loosely: two facts A and B are dependent if having a proof of A helps you write a proof of B.
Because our logical do-operator is capable of "undoing" logical facts that are downstream of our actions, FDT doesn't need logical updatelessness to pay in Parfit's hitchhiker. I think it still needs it to pay in logical counterfactual mugging.
Or don't cut, if you preferred "option 4" in this section.
On the other hand, we might refuse to reject the hypothetical even if it turns out to be logically impossible. On this view, our judgements about what is "correct" shouldn't rely on the fact that a certain conceivable scenario turns out to be impossible.
Caveat: You can patch future cases with a son-of-CDT commitment. IMO this is not the "true" reason you should cooperate, and it is not sensible to defect if you failed to previously "make a commitment" of some sort.
(same caveat as before)
This is a strictly weaker objection than twin PD. Obviously it only applies if you find ECL intuitively compelling (rather than a neutral or unintuitive consequence of EDT). I personally find ECL compelling in context but can still imagine rejecting it upon reflection.
Compared to twin PD, ECL is more practically relevant for humans and can't be fixed using commitments (since you're "born into it").
To be fair, FDT also considers logically-impossible counterfactuals. However, the way it does so in specific cases seems more reasonable to me, while this case seems totally unreasonable in my subjective opinion.