Thanks for this!
What is the "hydrogen maximization problem"?
Why do you think that having to be empirically updateless is unfortunate?
Re best of three: if sufficiently precise and 100% free randomization is allowed, I think the optimal policy is ratifiable by CDT. (Though not paying is also ratifiable, and more robustly so.)
Or if your knowledge of the environment does helpful randomization for you (if you're not >99% sure your two copies will take the same action), CDT'll at least press the button. But yeah, interesting problem.
Is the correct policy an equilibrium? Suppose the payoff was $5, not $1000. If you all press with probability P, you get: (1-P)^3 of 0, 3P(1-P)^2 of -1, 3P^2(1-P) of 3, and P^3 of 2. The optimal P is 0.8873, for a payoff of 2.162.
Now suppose you know your two copies are pressing the button with P=0.8873. You press with probability Q. You get (1-P)^2(1-Q) of 0, 2P(1-P)(1-Q) + (1-P)^2Q of -1, 2P(1-P)Q + P^2(1-Q) of 3, and P^2Q of 2. Optimal Q is 0. If you never press the button, you get 2*0.8873*(1-0.8873) of -1 and 0.8873^2 of 3, which is 2.262.
So if you know your copies are playing the optimal policy for three, you shouldn't press the button :D
I think if others play with probability P, every value of Q is equally good.
you get 2*0.8873*(1-0.8873) of -1 and 0.8873^2 of 3, which is 2.262.
Not sure if this is a typo, but I get 2*0.8873*(1-0.8873)(-1) + 0.8873^2*(3) = 2.162
Which is the same as if you play Q=P. Which supports the claim that every value of Q is equally good.
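For anyone who wants to check the arithmetic, here's a quick numerical sketch of the thread above (variable and function names are mine). It finds the optimal symmetric press-probability and then checks that, at that P, your own payoff is flat in Q:

```python
import numpy as np

# Symmetric policy: all three copies press with probability p.
# Outcome payoffs: 0 presses -> 0, 1 press -> -1, 2 presses -> 3, 3 presses -> 2.
def symmetric_value(p):
    return (3 * p * (1 - p) ** 2 * -1
            + 3 * p ** 2 * (1 - p) * 3
            + p ** 3 * 2)

grid = np.linspace(0, 1, 1_000_001)
p_star = grid[np.argmax(symmetric_value(grid))]
print(round(float(p_star), 4), round(float(symmetric_value(p_star)), 3))  # 0.8873 2.162

# Your payoff when the two copies press with p and you press with q.
# It is linear in q, so a zero slope at p_star means every q ties.
def my_value(p, q):
    one_press = 2 * p * (1 - p) * (1 - q) + (1 - p) ** 2 * q
    two_press = 2 * p * (1 - p) * q + p ** 2 * (1 - q)
    three_press = p ** 2 * q
    return -one_press + 3 * two_press + 2 * three_press

slope = my_value(p_star, 1) - my_value(p_star, 0)
print(abs(slope) < 1e-3)  # True: at p_star, all values of Q are equally good
```

So the 2.262 really was a typo: at P = 0.8873 never-pressing, always-pressing, and everything in between all give 2.162.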
I did say "suppose you are deterministic". That said, can you spell out how CDT ratifies the optimal policy if randomization is allowed?
I believe it follows from this proof: https://www.alignmentforum.org/posts/5bd75cc58225bf06703751b2/in-memoryless-cartesian-environments-every-udt-policy-is-a
This post is an attempt to better operationalize FDT (functional decision theory). It answers the following questions:
Defining the logical do-operator
Consider Parfit's hitchhiker:
An FDT agent is supposed to reason as follows:
The bolded phrases are invoking logical counterfactuals. Because I have drawn a "logical causal graph", I will call the operation which generates these counterfactuals a "logical do-operator" by analogy to the do-operator of CDT.
In ordinary CDT, it is impossible to observe a variable that is downstream of your action, because such a variable is in your future. Therefore, the following two definitions of the CDT do-operator are equivalent:
For physical causality these are equivalent because the downstream nodes are in the future, so we can't have observed them and there is nothing to forget.
However, in our logical causal graph, we can observe the node "I am in town" even though it is downstream of our action node ("Does my algorithm pay"). So for FDT these two definitions are not equivalent and we need to pick one.
If we want to pay in Parfit's hitchhiker, we must choose definition 2. This allows us to "forget" our observation that we are already in town.
There's one more interesting choice we could consider for the logical do-operator -- we could have it forget future nodes but not cut incoming connections. This would make it an EDT variant with "logicausal un-updating" rather than a logicausal version of CDT.
We can see all of our options in the following 2x2 table:
Option 3 is just EDT. Options 1 and 3 are missing the "forget downstream nodes" step, so they don't pay in Parfit's hitchhiker without commitment-style updatelessness.
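As a toy illustration of why the two definitions come apart in Parfit's hitchhiker, here's a minimal sketch (the specific utilities, the perfect-predictor assumption, and all names are mine, not from the post). "Forgetting" the downstream node means recomputing it from the intervened policy; "keeping" it means holding the observation fixed:

```python
# Toy Parfit's hitchhiker with a perfect predictor: the node "in_town"
# is (logically) downstream of the policy output "pays".
def utility(pays, in_town):
    if not in_town:
        return -1_000_000                    # left in the desert
    return 10_000 - (1_000 if pays else 0)   # rescued, minus payment if you pay

def predictor(pays):
    return pays  # perfect predictor: picks you up iff your algorithm would pay

# Definition 2 ("forget downstream nodes"): recompute in_town from the
# intervened policy, ignoring the observation that we're already in town.
def value_forgetting(pays):
    return utility(pays, predictor(pays))

# Definition 1 ("keep the observation"): hold in_town fixed at True,
# since we already woke up in town.
def value_keeping(pays):
    return utility(pays, in_town=True)

print(max([True, False], key=value_forgetting))  # True: pays
print(max([True, False], key=value_keeping))     # False: refuses to pay
```

The "keeping" agent reasons that it's in town either way, so paying just burns $1000; only the "forgetting" agent pays.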
If we want FDT to "automatically" pay in Parfit's hitchhiker, we must choose between options 2 and 4. I personally think it's unclear which of these to prefer. The main disagreements are:
Where does this leave us? So far we have deduced one property of the logical do-operator -- in order to pay in Parfit's hitchhiker without commitments, it must forget the values of downstream nodes. But we have not yet answered the following questions:
The next section will try to answer these questions.
Logical causality
There is an intuition that some logical facts cause others, and that unlike "correlation" or "relatedness" this relationship is both directional and intuitively "causal". I think this intuition is pointing to something real and can be defined using conditional independence (the same way that ordinary causal graphs are defined).[1]
First, let me list a few examples to ground the discussion:
Causal graphs are a particular way of encoding a joint distribution over a set of variables. Even if we don't care about "causal intervention", causal graphs are useful because they tell us when variables will be conditionally independent of each other.
My guess is that these conditional dependence rules are deeply related to why causality exists as a concept, and are therefore a sufficient grounding for logical causality.[2]
This naturally leads to a few proposals for how to define logical causal graphs, which I'll go through one by one.
Causality as derived from a world model
Bounded agents need to choose actions despite logical uncertainty. Therefore it is reasonable to demand that our agent includes a logical world model which can produce joint distributions over logical facts. You might hope that we could stop here: Once we have a joint distribution over some variables, we can check all the conditional dependencies and approximately pin down the causal graph.
However, there is a problem: To infer causality from joint distributions, we need to not already know the values of those variables. So to properly define logical causality, we will also need a way to "forget" the values of logical facts, or to rewind to a time before we knew them. This is an additional piece of structure which a reasonable world model might not already support.
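As a toy illustration of the "pin down the graph from conditional dependencies" step, here's a sketch using sampled empirical data as a stand-in for a logical world model's joint distribution (the Gaussian chain and all names are my own illustrative choices). In a chain X → Y → Z, X and Z are dependent but become independent once Y is conditioned on:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Chain X -> Y -> Z: X and Z are correlated, but Y screens them off.
x = rng.normal(size=n)
y = x + rng.normal(size=n)
z = y + rng.normal(size=n)

def partial_corr(a, b, given):
    # Correlation of residuals after linearly regressing out `given`.
    ra = a - np.polyval(np.polyfit(given, a, 1), given)
    rb = b - np.polyval(np.polyfit(given, b, 1), given)
    return np.corrcoef(ra, rb)[0, 1]

print(np.corrcoef(x, z)[0, 1] > 0.3)      # True: X and Z are dependent
print(abs(partial_corr(x, z, y)) < 0.02)  # True: X and Z independent given Y
```

The point of the analogy: these checks only work because we're treating the samples as unknown-in-advance. If we already knew every value, there would be no joint distribution left to interrogate, which is exactly the "forgetting" problem above.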
I don't have a specific proposal for how to do this, but there might be janky ways to do it in practice once you specify the epistemic part of your agent. You might literally rewind to a past state or have some way of erasing logical arguments.
Logical inductors
An obvious extension of the previous idea is to use a logical inductor as the logical world model, rather than leaving it unspecified. In this case, we might use the logical inductor's timestep for our notion of "rewinding to an earlier state where a value was unknown".
Algorithmic mutual information of heuristic arguments
Here's a proposal very different from the previous two:
The great advantage of this proposal is that it lets us easily define the logicausal graph even when we already know all the facts involved, without needing to construct an epistemic state with artificial ignorance. However, I'd want to check that the graphs it gives us match my intuitive notion of logical causality.
How does FDT interact with anthropic updating?
This section might only make sense to readers already familiar with EDT double-counting.
An unfortunate property of EDT (evidential decision theory) is that it is not compatible with the SIA anthropic update. EDT with SIA "double counts" the update, leading to 4:1 betting odds in Sleeping Beauty. EDT double-counting can be resolved by forgoing the anthropic update (with a variant of minimum-reference-class SSA called "L-zombie anthropics"). However, this fix leads to other strange consequences and is IMO philosophically suspicious.
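The 4:1 figure falls out of a short expected-value calculation: SIA puts 2/3 credence on tails at an awakening, and EDT then credits a single decision with both tails-awakenings, compounding the factor of two. Here's a sketch of that arithmetic (bet conventions and function names are mine):

```python
from fractions import Fraction

# Per-awakening bet in Sleeping Beauty: win `w` if tails, lose `l` if heads.
# On tails there are two awakenings, and both copies decide alike.
def edt_sia_ev(w, l):
    p_tails = Fraction(2, 3)                       # SIA credence at an awakening
    return p_tails * 2 * w - (1 - p_tails) * l     # EDT counts the tails bet twice

def ex_ante_ev(w, l):
    # Policy value per coin flip: tails (prob 1/2) pays the bet out twice.
    return Fraction(1, 2) * 2 * w - Fraction(1, 2) * l

print(edt_sia_ev(1, 4))  # 0: EDT+SIA is indifferent at 4:1 odds
print(ex_ante_ev(1, 2))  # 0: the ex-ante policy view is indifferent at 2:1
```

In between (e.g. w=1, l=3), EDT+SIA happily takes bets that are losing from the ex-ante policy perspective; that gap is the double-count.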
The reason for double-counting is that EDT lets a decision "take credit" for the actions of many agents. FDT also allows this, so we should expect the same problem. Some basic analysis suggests that we do in fact have the same problem by default.
It's remotely possible that the FDT approach will have some elegant solution to this problem but I currently don't see it. So for now I will assume we use the same patch required for EDT:
An unfortunate property of this patch is that it forces us to draw a distinction between "internally" evaluating logical arguments and using an external calculator. This boundary is arbitrary and ugly. I don't know if a better way exists.
Putting it together: An attempt at operationalizing FDT
Here's my best guess formulation of FDT:
A proper formalization would eliminate step 1a and fully specify the construction without appealing to intuition. However, what I've written here is closer to how I would actually analyze a decision problem.
Appendix: Why bother with logical causality?
Why should we bother constructing a decision theory along these lines? Is there any reason to use logical causality instead of highly-updateless EDT?
My view is that decision theories should formalize our true reasons for believing (upon reflection) that a particular decision is correct. We should demand that our decision theory gives the right answer "for the right reason".
I am personally very uncertain whether my intuitions are more evidential or logicausal. In this section, I'll discuss some decision problems which I think are particularly relevant for choosing between decision theories.
Logical XOR blackmail
Suppose you are in some sense "born into" logical XOR blackmail. Very early in your life, before you encountered the idea of updatelessness, you found a proof that "you will die soon XOR you will light $10 on fire". You find, however, that lighting $10 on fire is not logicausally upstream of dying. Do you light the $10 on fire?
I think there are a few resolutions to this:
Parfit's hitchhiker
Parfit's hitchhiker was already analyzed in the first section of this post. You might either:
Logical counterfactual mugging
The version of FDT I described in this post does not pay unless commitment-style logical updatelessness is added on separately.
This does not distinguish it from EDT or CDT. However, it might reduce the appeal of my FDT construction relative to a conceivable world where it fully replaces commitment-style logical updatelessness.
Smoking lesion
I think smoking lesion is extremely confusing and I won't attempt to sort it out here. I don't have a clear position on any of the following questions:
Smoking lesion is obviously an important case for anyone considering EDT as an option.
Various unappealing cases for CDT-with-physical-causality
It is IMO very unfortunate that both FDT and EDT are incompatible with empirical updates. This is the source of a lot of trouble. So why not adopt CDT, which is generally compatible with SIA updates?
Here are the cases which make me skeptical of physically causal CDT:
Obviously these arguments are correlated: If you aren't bothered by one, you're less likely to be bothered by the others. I'm sure there are physical-CDT proponents who are willing to bite or dispute all these bullets.
Acknowledgements
Thanks to Thomas Kwa, Jeremy Gillen, and Lukas Finnveden for recent discussion, to Nate Soares for related discussions long ago, and to Chi Nguyen for feedback.
You might object that causal graphs are not uniquely pinned down by conditional independence observations. From a CDT-ish philosophical perspective, these statistically equivalent causal graphs are not the same because they make distinct claims about CDT counterfactuals.
However, I think that conditional independence rules are the only actual content of causality that I care about and will be sufficient to make logical causality a usable concept. Paul articulates a similar view in this post.
The biggest obstacle to this claim is that conditional dependence relations don't fully specify causal graphs -- there can be multiple graphs corresponding to the same joint distribution but which behave differently under causal intervention.
However, actual causal graphs are very rich. So if X in fact does not cause Y, we can probably find a conditional dependence relation which proves that (see this post).
Even more loosely: two facts A and B are dependent if having a proof of A helps you write a proof of B.
Because our logical do-operator is capable of "undoing" logical facts that are downstream of our actions, FDT doesn't need logical updatelessness to pay in Parfit's hitchhiker. I think it still needs it to pay in logical counterfactual mugging.
Or don't cut, if you preferred "option 4" in this section.
On the other hand, we might refuse to reject the hypothetical even if it turns out to be logically impossible. On this view, our judgements about what is "correct" shouldn't rely on the fact that a certain conceivable scenario turns out to be impossible.
Caveat: You can patch future cases with a son-of-CDT commitment. IMO this is not the "true" reason you should cooperate, and it is not sensible to defect if you failed to previously "make a commitment" of some sort.
(same caveat as before)
This is a strictly weaker objection than twin PD. Obviously it only applies if you find ECL intuitively compelling (rather than a neutral or unintuitive consequence of EDT). I personally find ECL compelling in context but can still imagine rejecting it upon reflection.
Compared to twin PD, ECL is more practically relevant for humans and can't be fixed using commitments (since you're "born into it").
To be fair, FDT also considers logically-impossible counterfactuals. However, the way it does so in specific cases seems more reasonable to me, while this case seems totally unreasonable in my subjective opinion.