Comparison of decision theories (with a focus on logical-counterfactual decision theories)

Various comments, written while reading:

The broad categories of causal/evidential/logical are definitely right in terms of what people generally talk about, but it is important to keep in mind that these are clusters rather than fully formalized options. There are many different formalizations of causal counterfactuals, which may have significantly different consequences. Though, around here, people think of Pearlian causality almost exclusively.

"Evidential" means basically one thing, but we can differentiate between what happens in different theories of uncertainty. Obviously, Bayesianism is popular in these parts, but we also might be talking about evidential reasoning in a logically uncertain framework, like logical induction.

Logical counterfactuals are wide open, since there's no accepted account of what exactly they are. Though, modal DT is a concrete proposal which is often discussed.

Again, the causal/evidential/logical split seems good for capturing how people mostly talk about things here, but internally I think of it more as two dimensions: causal/evidential and logical/not. Logical counterfactuals are more or less the "causal and logical" option, conveying intuitions of there being some kind of "logical causality" which tells you how to take counterfactuals.

Also, getting into nitpicks: some might say "evidential" is the non-counterfactual option. A broader term which could be used is "conditional", with counterfactual conditionals (aka subjunctive conditionals) being a subtype. I think evidential conditionals would fall under "indicative conditional" as opposed to "counterfactual conditional". Academic philosophers might also nitpick that logical counterfactuals are not counterfactuals. "Counterfactual" in academic philosophy usually does not include the possibility of counterfacting on logical impossibilities; "counterlogical" is used when logical impossibilities are being considered. Posts on this forum usually ignore all the nitpicks in this paragraph, and I'm not sure I'm even capturing the language of academic decision theorists accurately -- just attempting to mention some distinctions I've encountered.

Other Dimensions:

You're right that reflective consistency is something which is supposed to emerge (or not emerge) from the specification of the decision theory. If there were a 'reflective consistency' option, we would want to just set it to 'yes'; but unfortunately, things are not so easy.

Another source of variation, related to your 'graphical models' point, could broadly be called choice of formalism. A decision problem could be given as an extensive-form game, a causal Bayes net, a program (probabilistic or deterministic), a logical theory (with some choices about how actions, utilities, etc get represented, whether causality needs to be specified, and so on), or many other possibilities.

This is critical; new formalisms such as reflective oracles may allow us to accomplish new things, illuminate problems which were previously murky, make distinctions between things which were previously being conflated, and so on. However, the high-level clusters like CDT, EDT, FDT, and UDT do not specify formalism -- they are more general ideas, which can be formalized in multiple ways.
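To make the "program" formalism concrete, here is a minimal sketch of Newcomb's problem written as a deterministic program. Everything in it is an illustrative assumption rather than anything fixed by CDT, EDT, FDT, or UDT: the names `newcomb`, `one_boxer`, and `two_boxer` are made up, the payoffs are the conventional $1,000,000 and $1,000, and the predictor is modeled as simply running the agent's code.

```python
from typing import Callable

# An agent is modeled as a zero-argument program returning its choice.
Agent = Callable[[], str]

MILLION = 1_000_000   # conventional payoff (assumption, not from the thread)
THOUSAND = 1_000      # conventional payoff (assumption, not from the thread)


def newcomb(agent: Agent) -> int:
    """Deterministic Newcomb's problem: the predictor simulates the agent."""
    prediction = agent()  # perfect prediction by running the agent's code
    opaque_box = MILLION if prediction == "one-box" else 0
    action = agent()      # the agent's actual choice
    if action == "one-box":
        return opaque_box
    return opaque_box + THOUSAND


def one_boxer() -> str:
    return "one-box"


def two_boxer() -> str:
    return "two-box"


payoffs = {agent.__name__: newcomb(agent) for agent in (one_boxer, two_boxer)}
print(payoffs)  # {'one_boxer': 1000000, 'two_boxer': 1000}
```

Note that this representation already builds in a choice: the agent appears in the world program as a function that is called twice, so the link between prediction and action is structural rather than something a causal graph or probability distribution has to supply.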

Alex Flint (4mo)

Hey, I'm interested in implementing some of these decision theories (and decision problems) in code. I have an initial version of CDT, EDT, and something I'm generically calling "FDT" (which I guess is actually some particular sub-variant of FDT) in Python here, with the core decision theories implemented in about 45 lines of Python code here. I'm wondering if anyone here might have suggestions on what it would look like to implement UDT in this framework, either 1.0 or 1.1. I don't yet have a notion of "observation" in the code, so I can't yet implement e.g. Parfit's Hitchhiker or XOR blackmail. I'm interested in suggestions on what that would look like.

Any other comments or suggestions also much appreciated. I hope to turn this into a top-level post after implementing more decision problems and theories, and getting more feedback.

Chris_Leong (7y)

You may find this comment that Rob Bensinger left on one of my questions interesting:

"The main datapoint that Rob left out: one reason we don't call it UDT (or cite Wei Dai much) is that Wei Dai doesn't endorse FDT's focus on causal-graph-style counterpossible reasoning; IIRC he's holding out for an approach to counterpossible reasoning that falls out of evidential-style conditioning on a logically uncertain distribution. (FWIW I tried to make the formalization we chose in the paper general enough to technically include that possibility, though Wei and I disagree here and that's definitely not where the paper put its emphasis. I don't want to put words in Wei Dai's mouth, but IIRC, this is also a reason Wei Dai declined to be listed as a co-author.)"

Rob also left another comment explaining the renaming from UDT to FDT.

Wei Dai (7y)

Chris asked me via PM, "I’m curious, have you written any posts about why you hold that position?"

I don't think I have, but I'll give the reasons here:

1. "evidential-style conditioning on a logically uncertain distribution" seems simpler / more elegant to me.
2. I'm not aware of a compelling argument for "causal-graph-style counterpossible reasoning". There are definitely some unresolved problems with evidential-style UDT and I do endorse people looking into causal-style FDT as an alternative, but I'm not convinced the solutions actually lie in that direction. (https://sideways-view.com/2018/09/30/edt-vs-cdt-2-conditioning-on-the-impossible/ and links therein are relevant here.)
3. Part of it is just historical, in that UDT was originally specified as "evidential-style conditioning on a logically uncertain distribution" and if I added my name as a co-author to a paper that focuses on causal-style decision theory, people would naturally wonder if something made me change my mind.

Chris_Leong (7y)

I'm actually still quite confused by the necessity of logical uncertainty for UDT. Most of the common problems like Newcomb's or Parfit's Hitchhiker don't seem to require it. Where does it come in?

(The only reference to it that I could find was on the LW wiki)

abramdemski (7y)

You can formalize UDT in a more standard game-theoretic setting, which allows many problems like Parfit's Hitchhiker to be dealt with, if that is enough for what you're interested in. However, the formalism assumes a lot about the world (such as the identity of the agent being a nonproblematic given, as Wei Dai mentions), so if you want to address questions of where that structure is coming from, you have to do something else.

Wei Dai (7y)

I think it's needed just to define what it means to condition on an action, i.e., if an agent conditions on "I make this decision" in order to compute its expected utility, what does that mean formally? You could make "I" a primitive element in the agent's ontology, but I think that runs into all kinds of problems. My solution was to make it a logical statement of the form "source code X outputs action/policy Y", and then to condition on it you need a logically uncertain distribution.
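A toy sketch of what conditioning on "source code X outputs Y" could look like, under heavy assumptions: the credences and payoffs below are hypothetical, and a real logically uncertain distribution (e.g. one produced by logical induction) would be far more involved than a hand-written table. The point is only that the conditional is well defined while the agent is still uncertain about its own output, and stops being defined once all the probability mass sits on one output.

```python
from fractions import Fraction
from typing import Dict, Tuple

# Joint credence over (output of source code X, utility of the resulting world).
Credence = Dict[Tuple[str, int], Fraction]


def eu_given_output(credence: Credence, output: str) -> Fraction:
    """Expected utility conditional on the statement 'X outputs `output`'."""
    mass = sum(p for (out, _), p in credence.items() if out == output)
    if mass == 0:
        raise ValueError(f"cannot condition on probability-zero statement: X outputs {output!r}")
    weighted = sum(p * u for (out, u), p in credence.items() if out == output)
    return weighted / mass


# Hypothetical credences: the agent has not yet settled what X outputs.
uncertain: Credence = {
    ("one-box", 1_000_000): Fraction(1, 2),
    ("two-box", 1_000): Fraction(1, 2),
}
best = max(["one-box", "two-box"], key=lambda y: eu_given_output(uncertain, y))
print(best)  # one-box

# Hypothetical credences of a logically omniscient reasoner about the same X.
certain: Credence = {("one-box", 1_000_000): Fraction(1)}
try:
    eu_given_output(certain, "two-box")
except ValueError as err:
    print(err)
```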

Chris_Leong (7y)

Hmm, I'm still confused. I can't figure out why we would need logical uncertainty in the typical case to figure out the consequences of "source code X outputs action/policy Y". Is there a simple problem where this is necessary or is this just a result of trying to solve for the general case?

Rob Bensinger (7y)

Agents need to consider multiple actions and choose the one that has the best outcome. But we're supposing that the code representing the agent's decision only has one possible output. E.g., perhaps an agent is going to choose between action A and action B, and will end up choosing A. Then a sufficiently close examination of the agent's source code will reveal that the scenario "the agent chooses B" is logically inconsistent. But then it's not clear how the agent can reason about the desirability of "the agent chooses B" while evaluating its outcomes, if not via some mechanism for nontrivially reasoning about outcomes of logically inconsistent situations.
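A small sketch of the tension, assuming made-up payoffs and hypothetical names (`outcome`, `agent`). The agent's code has exactly one output, so the only logically consistent world is the one where it outputs "A"; evaluating "B" requires some extra mechanism, here the crudest possible one: treating the output as a free variable and substituting it into the outcome function.

```python
ACTIONS = ("A", "B")


def outcome(action: str) -> int:
    """Hypothetical payoff of the world in which the agent outputs `action`."""
    return {"A": 10, "B": 5}[action]


def agent() -> str:
    # Compare outcomes under substitution, even though only one of these
    # actions is what this very function actually returns.
    return max(ACTIONS, key=outcome)


actual = agent()

# Worlds consistent with the agent's source code: just the actual one.
consistent_worlds = {actual: outcome(actual)}

# Values the agent needed in order to choose, including the counterpossible one.
substituted = {action: outcome(action) for action in ACTIONS}

print(consistent_worlds)  # {'A': 10}
print(substituted)        # {'A': 10, 'B': 5}
```

Substitution sidesteps the hard part: it works here only because `outcome` does not itself inspect the agent's code. Once the world contains predictors or copies of the agent, deciding what else changes when the output is "replaced" is exactly the open question.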

Chris_Leong (7y)

Do we need the ability to reason about logically inconsistent situations? Perhaps we could attempt to transform the question of logical counterfactuals into a question about consistent situations instead as I describe in this post? Or to put it another way, is the idea of logical counterfactuals an analogy or something that is supposed to be taken literally?

Wei Dai (7y)

See "Example 1: Counterfactual Mugging" in Towards a New Decision Theory.

Rob Bensinger (7y)

The comment starting "The main datapoint that Rob left out..." is actually by Nate Soares. I cross-posted it to LW from an email conversation.

	UDT1.1/FDT-policy	UDT1/FDT-action	TDT	EDT	CDT
UDT1.1/FDT-policy	–	Number assignment problem described in the UDT1.1 post (both UDT1 copies output “A”, the UDT1.1 copies output “A” and “B”)	Counterfactual mugging (a.k.a. curious benefactor) (TDT refuses, UDT1.1 pays)	Parfit’s hitchhiker (EDT refuses, UDT1.1 pays)	Newcomb’s problem (CDT two-boxes, UDT1.1 one-boxes)
UDT1/FDT-action	–	–	Counterfactual mugging (a.k.a. curious benefactor) (TDT refuses, UDT1 pays)	Parfit’s hitchhiker (EDT refuses, UDT1 pays)	Newcomb’s problem (CDT two-boxes, UDT1 one-boxes)
TDT	–	–	–	Parfit’s hitchhiker (EDT refuses, TDT pays)	Newcomb’s problem (CDT two-boxes, TDT one-boxes)
EDT	–	–	–	–	Newcomb’s problem (CDT two-boxes, EDT one-boxes)
CDT	–	–	–	–	–
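The Newcomb's problem entries in the table can be checked with a short expected-utility calculation. This is a sketch under stated assumptions: the payoffs are the conventional $1,000,000 and $1,000, the predictor's accuracy of 99% is made up, and "CDT" is modeled in the simplest textbook way, as treating the probability that the opaque box is full as independent of the action.

```python
from fractions import Fraction
from typing import Dict

MILLION, THOUSAND = 1_000_000, 1_000  # conventional payoffs (assumption)


def edt_values(accuracy: Fraction) -> Dict[str, Fraction]:
    """Condition on the action: the prediction is correlated with the choice."""
    return {
        "one-box": accuracy * MILLION,
        "two-box": (1 - accuracy) * MILLION + THOUSAND,
    }


def cdt_values(p_full: Fraction) -> Dict[str, Fraction]:
    """Intervene on the action: the box contents are held fixed at p_full."""
    return {
        "one-box": p_full * MILLION,
        "two-box": p_full * MILLION + THOUSAND,
    }


def choose(values: Dict[str, Fraction]) -> str:
    return max(values, key=values.get)


accuracy = Fraction(99, 100)  # hypothetical predictor accuracy
edt_choice = choose(edt_values(accuracy))

# CDT two-boxes whatever it believes about the box contents.
cdt_choices = {choose(cdt_values(Fraction(k, 10))) for k in range(11)}

print(edt_choice)   # one-box
print(cdt_choices)  # {'two-box'}
```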

Decision theory	Outermost iteration	Updateless	Type of counterfactual
Updateless decision theory 1 (UDT1)	action	yes	logical
Updateless decision theory 1.1 (UDT1.1)	policy	yes	logical
Updateless decision theory 2 (UDT2)	algorithm	yes	logical
Functional decision theory, iterating over actions (FDT-action)	action	yes	logical
Functional decision theory, iterating over policies (FDT-policy)	policy	yes	logical
Logical decision theory (LDT)	unspecified	unspecified	logical
Timeless decision theory (TDT)	action	no	logical
Causal decision theory (CDT)	action	no	causal
Evidential decision theory (EDT, “naive EDT”)	action	no	conditional
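One way to use the table above is to encode it as data and ask along which dimensions two theories differ. A minimal sketch follows; the short keys (e.g. "FDT-action") and the helper name `differences` are conveniences introduced here, while the cell values are copied from the table.

```python
from typing import Dict, NamedTuple, Tuple


class Theory(NamedTuple):
    outermost_iteration: str
    updateless: str
    counterfactual: str


# Rows copied from the comparison table above.
TABLE: Dict[str, Theory] = {
    "UDT1": Theory("action", "yes", "logical"),
    "UDT1.1": Theory("policy", "yes", "logical"),
    "UDT2": Theory("algorithm", "yes", "logical"),
    "FDT-action": Theory("action", "yes", "logical"),
    "FDT-policy": Theory("policy", "yes", "logical"),
    "LDT": Theory("unspecified", "unspecified", "logical"),
    "TDT": Theory("action", "no", "logical"),
    "CDT": Theory("action", "no", "causal"),
    "EDT": Theory("action", "no", "conditional"),
}


def differences(a: str, b: str) -> Dict[str, Tuple[str, str]]:
    """Return the dimensions on which theories `a` and `b` disagree."""
    ta, tb = TABLE[a], TABLE[b]
    return {
        field: (getattr(ta, field), getattr(tb, field))
        for field in Theory._fields
        if getattr(ta, field) != getattr(tb, field)
    }


print(differences("UDT1", "UDT1.1"))      # {'outermost_iteration': ('action', 'policy')}
print(differences("UDT1", "TDT"))         # {'updateless': ('yes', 'no')}
print(differences("TDT", "CDT"))          # {'counterfactual': ('logical', 'causal')}
print(differences("UDT1", "FDT-action"))  # {}
```

An empty result, as for UDT1 and FDT-action, only means the two theories agree on these three dimensions; it does not show that they are the same theory.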