A Critique of Functional Decision Theory

I saw an earlier draft of this, and hope to write an extensive response at some point. For now, the short version:

As I understand it, FDT was intended as an umbrella term for MIRI-style decision theories, which illustrated the critical points without making too many commitments. So, the vagueness of FDT was partly by design.

I think UDT is a more concrete illustration of the most important points relevant to this discussion.

The optimality notion of UDT is clear. "UDT gets the most utility" means "UDT gets the highest expected value with respect to its own prior". This seems quite well-defined, hopefully addressing your (VII).

There are problems applying UDT to realistic situations, but UDT makes perfect sense and is optimal in a straightforward sense for the case of single-player extensive-form games. That doesn't address multi-player games or logical uncertainty, but it is enough for much of Will's discussion.
FDT focused on the weird logical cases, which is in fact a major part of the motivation for MIRI-style decision theory. However, UDT for single-player extensive-form games actually gets at a lot of what MIRI-style decision theory wants, without broaching the topic of logical counterfactuals or proving-your-own-action directly.
The problems which create a deep indeterminacy seem, to me, to be problems for other decision theories than FDT as well. FDT is trying to face them head-on. But there are big problems for applying EDT to agents who are physically instantiated as computer programs and can prove too much about their own actions.

This also hopefully clarifies the sense in which I don't think the decisions pointed out in (III) are bizarre. The decisions are optimal according to the very probability distribution used to define the decision problem.

There's a subtle point here, though, since Will describes the decision problem from an updated perspective -- you already know the bomb is in front of you. So UDT "changes the problem" by evaluating "according to the prior". From my perspective, because the very statement of the Bomb problem suggests that there were also other possible outcomes, we can rightly insist on evaluating expected utility in terms of those chances.
Perhaps this sounds like an unprincipled rejection of the Bomb problem as you state it. My principle is as follows: you should not state a decision problem without having in mind a well-specified way to predictably put agents into that scenario. Let's call the way-you-put-agents-into-the-scenario the "construction". We then evaluate agents on how well they deal with the construction.

For examples like Bomb, the construction gives us the overall probability distribution -- this is then used for the expected value which UDT's optimality notion is stated in terms of.
For other examples, as discussed in "Decisions are for making bad outcomes inconsistent", the construction simply breaks when you try to put certain decision theories into it. This can also be a good thing; it means the decision theory makes certain scenarios altogether impossible.

The point about "constructions" is possibly a bit subtle (and hastily made); maybe a lot of the disagreement will turn out to be there. But I do hope that the basic idea of UDT's optimality criterion is actually clear -- "evaluate expected utility of policies according to the prior" -- and clarifies the situation with FDT as well.

[-]abramdemski6y190

Replying to one of Will's edits on account of my comments to the earlier draft:

Finally, in a comment on a draft of this note, Abram Demski said that: “The notion of expected utility for which FDT is supposed to do well (at least, according to me) is expected utility with respect to the prior for the decision problem under consideration.” If that’s correct, it’s striking that this criterion isn’t mentioned in the paper. But it also doesn’t seem compelling as a principle by which to evaluate between decision theories, nor does it seem FDT even does well by it. To see both points: suppose I’m choosing between an avocado sandwich and a hummus sandwich, and my prior was that I prefer avocado, but I’ve since tasted them both and gotten evidence that I prefer hummus. The choice that does best in terms of expected utility with respect to my prior for the decision problem under consideration is the avocado sandwich (and FDT, as I understood it in the paper, would agree). But, uncontroversially, I should choose the hummus sandwich, because I prefer hummus to avocado.

Yeah, the thing is, the FDT paper focused on examples where "expected utility according to the prior" becomes an unclear notion due to logical uncertainty issues. It wouldn't have made sense for the FDT paper to foreground that criterion, given the desire to put the most difficult issues into focus. However, FDT is supposed to accomplish similar things to UDT, and UDT provides the more concrete illustration.

The policy that does best in expected utility according to the prior is the policy of taking whatever you like. In games of partial information, decisions are defined as functions of information states; and in the situation as described, there are separate information states for liking hummus and liking avocado. Choosing the one you like achieves a higher expected utility according to the prior than just choosing avocado no matter what. In this situation, optimizing the decision in this way is equivalent to updating on the information, but that is not always so (as in Transparent Newcomb, Bomb, and other such problems).

To re-state that a different way: in a given information state, UDT is choosing what to do as a function of the information available, and judging the utility of that choice according to the prior. So, in this scenario, we judge the expected utility of selecting avocado in response to liking hummus. This is worse (according to the prior!) than selecting hummus in response to liking hummus.
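To make the sandwich case concrete, here is a minimal sketch that enumerates policies (functions from information state to choice) and scores each by expected utility under the prior. The prior of 0.6/0.4 and the 0/1 utilities are made-up numbers for illustration; nothing in the thread specifies them.

```python
from itertools import product

# Hypothetical prior over what you will turn out to like (not from the thread).
PRIOR = {"likes_avocado": 0.6, "likes_hummus": 0.4}
ACTIONS = ("avocado", "hummus")


def utility(taste, choice):
    """Assumed payoff: 1 if the sandwich matches your taste, else 0."""
    return 1.0 if taste == "likes_" + choice else 0.0


def prior_expected_utility(policy):
    """Score a policy (information state -> action) under the prior."""
    return sum(p * utility(taste, policy[taste]) for taste, p in PRIOR.items())


def all_policies():
    """Every mapping from information states to actions."""
    states = list(PRIOR)
    for choices in product(ACTIONS, repeat=len(states)):
        yield dict(zip(states, choices))


best_policy = max(all_policies(), key=prior_expected_utility)
always_avocado = {state: "avocado" for state in PRIOR}

print(best_policy, prior_expected_utility(best_policy))
print(always_avocado, prior_expected_utility(always_avocado))
```

Under these assumed numbers, "choose what you like" beats "always avocado" even though both are scored by the prior, which is the point being made: the prior evaluates the policy, while the policy still responds to the information state.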

[-]Wei Dai6y*180

It seems important to acknowledge that there's a version of the Bomb argument that actually works, at least if we want to apply UDT to humans as opposed to AIs, and this may be part of what's driving Will's intuitions. (I'll use "UDT" here because that's what I'm more familiar with, but presumably everything transfers to FDT.)

First there's an ambiguity in Bomb as written, namely what does my simulation see? Does it see a bomb in Left, or no bomb? Suppose the setup is that the simulation sees no bomb in Left. In that case since obviously I should take Left when there's no bomb in it (and that's what my simulation would do), if I am seeing a bomb in Left it must mean I'm in the 1 in a trillion trillion situation where the predictor made a mistake, therefore I should (intuitively) take Right. UDT also says I should take Right so there's no problem here.

Now suppose the simulation is set up to see a bomb in Left. In that case, when I see a bomb in Left, I don't know if I'm a simulation or a real person. If I was selfish in an indexical way, I would think something like "If I'm a simulation then it doesn't matter what I choose. The simulation will end as soon as I make a choice so my choice is inconsequential. But if I'm a real person, choosing Left will cause me to be burned. So I should choose Right." The thing is, UDT is incompatible with this kind of selfish values, because UDT takes a utility function that is defined over possible histories of the world and not possible centered histories of the world (i.e., histories with an additional pointer that says this is "me"). UDT essentially forces an agent to be altruistic to its copies, and therefore is unable to give the intuitively correct answer in this case.

If we're doing decision theory for humans, then the incompatibility with this kind of selfish values would be a problem because humans plausibly do have this kind of selfish values as part of our complex values and whatever decision theory we use perhaps should be able to handle it. However if we're building an AI, it doesn't seem to make sense to let it have selfish values (i.e., have a utility function over centered histories as opposed to uncentered histories), so UDT seems fine (at least as far as this issue is concerned) for thinking about how AIs should ideally make decisions.

[-]AlexMennen6y40

I don't know if I'm a simulation or a real person.

A possible response to this argument is that the predictor may be able to accurately predict the agent without explicitly simulating them. A possible counter-response to this is to posit that any sufficiently accurate model of a conscious agent is necessarily conscious itself, whether the model takes the form of an explicit simulation or not.

[-]Vladimir_Nesov6y30

By the way, selfish values seem related to the reward vs. utility distinction. An agent that pursues a reward that's about particular events in the world rather than a more holographic valuation seems more like a selfish agent in this sense than a maximizer of a utility function with a small-in-space support. If a reward-seeking agent looks for reward channel shaped patterns instead of the instance of a reward channel in front of it, it might tile the world with reward channels or search the world for more of them or something like that.

[-]Ofer6y30

Now suppose the simulation is set up to see a bomb in Left. In that case, when I see a bomb in Left, I don’t know if I’m a simulation or a real person. If I was selfish in an indexical way, I would think something like “If I’m a simulation then it doesn’t matter what I choose. The simulation will end as soon as I make a choice so my choice is inconsequential. But if I’m a real person, choosing Left will cause me to be burned. So I should choose Right.”

It seems to me that even in this example, a person (who is selfish in an indexical way) would prefer—before opening their eyes—to make a binding commitment to choose left. If so, the "intuitively correct answer" that UDT is unable to give is actually just the result of a failure to make a beneficial binding commitment.

[-]Wei Dai6y50

That's true, but they could say, "Well, given that no binding commitment was in fact made, and given my indexically selfish values, it's rational for me to choose Right." And I'm not sure how to reply to that, unless we can show that such indexically selfish values are wrong somehow.

[-]Ofer6y20

I agree. It seems that in that situation the person would be "rational" to choose Right.

I'm still confused about the "UDT is incompatible with this kind of selfish values" part. It seems that an indexically-selfish person—after failing to make a binding commitment and seeing the bomb—could still rationally commit to UDT from that moment on, by defining the utility s.t. only copies that found themselves in that situation (i.e. those who failed to make a binding commitment and saw the bomb) matter. That utility is a function over uncentered histories of the world, and would result in UDT choosing Right.

[-]Wei Dai6y40

I don't see anything wrong with what you're saying, but if you did that you'd end up not being an indexically selfish person anymore. You'd be selfish in a different, perhaps alien or counterintuitive way. So you might be reluctant to make that kind of commitment until you've thought about it for a much longer time, and UDT isn't compatible with your values in the meantime. Also, without futuristic self-modification technologies, you are probably not able to make such a commitment truly binding even if you wanted to and you tried.

[-]Ofer6y10

Some tangentially related thoughts:

It seems that in many simple worlds (such as the Bomb world), an indexically-selfish agent with a utility function $u$ over centered histories would prefer to commit to UDT with a utility function $u'$ over uncentered histories, where $u'$ is defined as the sum of all the "uncentered versions" of $u$ (version $i$ corresponds to $u$ when the pointer is assumed to point to agent $i$).
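One way to write that definition down, as a sketch only: let $h$ be an uncentered history, let $I(h)$ be the set of agents (copies) appearing in $h$, and let $(h, i)$ be the centered history in which the pointer points to agent $i$. The symbols $h$, $I(h)$, and $(h, i)$ are my notation, not part of the comment above.

$$u'(h) = \sum_{i \in I(h)} u(h, i)$$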

Things seem to get more confusing in messy worlds in which the inability of an agent to define a utility function (over uncentered histories) that distinguishes between agent1 and agent2 does not entail that the two agents are about to make the same decision.

[-]Vaniver6y170

(I work at MIRI, and edited the Cheating Death in Damascus paper, but this comment wasn't reviewed by anyone else at MIRI.)

This should be a constraint on any plausible decision theory.

But this principle prevents you from cooperating with yourself across empirical branches in the world!

Suppose a good predictor offers you a fair coin flip at favorable odds (say, 2 of their dollars to one of yours). If you called correctly, you can either forgive (no money moves) or demand; if you called incorrectly, you can either pay up or back out. The predictor only responds to your demand that they pay up if they predict that you would yourself pay up when you lose, but otherwise this interaction doesn't affect the rest of your life.

You call heads, the coin comes up tails. The Guaranteed Payoffs principle says:

You're certain that you're in a world where you will just lose a dollar if you pay up, and will lose no dollars if you don't pay up. It maximizes utility conditioned on this starting spot to not pay up.

The FDT perspective is to say:

The price of winning $2 in half of the worlds is losing $1 in the other half of the worlds. You want to be the sort of agent who can profit from these sorts of bets and/or you want to take this opportunity to transfer utility across worlds, because it's net profitable.
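A small sketch of the bet, scoring each policy from the prior rather than from the losing branch. It assumes a perfectly accurate predictor (the comment only says "good") and treats dollars as utility; both are simplifications.

```python
from fractions import Fraction
from itertools import product

P_CALLED_CORRECTLY = Fraction(1, 2)  # fair coin


def payoff(called_correctly, on_win, on_loss):
    """Dollar payoff for one branch, given the agent's full policy."""
    if called_correctly:
        # Assumption: the predictor is perfect, so it pays out on a demand
        # exactly when the policy says "pay" in the losing branch.
        predictor_pays = on_loss == "pay"
        return 2 if (on_win == "demand" and predictor_pays) else 0
    return -1 if on_loss == "pay" else 0


def prior_expected_value(on_win, on_loss):
    """Expected dollars before the coin is flipped."""
    return (P_CALLED_CORRECTLY * payoff(True, on_win, on_loss)
            + (1 - P_CALLED_CORRECTLY) * payoff(False, on_win, on_loss))


policies = list(product(("demand", "forgive"), ("pay", "back_out")))
values = {policy: prior_expected_value(*policy) for policy in policies}
best = max(values, key=values.get)

for policy, value in values.items():
    print(policy, value)
```

Within the losing branch alone, backing out (0) beats paying (-1), which is what Guaranteed Payoffs looks at. Scored from the prior, though, any policy that backs out earns nothing in the winning branch either, so the pay-when-you-lose policy comes out ahead.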

Note that the Bomb case is one in which we condition on the 1 in a trillion trillion failure case, and ignore the 999999999999999999999999 cases in which FDT saves $100. This is like pointing at people who got into a plane that crashed and saying "what morons, choosing to get on a plane that would crash!" instead of judging their actions from the state of uncertainty that they were in when they decided to get on the plane.

This is what Abram means when he says "with respect to the prior of the decision problem"; not that the FDT agent is expected to do well from any starting spot, but from the 'natural' one. (If the problem statement is as described and the FDT agent sees "you'll take the right box" and the FDT agent takes the left box, then it must be the case that this was the unlucky bad prediction and made unlikely accordingly.) It's not that the FDT agent wanders through the world unable to determine where it is even after obtaining evidence; it's that as the FDT agent navigates the world it considers its impact across all (connected) logical space instead of just immediately downstream of itself. Note that in my coin flip case, FDT is still trying to win the reward when the coin comes up heads even though in this case it came up tails, as opposed to saying "well, every time I see this problem the coin will come up tails, therefore I shouldn't participate in the bet."

[I do think this jump, from 'only consider things downstream of you' to 'consider everything', does need justification and I think the case hasn't been as compelling as I'd like it to be. In particular, the old name for this, 'updatelessness', threw me for a loop for a while because it sounded like the dumb "don't take input from your environment" instead of the conscious "consider what impact you're having on hypothetical versions of yourself".]

But then, it seems to me, that FDT has lost much of its initial motivation: the case for one-boxing in Newcomb’s problem didn’t seem to stem from whether the Predictor was running a simulation of me, or just using some other way to predict what I’d do.

It seems to me like either you are convinced that the predictor is using features you can control (based on whether or not you decide to one-box) or features you can't control (like whether you're English or Scottish). If you think the latter, you two-box (because regardless of whether the predictor is rewarding you for being Scottish or not, you benefit from the $1000), and if you think the former you one-box (because you want to move the probability that the predictor fills the large box).
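The two cases can be sketched numerically. The $1000 small box is from the comment; the $1,000,000 large box, the accuracy parameter, and the function names are assumptions added for illustration.

```python
SMALL_BOX = 1_000
LARGE_BOX = 1_000_000  # assumed; the comment only mentions the $1000


def ev_controllable(action, accuracy):
    """The prediction tracks the decision itself with the given accuracy."""
    p_full = accuracy if action == "one-box" else 1 - accuracy
    return p_full * LARGE_BOX + (SMALL_BOX if action == "two-box" else 0)


def ev_uncontrollable(action, p_full):
    """The prediction tracks a feature the decision cannot influence."""
    return p_full * LARGE_BOX + (SMALL_BOX if action == "two-box" else 0)


def best_action(expected_value, parameter):
    return max(("one-box", "two-box"),
               key=lambda action: expected_value(action, parameter))


print(best_action(ev_controllable, 0.99))
print([best_action(ev_uncontrollable, q) for q in (0.0, 0.3, 1.0)])
```

In the uncontrollable case the extra $1000 is a pure bonus whatever the fill probability is; in the controllable case the choice shifts the fill probability, and with a sufficiently accurate predictor that effect dominates.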

According to me, the simulation is just a realistic way to instantiate an actual dependence between the decision I'm making now and the prediction. (Like, when we have AIs we'll actually be able to put them in Newcomb-like scenarios!) If you want to posit a different, realistic version of that, then FDT is able to handle it (and the difficulty is all in moving from the English description of the problem to the subjunctive dependency graph).

Now, because there’s an agent making predictions, the FDT adherent will presumably want to say that the right action is one-boxing.

I don't think this is right; I think this is true only if the FDT agent thinks that S (a physically verifiable fact about the world, like the lesion) is logically downstream of its decision. In the simplest such graph I can construct, S is still logically upstream of the decision; are we making different graphs?

But it’s very implausible that there’s some S such that a tiny change in its physical makeup should affect whether one ought to one-box or two-box.

I don't buy this as an objection; decisions are often discontinuous. Suppose I'm choosing between two hotels, one with price A and the other with price B, with B<A; then construct a series of imperceptibly small reductions to A, and at some point (when A drops below B) my decision switches abruptly from staying at hotel B to staying at hotel A. Whenever you pass multiple continuous quantities through an argmin or argmax, you can get sudden changes.

(Or, put a more analogous way, you can imagine insurance against an event with probability p, and we smoothly vary p, and at some point our action discontinuously jumps from not buying the insurance to buying the insurance.)
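The insurance version is easy to sketch: expected cost is continuous in p, but the argmin over actions jumps at a threshold. The premium and loss figures are hypothetical, and the agent is assumed to be risk-neutral.

```python
def best_action(p, premium=10.0, loss=1000.0):
    """Pick the action with the lower expected cost (risk-neutral agent)."""
    expected_costs = {
        "no_insurance": p * loss,   # varies smoothly with p
        "buy_insurance": premium,   # constant
    }
    return min(expected_costs, key=expected_costs.get)


# Sweep p in small steps around the break-even point premium / loss = 0.01.
for p in (0.0097, 0.0098, 0.0099, 0.0101, 0.0102, 0.0103):
    print(f"p={p:.4f} -> {best_action(p)}")
```

Nothing about the inputs is discontinuous here; the jump comes entirely from taking an argmin over a finite set of actions.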

[-]Zvi6y60

I am deeply confused how someone who is taking decision theory seriously can accept Guaranteed Payoffs as correct. I'm even more confused how it can seem so obvious that anyone violating it has a fatal problem.

Under certainty, this is assuming CDT is correct, when CDT seems to have many problems other than certainty. We can use Vaniver's examples above, or use a reliable insurance agent to remove any uncertainty, or we also can use any number of classic problems without any uncertainty (or remove it), and see that such an agent loses - e.g. Parfit's Hitchhiker in the case where he has 100% accuracy.

[-]Vaniver6y50

In particular, the old name for this, 'updatelessness', threw me for a loop for a while because it sounded like the dumb "don't take input from your environment" instead of the conscious "consider what impact you're having on hypothetical versions of yourself".

As a further example, consider glomarization. If I haven't committed a crime, pleading the fifth is worse than pleading innocence; however it means that when I have committed a crime, I have to either pay the costs of pleading guilty, pay the costs of lying, or plead the fifth (which will code to "I'm guilty", because I never say it when I'm innocent). If I care about honesty and being difficult to distinguish from the versions of myself who commit crimes, then I want to glomarize even before I commit any crimes.

[-]Vaniver6y30

If the problem statement is as described and the FDT agent sees "you'll take the right box" and the FDT agent takes the left box, then it must be the case that this was the unlucky bad prediction and made unlikely accordingly.

See also Nate Soares in "Decisions are for making bad outcomes inconsistent". This is sort of a generalization, where 'decisions are for making bad outcomes unlikely.'

[-]abramdemski6y*140

Here are some (very lightly edited) comments I left on Will's draft of this post. (See also my top-level response.)

Responses to Sections II and III:

I’m not claiming that it’s clear what this means. E.g. see here, second bullet point, arguing there can be no such probability function, because any probability function requires certainty in logical facts and all their entailments.

This point shows the intertwining of logical counterfactuals (counterpossibles) and logical uncertainty. I take logical induction to represent significant progress generalizing probability theory to the case of logical uncertainty, i.e., objects which have many of the virtues of probability functions while not requiring certainty about entailment of known facts. So, we can substantially reply to this objection.

However, replying to this objection does not necessarily mean we can define logical counterfactuals as we would want. So far we have only been able to use logical induction to specify a kind of "logically uncertain evidential conditional". (I.e., something closer in spirit to EDT, which does behave more like FDT in some problems but not in general.)

I want to emphasize that I agree that specifying what logical counterfactuals are is a grave difficulty, so grave as to seem (to me, at present) to be damning, provided one can avoid the difficulty in some other approach. However, I don't actually think that the difficulty can be avoided in any other approach! I think CDT ultimately has to grapple with the question as well, because physics is math, and so physical counterfactuals are ultimately mathematical counterfactuals. Even EDT has to grapple with this problem, ultimately, due to the need to handle cases where one's own action can be logically known. (Or provide a convincing argument that such cases cannot arise, even for an agent which is computable.)

Guaranteed Payoffs: In conditions of certainty — that is, when the decision-maker has no uncertainty about what state of nature she is in, and no uncertainty about what the utility payoff of each action is — the decision-maker should choose the action that maximises utility.

(Obligatory remark that what maximizes utility is part of what's at issue here, and for precisely this reason, an FDTist could respond that it's CDT and EDT which fail in the Bomb example -- by failing to maximize the a priori expected utility of the action taken.)

FDT would disagree with this principle in general, since full certainty implies certainty about one's action, and the utility to be received, as well as everything else. However, I think we can set that aside and say there's a version of FDT which would agree with this principle in terms of prior uncertainty. It seems cases like Bomb cannot be set up without either invoking prior uncertainty (taking the form of the predictor's failure rate) or bringing the question of how to deal with logically impossible decisions to the forefront (if we consider the case of a perfect predictor).

Why should prior uncertainty be important, in cases of posterior certainty? Because of the prior-optimality notion (in which a decision theory is judged on a decision problem based on the utility received in expectation according to the prior probability which defines the decision problem).

Prior-optimality considers the guaranteed-payoff objection to be very similar to objecting to a gambling strategy by pointing out that the gambling strategy sometimes loses. In Bomb, the problem clearly stipulates that an agent who follows the FDT recommendation has trillion-trillion-to-one odds of doing better than an agent who follows the CDT/EDT recommendation. Complaining about the one-in-a-trillion-trillion chance that you get the bomb while being the sort of agent who takes the bomb is, to an FDT-theorist, like a gambler who has just lost a trillion-trillion-to-one bet complaining that the bet doesn't look so rational now that the outcome is known with certainty to be the one-in-a-trillion-trillion case where the bet didn't pay well.

The right action, according to FDT, is to take Left, in the full knowledge that as a result you will slowly burn to death. Why? Because, using Y&S’s counterfactuals, if your algorithm were to output ‘Left’, then it would also have outputted ‘Left’ when the predictor made the simulation of you, and there would be no bomb in the box, and you could save yourself $100 by taking Left.

And why, on your account, is this implausible? To my eye, this is right there in the decision problem, not a weird counterintuitive consequence of FDT: the decision problem stipulates that algorithms which output 'left' will not end up in the situation of taking a bomb, with very, very high probability.

Again, complaining that you now know with certainty that you're in the unlucky position of seeing the bomb seems irrelevant in the way that a gambler complaining that they now know how the dice fell seems irrelevant -- it's still best to gamble according to the odds, taking the option which gives the best chance of success.

(But what I most want to convey here is that there is a coherent sense in which FDT does the optimal thing, whether or not one agrees with it.)

One way of thinking about this is to say that the FDT notion of "decision problem" is different from the CDT or EDT notion, in that FDT considers the prior to be of primary importance, whereas CDT and EDT consider it to be of no importance. If you had instead specified 'bomb' with just the certain information that 'left' is (causally and evidentially) very bad and 'right' is much less bad, then CDT and EDT would regard it as precisely the same decision problem, whereas FDT would consider it to be a radically different decision problem.

Another way to think about this is to say that FDT "rejects" decision problems which are improbable according to their own specification. In cases like Bomb, where the situation as described has, by its own description, a one in a trillion trillion chance of occurring, FDT gives the outcome only one-trillion-trillionth of the weight in the expected utility calculation when deciding on a strategy.
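Here is a rough sketch of that calculation for Bomb. The error rate and the $100 come from the problem statement; the dollar-equivalent cost of burning to death is a hypothetical placeholder, and putting death on a dollar scale at all is an assumption made only so the arithmetic can be shown.

```python
from fractions import Fraction

ERROR_RATE = Fraction(1, 10**24)  # one in a trillion trillion
COST_OF_RIGHT = 100               # dollars, from the problem statement
COST_OF_BURNING = 10**9           # hypothetical dollar-equivalent disvalue


def prior_expected_cost(policy):
    """Expected cost of a policy, evaluated before the prediction is made."""
    if policy == "Right":
        return Fraction(COST_OF_RIGHT)  # pays $100 whatever the prediction
    # Policy "Left": a bomb is present only when the predictor erred.
    return ERROR_RATE * COST_OF_BURNING


def cost_given_bomb_seen(action):
    """Cost conditional on already facing the bomb (the updated view)."""
    return COST_OF_BURNING if action == "Left" else COST_OF_RIGHT


# The Left policy stays ahead under the prior until burning costs this much.
break_even_cost = COST_OF_RIGHT / ERROR_RATE

print(prior_expected_cost("Left"), prior_expected_cost("Right"))
print(cost_given_bomb_seen("Left"), cost_given_bomb_seen("Right"))
print(break_even_cost)
```

The two views disagree exactly as the thread describes: conditional on seeing the bomb, Right is cheaper, while under the prior the Left policy is cheaper for any assumed cost of burning below `break_even_cost`.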

Also, I note that this analysis (on the part of FDT) does not hinge in this case on exotic counterfactuals. If you set Bomb up in the Savage framework, you would be forced to either give only the certain choice between bomb and not-bomb (so you don't represent the interesting part of the problem, involving the predictor) or to give the decision in terms of the prior, in which case the Savage framework would endorse the FDT recommendation.

Another framework in which we could arrive at the same analysis would be that of single-player extensive-form games, in which the FDT recommendation corresponds to the simple notion of optimal strategy, whereas the CDT recommendation amounts to the stipulation of subgame-optimality.

[-]abramdemski6y*90

Responses to Sections V and VI:

Implausible discontinuities

I'm puzzled by this concern. Is the doctrine of expected utility plagued by a corresponding 'implausible discontinuity' problem because if action 1 has expected value .999 and action 2 has expected value 1, then you should take action 2, but a very small change could mean you should take action 1? Is CDT plagued by an implausible-discontinuity problem because two problems which EDT would treat as the same will differ in causal expected value, and there must be some in-between problem where uncertainty about the causal structure balances between the two options, so CDT's recommendation implausibly makes a sharp shift when the uncertainty is jiggled a little? Can't we similarly boggle at the implausibility that a tiny change in the physical structure of a problem should make such a large difference in the causal structure so as to change CDT's recommendation? (For example, the tiny change can be a small adjustment to the coin which determines which of two causal structures will be in play, with no overall change in the evidential structure.)

It seems like what you find implausible about FDT here has nothing to do with discontinuity, unless you find CDT and EDT similarly implausible.

FDT is deeply indeterminate

This is obviously a big challenge for FDT; we don't know what logical counterfactuals look like, and invoking them is problematic until we do.

However, I can point to some toy models of FDT which lend credence to the idea that there's something there. The most interesting may be MUDT (see the "modal UDT" section of this summary post). This decision theory uses the notion of "possible" from the modal logic of provability, so that despite being a deterministic agent and therefore only taking one particular action in fact, agents have a well-defined possible-world structure to consider in making decisions, derived from what they can prove.

I have a post planned that focuses on a different toy model, single-player extensive-form games. This has the advantage of being only as exotic as standard game theory.

In both of these cases, FDT can be well-specified (at least, to the extent we're satisfied with calling the toy DTs examples of FDT -- which is a bit awkward, since FDT is kind of a weird umbrella term for several possible DTs, but also kind of specifically supposed to use functional graphs, which MUDT doesn't use).

It bears mentioning that a Bayesian already regards the probability distribution representing a problem to be deeply indeterminate, so this seems less bad if you start from such a perspective. Logical counterfactuals can similarly be thought of as subjective objects, rather than some objective fact which we have to uncover in order to know what FDT does.

On the other hand, greater indeterminacy is still worse; just because we already have lots of degrees of freedom to mess ourselves up with doesn't mean we happily accept even more.

And in general, it seems to me, there’s no fact of the matter about which algorithm a physical process is implementing in the absence of a particular interpretation of the inputs and outputs of that physical process.

Part of the reason that I'm happy for FDT to need such a fact is that I think I need such a fact anyway, in order to deal with anthropic uncertainty, and other issues.

If you don't think there's such a fact, then you can't take a computationalist perspective on theory of mind -- in which case, I wonder what position you take on questions such as consciousness. Obviously this leads to a number of questions which are quite aside from the point at hand, but I would personally think that questions such as whether an organism is experiencing suffering have to do with what computations are occurring. This ultimately cashes out to physical facts, yes, but it seems as if suffering should be a fundamentally computational fact which cashes out in terms of physical facts only in a substrate-independent way (i.e., the physical facts of importance are precisely those which pertain to the question of which computation is running).

But almost all accounts of computation in physical processes have the issue that very many physical processes are running very many different algorithms, all at the same time.

Indeed, I think this is one of the main obstacles to a satisfying account -- a successful account should not have this property.

[-]abramdemski6y*70

Response to Section VII:

Assessing by how well the decision-maker does in possible worlds that she isn’t in fact in doesn’t seem a compelling criterion (and EDT and CDT could both do well by that criterion, too, depending on which possible worlds one is allowed to pick).

You make the claim that EDT and CDT can claim optimality in exactly the same way that FDT can, here, but I think the arguments are importantly not symmetric. CDT and EDT are optimal according to their own optimality notions, but given the choice to implement different decision procedures on later problems, both the CDT and EDT optimality notions would endorse selecting FDT over themselves in many of the problems mentioned in the paper, whereas FDT will endorse itself.

Most of this section seems to me to be an argument to make careful level distinctions, in an attempt to avoid the level-crossing argument which is FDT's main appeal. Certainly, FDTers such as myself will often use language which confuses the various levels, since we take a position which says they should be confusable -- the best decision procedures should follow the best policies, which should take the best actions. But making careful level distinctions does not block the level-crossing argument, it only clarifies it. FDT may not be the only "consistent fixed-point of normativity" (to the extent that it even is that), but CDT and EDT are clearly not that.

Fourth, arguing that FDT does best in a class of ‘fair’ problems, without being able to define what that class is or why it’s interesting, is a pretty weak argument.

I basically agree that the FDT paper dropped the ball here, in that it could have given a toy setting in which 'fair' is rigorously defined (in a pretty standard game-theoretic setting) and FDT has the claimed optimality notion. I hope my longer writeup can make such a setting clear.

Briefly: my interpretation of the "FDT does better" claim in the FDT paper is that FDT is supposed to take UDT-optimal actions. To the extent that it doesn't take UDT-optimal actions, I mostly don't endorse the claim that it does better (though I plan to note in a follow-up post an alternate view in which the FDT notion of optimality may be better).

The toy setting I have in mind that makes “UDT-optimal” completely well-defined is actually fairly general. The idea is that if we can represent a decision problem as a (single-player) extensive-form game, UDT is just the idea of throwing out the requirement of subgame-optimality. In other words, we don't even need a notion of "fairness" in the setting of extensive-form games -- the setting isn't rich enough to represent any "unfair" problems. Yet it is a pretty rich setting.

This observation was already made here: https://www.lesswrong.com/posts/W4sDWwGZ4puRBXMEZ/single-player-extensive-form-games-as-a-model-of-udt. Note that there are some concerns in the comments. I think the concerns make sense, and I’m not quite sure how I want to address them, but I also don’t think they’re damning to the toy model.

The FDT paper may have left out this model out of a desire for greater generality, which I do think is an important goal -- from my perspective, it makes sense not to reduce things to the toy model in which everything works out nicely.
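
As a rough illustration of the toy setting described above (a sketch under stated assumptions, not the model from the linked post), the Bomb problem can be collapsed into one chance move (whether the predictor errs) followed by one choice. The dollar-equivalent disvalue of the bomb (`COST_BOMB`) and the helper names are hypothetical, chosen only to make the arithmetic concrete. The sketch contrasts picking the policy with the highest expected utility under the prior against picking the subgame-optimal action after updating on "there is a bomb in Left".

```python
from fractions import Fraction

# Assumed numbers, for illustration only.
ERROR_RATE = Fraction(1, 10**24)  # predictor errs one time in a trillion trillion
COST_RIGHT = 100                  # dollars paid for taking Right
COST_BOMB = 10**9                 # hypothetical dollar-equivalent disvalue of the bomb


def utility(policy: str, bomb_in_left: bool) -> int:
    """Payoff at a leaf: Right always costs $100; Left costs the bomb if present."""
    if policy == "right":
        return -COST_RIGHT
    return -COST_BOMB if bomb_in_left else 0


def prior_expected_utility(policy: str) -> Fraction:
    """Expected utility of committing to `policy`, evaluated from the prior.

    Assumption: the predictor puts a bomb in Left exactly when it predicts
    "right", and its prediction matches the policy with probability
    1 - ERROR_RATE.
    """
    p_bomb = ERROR_RATE if policy == "left" else 1 - ERROR_RATE
    return p_bomb * utility(policy, True) + (1 - p_bomb) * utility(policy, False)


def subgame_optimal_action() -> str:
    """Best action once we condition on the bomb actually being in Left."""
    return max(["left", "right"], key=lambda action: utility(action, True))


# Policy selection from the prior, with no subgame-optimality requirement.
udt_policy = max(["left", "right"], key=prior_expected_utility)
```

With these assumed numbers, `udt_policy` is `"left"` while `subgame_optimal_action()` returns `"right"`: the two criteria disagree only inside the very improbable branch where the predictor has erred.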

[-]abramdemski6y60

Response to Section IV:

FDT fails to get the answer Y&S want in most instances of the core example that’s supposed to motivate it

I am basically sympathetic to this concern: I think there's a clear intuition that FDT is 2-boxing more than we would like (and a clear formal picture, in toy formalisms which show FDT-ish DTs failing on Agent Simulates Predictor problems).

Of course, it all depends on how logical counterfactuals are supposed to work. From a design perspective, I'm happy to take challenges like this as extra requirements for the behavior of logical counterfactuals, rather than objections to the whole project. I intuitively think there is a notion of logical counterfactual which fails in this respect, but this does not mean there isn't some other notion which succeeds. Perhaps we can solve the easy problem of one-boxing with a strong predictor first, and then look for ways to one-box more generally (and in fact, this is what we've done -- one-boxing with a strong predictor is not so difficult).

However, I do want to add that when Omega uses very weak prediction methods such as the examples given, it is not so clear that we want to one-box. Will is presuming that Y&S simply want to one-box in any Newcomb problem. However, we could make a distinction between evidential Newcomb problems and functional Newcomb problems. Y&S already state that they consider some things to be functional Newcomb problems despite them not being evidential Newcomb problems (such as transparent Newcomb). It stands to reason that there would be some evidential Newcomb problems which are not functional Newcomb problems, as well, and that Y&S would prefer not to one-box in such cases.

However, the predictor needn’t be running your algorithm, or have anything like a representation of that algorithm, in order to predict whether you’ll one box or two-box. Perhaps the Scots tend to one-box, whereas the English tend to two-box.

In this example, it seems quite plausible that there's a (logico-causal) reason for the regularity, so that in the logical counterfactual where you act differently, your reference class also acts somewhat differently. Say you're Scottish, and 10% of Scots read a particular fairy tale growing up, and this is connected with why you two-box. Then in the counterfactual in which you one-box, it is quite possible that those 10% also one-box. Of course, this greatly weakens the connection between Omega's prediction and your action; perhaps the change of 10% is not enough to tip the scales in Omega's prediction.

But, without any account of Y&S’s notion of subjunctive counterfactuals, we just have no way of assessing whether that’s true or not. Y&S note that specifying an account of their notion of counterfactuals is an ‘open problem,’ but the problem is much deeper than that. Without such an account, it becomes completely indeterminate what follows from FDT, even in the core examples that are supposed to motivate it — and that makes FDT not a new decision theory so much as a promissory note.

In the TDT document, Eliezer addresses this concern by pointing out that CDT also takes a description of the causal structure of a problem as given, begging the question of how we learn causal counterfactuals. In this regard, FDT and CDT are on the same level of promissory-note-ness.

It might, of course, be taken as much more plausible that a technique of learning the physical-causal structure can be provided, in contrast to a technique which learns the logical-counterfactual structure.

I want to inject a little doubt about which is easier. If a robot is interacting with an exact simulation of itself (in an iterated prisoner's dilemma, say), won't it be easier to infer that it directly controls the copy than it is to figure out that the two are running on different computers and thus causally independent?

Put more generally: logical uncertainty has to be handled one way or another; it cannot be entirely put aside. Existing methods of testing causality are not designed to deal with it. It stands to reason that such methods applied naively to cases including logical uncertainty would treat such uncertainty like physical uncertainty, and therefore tend to produce logical-counterfactual structure. This would not necessarily be very good for FDT purposes, being the result of unprincipled accident -- and the concern for FDT's counterfactuals is that there may be no principled foundation. Still, I tend to think that other decision theories merely brush the problem under the rug, and actually have to deal with logical counterfactuals one way or another.

Indeed, on the most plausible ways of cashing this out, it doesn’t give the conclusions that Y&S would want. If I imagine the closest world in which 6288 + 1048 = 7336 is false (Y&S’s example), I imagine a world with laws of nature radically unlike ours -- because the laws of nature rely, fundamentally, on the truths of mathematics, and if one mathematical truth is false then either (i) mathematics as a whole must be radically different, or (ii) all mathematical propositions are true because it is simple to prove a contradiction and every proposition follows from a contradiction.

To this I can only say again that FDT's problem of defining counterfactuals seems not so different to me from CDT's problem. A causal decision theorist should be able to work in a mathematical universe; indeed, this seems rather consistent with the ontology of modern science, though not forced by it. I find it implausible that a CDT advocate should have to deny Tegmark's mathematical universe hypothesis, or should break down and be unable to make decisions under the supposition. So, physical counterfactuals seem like they have to be at least capable of being logical counterfactuals (perhaps a different sort of logical counterfactual than FDT would use, since physical counterfactuals are supposed to give certain different answers, but a sort of logical counterfactual nonetheless).

(But this conclusion is far from obvious, and I don't expect ready agreement that CDT has to deal with this.)

[-]abramdemski6y*60

Response to Section VIII:

An alternative approach that captures the spirit of FDT’s aims

I'm somewhat confused about how you can buy FDT as far as you seem to buy it in this section, while also claiming not to understand FDT to the point of saying there is no sensible perspective at all in which it can be said to achieve higher utility. From the perspective in this section, it seems you can straightforwardly interpret FDT's notion of expected utility maximization via an evaluative focal point such as "the output of the algorithm given these inputs".

This evaluative focal point addresses the concern you raise about how bounded ability to implement decision procedures interacts with a "best decision procedure" evaluative focal point (making it depart from FDT's recommendations in so far as the agent can't manage to act like FDT), since those concerns don't arise (at least not so clearly) when we consider what FDT would recommend for the response to one situation in particular. On the other hand, we also can make sense of the notion that taking the bomb is best, since (according to both global-CDT and global-EDT) it is best for an algorithm to output "left" when given the inputs of the bomb problem (in that it gives us the best news about how that agent would do in bomb problems, and causes the agent to do well when put in bomb problems, in so far as a causal intervention on the output of the algorithm also affects a predictor running the same algorithm).

[-]jessicata6y40

I think CDT ultimately has to grapple with the question as well, because physics is math, and so physical counterfactuals are ultimately mathematical counterfactuals.

"Physics is math" is ontologically reductive.

Physics can often be specified as a dynamical system (along with interpretations of e.g. what high-level entities it represents, how it gets observed). Dynamical systems can be specified mathematically. Dynamical systems also have causal counterfactuals (what if you suddenly changed the system state to be this instead?).

Causal counterfactuals defined this way have problems (violation of physical law has consequences). But they are well-defined.
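
To make the "well-defined" point concrete, here is a minimal sketch (an illustration, not something from the comment itself). It uses a toy discrete-time system with a deterministic update rule, and a counterfactual defined by overwriting the state at one time step and letting the dynamics continue. The particular dynamics, the `rollout` helper, and the intervention values are all assumptions chosen for illustration.

```python
from typing import List, Optional, Tuple

State = Tuple[int, int]  # (position, velocity)


def step(state: State) -> State:
    """Toy 'physical law': move by the current velocity, then lose 1 velocity."""
    x, v = state
    return (x + v, v - 1)


def rollout(
    initial: State,
    steps: int,
    intervene_at: Optional[int] = None,
    new_state: Optional[State] = None,
) -> List[State]:
    """Run the dynamics; optionally overwrite the state at time `intervene_at`.

    The overwrite is the causal counterfactual: the new state need not be
    reachable from the previous one under `step`, which is exactly the
    'violation of physical law' mentioned above.
    """
    trajectory = [initial]
    state = initial
    for t in range(1, steps + 1):
        state = step(state)
        if t == intervene_at and new_state is not None:
            state = new_state
        trajectory.append(state)
    return trajectory


factual = rollout((0, 3), steps=4)
counterfactual = rollout((0, 3), steps=4, intervene_at=2, new_state=(0, 0))
```

The two trajectories agree before the intervention and diverge afterwards. Nothing in the construction needs logical counterfactuals, although the overwritten state breaks the update rule at the moment of intervention.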

[-]abramdemski6y30

Yeah, agreed, I no longer endorse the argument I was making there - one has to say more than "physics is math" to establish the importance of dealing with logical counterfactuals.

[-]Stuart_Armstrong6y120

I have to say, I find these criticisms a bit weak. Going through them:

III. FDT sometimes makes bizarre recommendations

I'd note that successfully navigating Parfit's hitchhiker also involves violating "Guaranteed Payoffs": you pay the driver at a time when there is no uncertainty, and where you get better utility from not doing so. So I don't think Guaranteed Payoffs is that sound a principle.

Your bomb example is a bit underdefined, since the predictor is predicting your actions AND giving you the prediction. If the predictor is simulating you and asking "would you go left after reading a prediction that you are going right", then you should go left; because, by the probabilities in the setup, you are almost certainly a simulation (this is kind of a "counterfactual Parfit hitchhiker" situation).

If the predictor doesn't simulate you, and you KNOW they said to go right, you are in a slightly different situation, and you should go right. This is akin to waking up in the middle of the Parfit hitchhiker experiment, when the driver has already decided to save you, and deciding whether to pay them.

IV. FDT fails to get the answer Y&S want in most instances of the core example that’s supposed to motivate it

This section is incorrect, I think. In this variant, the contents of the boxes are determined not by your decision algorithm, but by your nationality. And of course two-boxing is the right decision in that situation!

the case for one-boxing in Newcomb’s problem didn’t seem to stem from whether the Predictor was running a simulation of me, or just using some other way to predict what I’d do.

But it does depend on things like this. There's no point in one-boxing unless your one-boxing is connected with the predictor believing that you'd one-box. In a simulation, that's the case; in some other situations where the predictor looks at your algorithm, that's also the case. But if the predictor is predicting based on nationality, then you can freely two-box without changing the predictor's prediction.

V. Implausible discontinuities

There's nothing implausible about discontinuity in the optimal policy, even if the underlying data is continuous. If $p$ is the probability that we're in a smoking lesion vs a Newcomb problem, then as $p$ changes from $0$ to $1$, the expected utility of one-boxing falls and the expected utility of two-boxing rises. At some point, the optimal action will jump discontinuously from one to the other.
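
A small sketch of this point, under assumptions that are not in the original comment: standard Newcomb payoffs ($1,000,000 and $1,000), and a "lesion-like" world modeled as one where the opaque box is full with a fixed probability `Q_FULL` regardless of the action. Both expected utilities are linear (hence continuous) in $p$, yet the argmax switches at a single threshold.

```python
from fractions import Fraction

BIG = 1_000_000          # opaque box contents when full (assumed)
SMALL = 1_000            # transparent box contents (assumed)
Q_FULL = Fraction(1, 2)  # hypothetical chance the box is full in the lesion-like world


def eu_one_box(p):
    """Expected utility of one-boxing; p = probability of the lesion-like world."""
    newcomb = BIG            # predictor foresees one-boxing, box is full
    lesion = Q_FULL * BIG    # box contents independent of the action
    return (1 - p) * newcomb + p * lesion


def eu_two_box(p):
    """Expected utility of two-boxing under the same mixture."""
    newcomb = SMALL                  # predictor foresees two-boxing, box is empty
    lesion = Q_FULL * BIG + SMALL    # same box contents, plus the sure $1,000
    return (1 - p) * newcomb + p * lesion


def best_action(p):
    return "one-box" if eu_one_box(p) >= eu_two_box(p) else "two-box"


# The two lines cross where BIG - SMALL - p * BIG = 0.
THRESHOLD = 1 - Fraction(SMALL, BIG)
```

Under these assumptions the crossing point is $p = 999/1000$: just below it the best action is to one-box, just above it the best action is to two-box, even though neither expected utility jumps.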

VI. FDT is deeply indeterminate

I agree FDT is indeterminate, but I don't agree with your example. Your two calculators are clearly isomorphic, just as if we used a different numbering system for one versus the other. Talking about isomorphic algorithms avoids worrying about whether they're the "same" algorithm.

And in general, it seems to me, there’s no fact of the matter about which algorithm a physical process is implementing in the absence of a particular interpretation of the inputs and outputs of that physical process.

Indeed. But since you and your simulation are isomorphic, you can look at what the consequences are of you outputting "two-box" while your simulation outputs "deux boites" (or "one-box" and "une boite"). And {one-box, une boite} is better than {two-box, deux boites}.

But why did I use those particular interpretations of me and my simulation's physical processes? Because those interpretations are the ones relevant to the problem at hand. My simulation and I will have different weights, consume different amounts of power, run at different times, and probably run at different speeds. If those were relevant to the Newcomb problem, then the fact that we are different would become relevant. But since they aren't, we can focus in on the core of the matter. (You can also consider the example of playing the prisoner's dilemma against an almost-but-not-quite-identical copy of yourself.)

[-]AlexMennen6y80

I object to the framing of the bomb scenario on the grounds that low probabilities of high stakes are a source of cognitive bias that trips people up for reasons having nothing to do with FDT. Consider the following decision problem: "There is a button. If you press the button, you will be given $100. Also, pressing the button has a very small (one in a trillion trillion) chance of causing you to burn to death." Most people would not touch that button. Using the same payoffs and probabilities in a scenario to challenge FDT thus exploits cognitive bias to make FDT look bad. A better scenario would be to replace the bomb with something that will fine you $1000 (and, if you want, also increase the chance of error).
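
For the modified scenario, the arithmetic is easy to check. The sketch below assumes (as my reading, not something stated in the comment) that committing to Left incurs the $1000 fine only when the predictor errs, while Right always costs $100. The function names and the sample error rates are hypothetical.

```python
from fractions import Fraction

COST_RIGHT = 100  # dollars paid for taking Right
FINE = 1000       # dollars fined if you take Left and the predictor erred


def eu_left(error_rate: Fraction) -> Fraction:
    """Expected utility of committing to Left: fined only when the predictor errs."""
    return -error_rate * FINE


def eu_right(error_rate: Fraction) -> Fraction:
    """Expected utility of committing to Right: a sure $100 loss."""
    return Fraction(-COST_RIGHT)


# Error rate at which the two commitments are equally good.
BREAK_EVEN = Fraction(COST_RIGHT, FINE)
```

With these numbers, Left comes out ahead for any error rate below 1 in 10, so the comparison no longer hinges on how one weighs a minuscule chance of a catastrophic outcome.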

But then, it seems to me, that FDT has lost much of its initial motivation: the case for one-boxing in Newcomb’s problem didn’t seem to stem from whether the Predictor was running a simulation of me, or just using some other way to predict what I’d do.

I think the crucial difference here is how easily you can cause the predictor to be wrong. In the case where the predictor simulates you, if you two-box, then the predictor expects you to two-box. In the case where the predictor uses your nationality to predict your behavior, suppose Scots usually one-box and you're Scottish: if you two-box, the predictor will still expect you to one-box, because you're Scottish.
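
A minimal sketch of this contrast, with both predictors idealized: the simulation predictor is assumed to be perfectly accurate, and the nationality rule ("Scots are predicted to one-box, everyone else to two-box") is a hypothetical stand-in. The payoffs are the standard Newcomb amounts.

```python
BIG = 1_000_000  # opaque box contents when the predictor expects one-boxing
SMALL = 1_000    # transparent box contents


def simulation_predictor(action: str, nationality: str) -> str:
    """Idealized simulation: the prediction tracks whatever the agent does."""
    return action


def nationality_predictor(action: str, nationality: str) -> str:
    """Hypothetical rule: ignores the action and looks only at nationality."""
    return "one-box" if nationality == "Scottish" else "two-box"


def payoff(action: str, prediction: str) -> int:
    """Newcomb payoff given the agent's action and the predictor's prediction."""
    opaque = BIG if prediction == "one-box" else 0
    return opaque + (SMALL if action == "two-box" else 0)


def outcome(predictor, action: str, nationality: str = "Scottish") -> int:
    return payoff(action, predictor(action, nationality))
```

Against `simulation_predictor`, switching to two-boxing changes the prediction and loses the million. Against `nationality_predictor`, the prediction stays fixed, so a Scottish agent who two-boxes simply collects the extra $1,000.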

But now suppose that the pathway by which S causes there to be money in the opaque box or not is that another agent looks at S...

I didn't think that was supposed to matter at all? I haven't actually read the FDT paper, and have mostly just been operating under the assumption that FDT is basically the same as UDT, but UDT didn't build in any dependency on external agents, and I hadn't heard about any such dependency being introduced in FDT; it would surprise me if it did.

[-]Rohin Shah6y50

Planned summary for the Alignment Newsletter:

_This summary is more editorialized than most._ This post critiques Functional Decision Theory (FDT). I'm not going to go into detail, but I think the arguments basically fall into two camps. First, there are situations in which there is no uncertainty about the consequences of actions, and yet FDT chooses actions that do not have the highest utility, because of their impact on counterfactual worlds which "could have happened" (but ultimately, the agent is just leaving utility on the table). Second, FDT relies on the ability to tell when someone is "running an algorithm that is similar to you", or is "logically correlated with you". But there's no such crisp concept, and this leads to all sorts of problems with FDT as a decision theory.

Planned opinion:

Like Buck from MIRI, I feel like I understand these objections and disagree with them. On the first argument, I agree with Abram that a decision should be evaluated based on how well the agent performs with respect to the probability distribution used to define the problem; FDT only performs badly if you evaluate on a decision problem produced by conditioning on a highly improbable event. On the second class of arguments, I certainly agree that there isn't (yet) a crisp concept for "logical similarity"; however, I would be shocked if the _intuitive concept_ of logical similarity was not relevant in the general way that FDT suggests. If your goal is to hardcode FDT into an AI agent, or your goal is to write down a decision theory that in principle (e.g. with infinite computation) defines the correct action, then it's certainly a problem that we have no crisp definition yet. However, FDT can still be useful for getting more clarity on how one ought to reason, without providing a full definition.

[-]Charlie Steiner6y40

There's an interesting relationship with mathematizing of decision problems here, which I think is reflective of normal philosophy practice.

For example, in the Smoking Lesion problem, and in similar cases where you consider an agent to have "urges" or "dispositions" etc., it's important to note that these are pre-mathematical descriptions of something we'd like our decision theory to consider, and that to try to directly apply them to a mathematical theory is to commit a sort of type error.

Specifically, a decision-making procedure that "has a disposition to smoke" is not FDT. It is some other decision theory that has the capability to operate in uncertainty about its own dispositions.

I think it's totally reasonable to say that we want to research decision theories that are capable of this, because this epistemic state of not being quite sure of your own mind is something humans have to deal with all the time. But one cannot start with a mathematically specified decision theory like proof-based UDT or causal-graph-based CDT and then ask "what it would do if it had the smoking lesion." It's a question that seems intuitively reasonable but, when made precise, is nonsense.

I think what this feels like to philosophers is giving the verbal concepts primacy over the math. (With positive associations to "concepts" and negative associations to "math" implied). But what it leads to in practice is people saying "but what about the tickle defense?" or "but what about different formulations of CDT" as if they were talking about different facets of unified concepts (the things that are supposed to have primacy), when these facets have totally distinct mathematizations.

At some point, if you know that a tree falling in the forest makes the air vibrate but doesn't lead to auditory experiences, it's time to stop worrying about whether it makes a sound.

So obviously I (and LW orthodoxy) are on the pro-math side, and I think most philosophers are on the pro-concepts side (I'd say "pro-essences," but that's a bit too on the nose). But, importantly, if we agree that this descriptive difference exists, then we can at least work to bridge it by being clear about whether we're using the math perspective or the concept perspective. Then we can keep different mathematizations strictly separate when using the math perspective, but work to amalgamate them when talking about concepts.

[-]Chris_Leong6y*10

I feel the bomb problem could be better defined. What is the predictor predicting? Is it always predicting what you'll do when you see the note saying it will predict right? What about if you don't see this note because it predicts you'll go left? Then there's the issue that if it makes a prediction by a) trying to predict whether you'll see such a note or not, then b) predicting what the agent does in this case, then it'd already have to predict the agent's choice in order to make the prediction in stage a). In other words, a depends on b and b depends on a; the situation is circular. (edited since my previous comment was incorrect)
