# 20

The following is a critique of the idea of logical counterfactuals. The idea of logical counterfactuals has appeared in previous agent foundations research (especially at MIRI): here, here. "Impossible possible worlds" have been considered elsewhere in the literature; see the SEP article for a summary.

I will start by motivating the problem, which also gives an account for what a logical counterfactual is meant to be.

Suppose you learn about physics and find that you are a robot. You learn that your source code is "A". You also believe that you have free will; in particular, you may decide to take either action X or action Y. In fact, you take action X. Later, you simulate "A" and find, unsurprisingly, that when you give it the observations you saw up to deciding to take action X or Y, it outputs action X. However, you, at the time, had the sense that you could have taken action Y instead. You want to be consistent with your past self, so you want to, at this later time, believe that you could have taken action Y at the time. If you could have taken Y, then you do take Y in some possible world (which still satisfies the same laws of physics). In this possible world, it is the case that "A" returns Y upon being given those same observations. But, the output of "A" when given those observations is a fixed computation, so you now need to reason about a possible world that is logically incoherent, given your knowledge that "A" in fact returns X. This possible world is, then, a logical counterfactual: a "possible world" that is logically incoherent.

To summarize: a logical counterfactual is a notion of "what would have happened" had you taken a different action after seeing your source code, and in that "what would have happened", the source code must output a different action than what you actually took; hence, this "what would have happened" world is logically incoherent.

It is easy to see that this idea of logical counterfactuals is unsatisfactory. For one, no good account of them has yet been given. For two, there is a sense in which no account could be given; reasoning about logically incoherent worlds can only be so extensive before running into logical contradiction.

To extensively refute the idea, it is necessary to provide an alternative account of the motivating problem(s) which dispenses with the idea. Even if logical counterfactuals are unsatisfactory, the motivating problem(s) remain.

I now present two alternative accounts: counterfactual nonrealism, and policy-dependent source code.

## Counterfactual nonrealism

According to counterfactual nonrealism, there is no fact of the matter about what "would have happened" had a different action been taken. There is, simply, the sequence of actions you take, and the sequence of observations you get. At the time of taking an action, you are uncertain about what that action is; hence, from your perspective, there are multiple possibilities.

Given this uncertainty, you may consider material conditionals: if I take action X, will consequence Q necessarily follow? An action may be selected on the basis of these conditionals, such as by determining which action results in the highest guaranteed expected utility if that action is taken.

This is basically the approach taken in my post on subjective implication decision theory. It is also the approach taken by proof-based UDT.

The material conditionals are ephemeral, in that at a later time, the agent will know that they could only have taken a certain action (assuming they knew their source code before taking the action), due to having had longer to think by then; hence, all the original material conditionals will be vacuously true. The apparent nondeterminism is, then, only due to the epistemic limitation of the agent at the time of making the decision, a limitation not faced by a later version of the agent (or an outside agent) with more computation power.

This leads to a sort of relativism: what is undetermined from one perspective may be determined from another. This makes global accounting difficult: it's hard for one agent to evaluate whether another agent's action is any good, because the two agents have different epistemic states, resulting in different judgments on material conditionals.

A problem that comes up is that of "spurious counterfactuals" (analyzed in the linked paper on proof-based UDT). An agent may become sure of its own action before that action is taken. Upon being sure of that action, the agent will know the material implication that, if they take a different action, something terrible will happen (this material implication is vacuously true). Hence the agent may take the action they were sure they would take, making the original certainty self-fulfilling. (There are technical details with how the agent becomes certain having to do with Löb's theorem).

The most natural decision theory resulting in this framework is timeless decision theory (rather than updateless decision theory). This is because the agent updates on what they know about the world so far, and considers the material implications of themselves taken a certain action; these implications include logical implications if the agent knows their source code. Note that timeless decision theory is dynamically inconsistent in the counterfactual mugging problem.

## Policy-dependent source code

A second approach is to assert that one's source code depends on one's entire policy, rather than only one's actions up to seeing one's source code.

Formally, a policy is a function mapping an observation history to an action. It is distinct from source code, in that the source code specifies the implementation of the policy in some programming language, rather than itself being a policy function.

Logically, it is impossible for the same source code to generate two different policies. There is a fact of the matter about what action the source code outputs given an observation history (assuming the program halts). Hence there is no way for two different policies to be compatible with the same source code.

Let's return to the robot thought experiment and re-analyze it in light of this. After the robot has seen that their source code is "A" and taken action X, the robot considers what would have happened if they had taken action Y instead. However, if they had taken action Y instead, then their policy would, trivially, have to be different from their actual policy, which takes action X. Hence, their source code would be different. Hence, they would not have seen that their source code is "A".

Instead, if the agent were to take action Y upon seeing that their source code is "A", their source code must be something else, perhaps "B". Hence, which action the agent would have taken depends directly on their policy's behavior upon seeing that the source code is "B", and indirectly on the entire policy (as source code depends on policy).

We see, then, that the original thought experiment encodes a reasoning error. The later agent wants to ask what would have happened if they had taken a different action after knowing their source code; however, the agent neglects that such a policy change would have resulted in seeing different source code! Hence, there is no need to posit a logically incoherent possible world.

The reasoning error came about due to using a conventional, linear notion of interactive causality. Intuitively, what you see up to time t depends only on your actions before time t. However, policy-dependent source code breaks this condition. What source code you see that you have depends on your entire policy, not just what actions you took up to seeing your source code. Hence, reasoning under policy-dependent source code requires abandoning linear interactive causality.

The most natural decision theory resulting from this approach is updateless decision theory, rather that timeless decision theory, as it is the entire policy that the counterfactual is on.

## Conclusion

Before very recently, my philosophical approach had been counterfactual nonrealism. However, I am now more compelled by policy-dependent source code, after having analyzed it. I believe this approach fixes the main problem of counterfactual nonrealism, namely relativism making global accounting difficult. It also fixes the inherent dynamic inconsistency problems that TDT has relative to UDT (which are related to the relativism).

I believe the re-analysis I have provided of the thought experiment motivating logical counterfactuals is sufficient to refute the original interpretation, and thus to de-motivate logical counterfactuals.

The main problem with policy-dependent source code is that, since it violates linear interactive causality, analysis is correspondingly more difficult. Hence, there is further work to be done in considering simplified environment classes where possible simplifying assumptions (including linear interactive causality) can be made. It is critical, though, that the linear interactive causality assumption not be used in analyzing cases of an agent learning their source code, as this results in logical incoherence.

New Comment

I too have recently updated (somewhat) away from counterfactual non-realism. I have a lot of stuff I need to work out and write about it.

I seem to have a lot of disagreements with your post.

Given this uncertainty, you may consider material conditionals: if I take action X, will consequence Q necessarily follow? An action may be selected on the basis of these conditionals, such as by determining which action results in the highest guaranteed expected utility if that action is taken.

I don't think material conditionals are the best way to cash out counterfactual non-realism.

• The basic reason I think it's bad is the happy dance problem. This makes it seem clear that the sentence to condition on should not be .
• If the action can be viewed as a function of observations, conditioning on makes sense. But this is sort of like already having counterfactuals, or at least, being realist that there are counterfactuals about whan would do if the agent observed different things. So this response can be seen as abandoning counterfactual non-realism.
• A different approach is to consider conditional beliefs rather than material implications. I think this is more true to counterfactual non-realism. In the simplest form, this means you just condition on actions (rather than trying to condition on something like or ). However, in order to reason updatelessly, you need something like conditioning on conditionals, which complicates matters.
• Another reason to think it's bad is Troll Bridge.
• Again if the agent thinks there are basic counterfactual facts, (required to respect but little else -- ie entirely determined by subjective beliefs), then the agent can escape Troll Bridge by disagreeing with the relevant inference. But this, of course, rejects the kind of counterfactual non-realism you intend.
• To be more in line with counterfactual non-realism, we would like to use conditional probabilities instead. However, conditional probability behaves too much like material implication to block the Troll Bridge argument. However, I believe that there is an account of conditional probability which avoids this by rejecting the ratio analysis of conditional probability -- ie Bayes' definition -- and instead regards conditional probability as a basic entity. (Along the lines of what Alan Hájek goes on and on about.) Thus an EDT-like procedure can be immune to both 5-and-10 and Troll Bridge. (I claim.)

As for policy-dependent source code, I find myself quite unsympathetic to this view.

• If the agent is updateful, this is just saying that in counterfactuals where the agent does something else, it might have different source code. Which seems fine, but does it really solve anything? Why is this much better than counterfactuals which keep the source code fixed but imagine the execution trace being different? This seems to only push the rough spots further back -- there can still be contradictions, e.g. between the source code and the process by which programmers wrote the source code. Do you imagine it is possible to entirely remove such rough spots from the counterfactuals?
• So it seems you intend the agent to be updateless instead. But then we have all the usual issues with logical updatelessness. If the agent is logically updateless, there is absolutely no reason to think that its beliefs about the connections between source code and actual policy behavior is any good. Making those connections requires actual reasoning, not simply a good enough prior -- which means being logically updateful. So it's unclear what to do.
• Perhaps logically-updateful policy-dependent-source-code is the most reasonable version of the idea. But then we are faced with the usual questions about spurious counterfactuals, chicken rule, exploration, and Troll Bridge. So we still have to make choices about those things.

In the happy dance problem, when the agent is considering doing a happy dance, the agent should have already updated on M. This is more like timeless decision theory than updateless decision theory.

Conditioning on 'A(obs) = act' is still a conditional, not a counterfactual. The difference between conditionals and counterfactuals is the difference between "If Oswald didn't kill Kennedy, then someone else did" and "If Oswald didn't kill Kennedy, then someone else would have".

Indeed, troll bridge will present a problem for "playing chicken" approaches, which are probably necessary in counterfactual nonrealism.

For policy-dependent source code, I intend for the agent to be logically updateful, while updateless about observations.

Why is this much better than counterfactuals which keep the source code fixed but imagine the execution trace being different?

Because it doesn't lead to logical incoherence, so reasoning about counterfactuals doesn't have to be limited.

This seems to only push the rough spots further back—there can still be contradictions, e.g. between the source code and the process by which programmers wrote the source code.

If you see your source code is B instead of A, you should anticipate learning that the programmers programmed B instead of A, which means something was different in the process. So the counterfactual has implications backwards in physical time.

At some point it will ground out in: different indexical facts, different laws of physics, different initial conditions, different random events...

This theory isn't worked out yet but it doesn't yet seem that it will run into logical incoherence, the way logical counterfactuals do.

But then we are faced with the usual questions about spurious counterfactuals, chicken rule, exploration, and Troll Bridge.

Maybe some of these.

Spurious counterfactuals require getting a proof of "I will take action X". The proof proceeds by showing "source code A outputs action X". But an agent who accepts policy-dependent source code will believe they have source code other than A if they don't take action X. So the spurious proof doesn't prevent the counterfactual from being evaluated.

Chicken rule is hence unnecessary.

Exploration is a matter of whether the world model is any good; the world model may, for example, map a policy to a distribution of expected observations. (That is, the world model already has policy counterfactuals as part of it; theories such as physics provide constraints on the world model rather than fully determining it). Learning a good world model is of course a problem in any approach.

Whether troll bridge is a problem depends on how the source code counterfactual is evaluated. Indeed, many ways of running this counterfactual (e.g. inserting special cases into the source code) are "stupid" and could be punished in a troll bridge problem.

I by no means think "policy-dependent source code" is presently a well worked-out theory; the advantage relative to logical counterfactuals is that in the latter case, there is a strong theoretical obstacle to ever having a well worked-out theory, namely logical incoherence of the counterfactuals. Hence, coming up with a theory of policy-dependent source code seems more likely to succeed than coming up with a theory of logical counterfactuals.

If you see your source code is B instead of A, you should anticipate learning that the programmers programmed B instead of A, which means something was different in the process. So the counterfactual has implications backwards in physical time.

At some point it will ground out in: different indexical facts, different laws of physics, different initial conditions, different random events...

I'm not sure how you are thinking about this. It seems to me like this will imply really radical changes to the universe. Suppose the agent is choosing between a left path and a right path. Its actual programming will go left. It has to come up with alternate programming which would make it go right, in order to consider that scenario. The most probable universe in which its programming would make it go right is potentially really different from our own. In particular, it is a universe where it would go right despite everything it has observed, a lifetime of (updateless) learning, which in the real universe, has taught it that it should go left in situations like this.

EG, perhaps it has faced an iterated 5&10 problem, where left always yields 10. It has to consider alternate selves who, faced with that history, go right.

It just seems implausible that thinking about universes like that will result in systematically good decisions. In the iterated 5&10 example, perhaps universes where its programming fails iterated 5&10 are universes where iterated 5&10 is an exceedingly unlikely situation; so in fact, the reward for going right is quite unlikely to be 5, and very likely to be 100. Then the AI would choose to go right.

Obviously, this is not necessarily how you are thinking about it at all -- as you said, you haven't given an actual decision procedure. But the idea of considering only really consistent counterfactual worlds seems quite problematic.

I agree this is a problem, but isn't this a problem for logical counterfactual approaches as well? Isn't it also weird for a known fixed optimizer source code to produce a different result on this decision where it's obvious that 'left' is the best decision?

If you assume that the agent chose 'right', it's more reasonable to think it's because it's not a pure optimizer than that a pure optimizer would have chosen 'right', in my view.

If you form the intent to, as a policy, go 'right' on the 100th turn, you should anticipate learning that your source code is not the code of a pure optimizer.

I'm left with the feeling that you don't see the problem I'm pointing at.

My concern is that the most plausible world where you aren't a pure optimizer might look very very different, and whether this very very different world looks better or worse than the normal-looking world does not seem very relevant to the current decision.

Consider the "special exception selves" you mention -- the Nth exception-self has a hard-coded exception "go right if it's beet at least N turns and you've gone right at most 1/N of the time".

Now let's suppose that the worlds which give rise to exception-selves are a bit wild. That is to say, the rewards in those worlds have pretty high variance. So a significant fraction of them have quite high reward -- let's just say 10% of them have value much higher than is achievable in the real world.

So we expect that by around N=10, there will be an exception-self living in a world that looks really good.

This suggests to me that the policy-dependent-source agent cannot learn to go left > 90% of the time, because once it crosses that threshhold, the exception-self in the really good looking world is ready to trigger its exception -- so going right starts to appear really good. The agent goes right until it is under the threshhold again.

If that's true, then it seems to me rather bad: the agent ends up repeatedly going right in a situation where it should be able to learn to go left easily. Its reason for repeatedly going right? There is one enticing world, which looks much like the real world, except that in that world the agent definitely goes right. Because that agent is a lucky agent who gets a lot of utility, the actual agent has decided to copy its behavior exactly -- anything else would prove the real agent unlucky, which would be sad.

Of course, this outcome is far from obvious; I'm playing fast and loose with how this sort of agent might reason.

I think it's worth examining more closely what it means to be "not a pure optimizer". Formally, a VNM utility function is a rationalization of a coherent policy. Say that you have some idea about what your utility function is, U. Suppose you then decide to follow a policy that does not maximize U. Logically, it follows that U is not really your utility function; either your policy doesn't coherently maximize any utility function, or it maximizes some other utility function. (Because the utility function is, by definition, a rationalization of the policy)

Failing to disambiguate these two notions of "the agent's utility function" is a map-territory error.

Decision theories require, as input, a utility function to maximize, and output a policy. If a decision theory is adopted by an agent who is using it to determine their policy (rather than already knowing their policy), then they are operating on some preliminary idea about what their utility function is. Their "actual" utility function is dependent on their policy; it need not match up with their idea.

So, it is very much possible for an agent who is operating on an idea U of their utility function, to evaluate counterfactuals in which their true behavioral utility function is not U. Indeed, this is implied by the fact that utility functions are rationalizations of policies.

Let's look at the "turn left/right" example. The agent is operating on a utility function idea U, which is higher the more the agent turns left. When they evaluate the policy of turning "right" on the 10th time, they must conclude that, in this hypothetical, either (a) "right" maximizes U, (b) they are maximizing some utility function other than U, or (c) they aren't a maximizer at all.

The logical counterfactual framework says the answer is (a): that the fixed computation of U-maximization results in turning right, not left. But, this is actually the weirdest of the three worlds. It is hard to imagine ways that "right" maximizes U, whereas it is easy to imagine that the agent is maximizing a utility function other than U, or is not a maximizer.

Yes, the (b) and (c) worlds may be weird in a problematic way. However, it is hard to imagine these being nearly as weird as (a).

One way they could be weird is that an agent having a complex utility function is likely to have been produced by a different process than an agent with a simple utility function. So the more weird exceptional decisions you make, the greater the evidence is that you were produced by the sort of process that produces complex utility functions.

This is pretty similar to the smoking lesion problem, then. I expect that policy-dependent source code will have a lot in common with EDT, as they both consider "what sort of agent I am" to be a consequence of one's policy. (However, as you've pointed out, there are important complications with the framing of the smoking lesion problem)

I think further disambiguation on this could benefit from re-analyzing the smoking lesion problem (or a similar problem), but I'm not sure if I have the right set of concepts for this yet.

OK, all of that made sense to me. I find the direction more plausible than when I first read your post, although it still seems like it'll fall to the problem I sketched.

I both like and hate that it treats logical uncertainty in a radically different way from empirical uncertainty -- like, because we have so far failed to find any way to treat the two uniformly (besides being entirely updateful that is); and hate, because it still feels so wrong for the two to be very different.

Conditioning on ‘A(obs) = act’ is still a conditional, not a counterfactual. The difference between conditionals and counterfactuals is the difference between “If Oswald didn’t kill Kennedy, then someone else did” and “If Oswald didn’t kill Kennedy, then someone else would have”.

I still disagree. We need a counterfactual structure in order to consider the agent as a function A(obs). EG, if the agent is a computer program, the function would contain all the counterfactual information about what the agent would do if it observed different things. Hence, considering the agent's computer program as such a function leverages an ontological commitment to those counterfactuals.

To illustrate this, consider counterfactual mugging where we already see that the coin is heads -- so, there is nothing we can do, we are at the mercy of our counterfactual partner. But suppose we haven't yet observed whether Omega gives us the money.

A "real counterfactual" is one which can be true or false independently of whether its condition is met. In this case, if we believe in real counterfactuals, we believe that there is a fact of the matter about what we do in the case, even though the coin came up heads. If we don't believe in real counterfactuals, we instead think only that there is a fact of how Omega is computing "what I would have done if the coin had been tails" -- but we do not believe there is any "correct" way for Omega to compute that.

The representation and the representation both appear to satisfy this test of non-realism. The first is always true if the observation is false, so, lacks the ability to vary independently of the observation. The second is undefined when the observation is false, which is perhaps even more appealing for the non-realist.

Now consider the representation. can still vary even when we know . So, it fails this test -- it is a realist representation!

Putting something into functional form imputes a causal/counterfactual structure.

This indeed makes sense when "obs" is itself a logical fact. If obs is a sensory input, though, 'A(obs) = act' is a logical fact, not a logical counterfactual. (I'm not trying to avoid causal interpretations of source code interpreters here, just logical counterfactuals)

Ahhh ok.

In the happy dance problem, when the agent is considering doing a happy dance, the agent should have already updated on M. This is more like timeless decision theory than updateless decision theory.

I agree that this gets around the problem, but to me the happy dance problem is still suggestive -- it looks like the material conditional is the wrong representation of the thing we want to condition on.

Also -- if the agent has already updated on observations, then updating on is just the same as updating on . So this difference only matters in the updateless case, where it seems to cause us trouble.

I found parts of your framing quite original and I'm still trying to understand all the consequences.

Firstly, I'm also opposed to characterising the problem in terms of logical counterfactuals. I've argued before that Counterfactuals are an Answer Not a Question, although maybe it would have been clearer to say that they are a Tool Not a Question instead. If we're talking strictly, it doesn't make sense to ask what maths would. be like if 1+1=3 as it doesn't, but we can construct a para-consistent logic where it makes sense to do something analogous to pretending 1+1=3. And so maybe one form of "logical counterfactual" could be useful for solving these problems, but that doesn't mean asking what logical counterfactuals are, as though they were ontologically basic, as though they were in the map not the territory, as though they were a single unified concept, makes sense.

Secondly, "free will" is such a loaded word that using it in a non-standard fashion simply obscures and confuses the discussion. Nonetheless, I think you are touching upon an important point here. I have a framing which I believe helps clarify the situation. If there's only one possible decision, this gives us a Trivial Decision Problem. So to have a non-trivial decision problem, we'd need a model containing at least two decisions. If we actually did have libertarian free will, then our decision problems would always be non-trivial. However, in the absence of this, the only way to avoid triviality would be to augment the factual with at least one counterfactual.

Counterfactual non-realism: Hmm... I see how this could be a useful concept, but the definition given feels a bit vague. For example, recently I've been arguing in favour of what counts as a valid counterfactual being at least partially a matter of social convention. Is that counterfactual non-realism?

Further, it seems a bit strange to associate material conditions with counterfactual non-realism. Material conditions only provide the outcome when we have a consistent counterfactual. So, either a) we believe in libertarian free will b) we use something like the erasure approach to remove information such that we have multiple consistent possibilities (see https://www.lesswrong.com/posts/BRuWm4GxcTNPn4XDX/deconfusing-logical-counterfactuals). Proof-based UDT doesn't quite use material conditionals, it uses a paraconsistent version of them instead. Although, maybe I'm just being too pedantic here. In any case, we can find ways of making paraconsistent logic behave as expected in any scenario, however it would require a separate ground. That is, it isn't enough that the logic merely seems to work, but we should be able to provide a separate reason for why using a paraconsistent logic in that way is good.

Also, another approach which kind of aligns with counterfactual non-realism is to say that given the state of the universe at any particular time we can determine the past and future and that there are no counterfactuals beyond those we generate by imagining state Y at time T instead of state X. So, to imagine counterfactually taking action Y we replace the agent doing X with another agent doing Y and flow causation both forwards and backwards. (See this post for more detail). It could be argued that these count as counterfactuals, but I'd align it with counterfactual non-realism as it doesn't have decision counterfactuals as seperate ontological elements.

Policy-dependent source code - this is actually a pretty interesting framing. I've always defaulted to thinking about counterfactuals in terms of actions, but when we're talking about things in terms of problems like Counterfactual Mugging, characterising counterfactuals in terms of policy might be more natural. It's strange why this feels fresh to me - I mean UDT takes this approach - but I never considered the possibility of non-UDT policy counterfactuals. I guess from a philosophical perspective it makes sense to first consider whether policy-dependent source code makes sense and then if it does further ask whether UDT makes sense.

Secondly, “free will” is such a loaded word that using it in a non-standard fashion simply obscures and confuses the discussion.

Wikipedia says "Free will is the ability to choose between different possible courses of action unimpeded." SEP says "The term “free will” has emerged over the past two millennia as the canonical designator for a significant kind of control over one’s actions." So my usage seems pretty standard.

For example, recently I’ve been arguing in favour of what counts as a valid counterfactual being at least partially a matter of social convention.

All word definitions are determined in large part by social convention. The question is whether the social convention corresponds to a definition (e.g. with truth conditions) or not. If it does, then the social convention is realist, if not, it's nonrealist (perhaps emotivist, etc).

Material conditions only provide the outcome when we have a consistent counterfactual.

Not necessarily. An agent may be uncertain over its own action, and thus have uncertainty about material conditionals involving its action. The "possible worlds" represented by this uncertainty may be logically inconsistent, in ways the agent can't determine before making the decision.

Proof-based UDT doesn’t quite use material conditionals, it uses a paraconsistent version of them instead.

I don't understand this? I thought it searched for proofs of the form "if I take this action, then I get at least this much utility", which is a material conditional.

So, to imagine counterfactually taking action Y we replace the agent doing X with another agent doing Y and flow causation both forwards and backwards.

Policy-dependent source code does this; one's source code depends on one's policy.

I guess from a philosophical perspective it makes sense to first consider whether policy-dependent source code makes sense and then if it does further ask whether UDT makes sense.

I think UDT makes sense in "dualistic" decision problems that are already factorized as "this policy leads to these consequences". Extending it to a nondualist case brings up difficulties, including the free will / determinism issue. Policy-dependent source code is a way of interpreting UDT in a setting with deterministic, knowable physics.

So my usage (of free will) seems pretty standard.

Not quite. The way you are using it doesn't necessarily imply real control, it may be imaginary control.

All word definitions are determined in large part by social convention

True. Maybe I should clarify what I'm suggesting. My current theory is that there are multiple reasonable definitions of counterfactual and it comes down to social norms as to what we accept as a valid counterfactual. However, it is still very much a work in progress, so I wouldn't be able to provide more than vague details.

The "possible worlds" represented by this uncertainty may be logically inconsistent, in ways the agent can't determine before making the decision.

I guess my point was that this notion of counterfactual isn't strictly a material conditional due to the principle of explosion. It's a "para-consistent material conditional" by which I mean the algorithm is limited in such a way as to prevent this explosion.

Policy-dependent source code does this; one's source code depends on one's policy.

Hmm... good point. However, were you flowing this all the way back in time? Such as if you change someone's source code, you'd also have to change the person who programmed them.

I think UDT makes sense in "dualistic" decision problems'\

What do you mean by dualistic?

The way you are using it doesn’t necessarily imply real control, it may be imaginary control.

I'm discussing a hypothetical agent who believes itself to have control. So its beliefs include "I have free will". Its belief isn't "I believe that I have free will".

It’s a “para-consistent material conditional” by which I mean the algorithm is limited in such a way as to prevent this explosion.

Yes, that makes sense.

However, were you flowing this all the way back in time?

Yes (see thread with Abram Demski).

What do you mean by dualistic?

Already factorized as an agent interacting with an environment.

Yes (see thread with Abram Demski).

Hmm, yeah this could be a viable theory. Anyway to summarise the argument I make in Is Backwards Causation Necessarily Absurd?, I point out that since physics is pretty much reversible, instead of A causing B, it seems as though we could also imagine B causing A and time going backwards. In this view, it would be reasonable to say that one-boxing (backwards-)caused the box to be full in Newcombs. I only sketched the theory because I don't have enough physics knowledge to evaluate it. But the point is that we can give justification for a non-standard model of causality.

In this possible world, it is the case that "A" returns Y upon being given those same observations. But, the output of "A" when given those observations is a fixed computation, so you now need to reason about a possible world that is logically incoherent, given your knowledge that "A" in fact returns X. This possible world is, then, a logical counterfactual: a "possible world" that is logically incoherent.

Simpler solution: in that world, your code is instead A', which is exactly like A, except that it returns Y in this situation. This is the more general solution derived from Pearl's account of counterfactuals in domains with a finite number of variables (the "twin network construction").

Last year, my colleagues and I published a paper on Turing-complete counterfactual models ("causal probabilistic programming"), which details how to do this, and even gives executable code to play with, as well as a formal semantics. Have a look at our predator-prey example, a fully worked example of how to do this "counterfactual world is same except blah" construction.

http://www.zenna.org/publications/causal.pdf

Yes, this is a specific way of doing policy-dependent source code, which minimizes how much the source code has to change to handle the counterfactual.

Haven't looked deeply into the paper yet but the basic idea seems sound.

I see the problem of counterfactuals as essentially solved by quasi-Bayesianism, which behaves like UDT in all Newcomb-like situations. The source code in your presentation of the problem is more or less equivalent to Omega in Newcomb-like problems. A TRL agent can also reason about arbitrary programs, and learn that a certain program acts as a predictor for its own actions.

This approach has some similarity with material implication and proof-based decision theory, in the sense that out of several hypothesis about counterfactuals that are consistent with observations, the decisive role is played by the most optimistic hypothesis (the one that can be exploited for the most expected utility). However, it has no problem with global accounting and indeed it solves counterfactual mugging successfully.

It seems the approaches we're using are similar, in that they both are starting from observation/action history with posited falsifiable laws, with the agent's source code not known a priori, and the agent considering different policies.

Learning "my source code is A" is quite similar to learning "Omega predicts my action is equal to A()", so these would lead to similar results.

Policy-dependent source code, then, corresponds to Omega making different predictions depending on the agent's intended policy, such that when comparing policies, the agent has to imagine Omega predicting differently (as it would imagine learning different source code under policy-dependent source code).

Policy-dependent source code, then, corresponds to Omega making different predictions depending on the agent's intended policy, such that when comparing policies, the agent has to imagine Omega predicting differently (as it would imagine learning different source code under policy-dependent source code).

Well, in quasi-Bayesianism for each policy you have to consider the worst-case environment in your belief set, which depends on the policy. I guess that in this sense it is analogous.

What is "linear interactive causality"?

Basically, the assumption that you're participating in a POMDP. The idea is that there's some hidden state that your actions interact with in a temporally linear fashion (i.e. action 1 affects state 2), such that your late actions can't affect early states/observations.

OK, so no "backwards causation" ? (not sure if that's a technical term and/or if I'm using it right...)

Is there a word we could use instead of "linear", which to an ML person sounds like "as in linear algebra"?

Yes, it's about no backwards assumption. Linear has lots of meanings, I'm not concerned about this getting confused with linear algebra, but you can suggest a better term if you have one.