XOR Blackmail & Causality

2Daniel Kokotajlo

2Abram Demski

1Charlie Steiner

1Abram Demski

Maybe I'm late to the party, in which case sorry about that & I look forward to hearing why I'm wrong, but I'm not convinced that epsilon-exploration is a satisfactory way to ensure that conditional probabilities are well-defined. Here's why:

What ends up happening if I do action A often depends on why I did it. For example, if someone else is deciding how to treat me, and I defect against them, but it's because of epsilon-exploration rather than because that's what my reasoning process concluded, then they would likely be inclined to forgive me and cooperate with me in the future. So the conditional probability will be well-defined, but defined incorrectly--it will say that the probability of them cooperating with me in the future, conditional on me defecting now, is high.
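To make the worry concrete, here is a minimal sketch of the arithmetic. All numbers and names are hypothetical: it assumes the exploration step forces a defection with probability `epsilon`, that the other party can tell an explored defection from a deliberate one, and that they forgive the former far more often than the latter.

```python
def p_future_cooperation_given_defect(
    p_deliberate_defect: float,
    epsilon: float,
    p_forgive_explored: float = 0.9,    # assumed: explored defections are usually forgiven
    p_forgive_deliberate: float = 0.1,  # assumed: deliberate defections usually are not
) -> float:
    """P(they cooperate later | I defect now), computed from observed frequencies."""
    # Probability mass of defections caused by the exploration step.
    explored = epsilon
    # Probability mass of defections that the reasoning process itself chose.
    deliberate = (1 - epsilon) * p_deliberate_defect
    forgiven = explored * p_forgive_explored + deliberate * p_forgive_deliberate
    return forgiven / (explored + deliberate)


# An agent whose reasoning never defects: every observed defection was exploration.
habitual_cooperator = p_future_cooperation_given_defect(0.0, epsilon=0.01)

# An agent whose reasoning always defects: almost every defection was deliberate.
habitual_defector = p_future_cooperation_given_defect(1.0, epsilon=0.01)

print(habitual_cooperator, habitual_defector)
```

Under these assumptions the habitual cooperator's conditional probability of being forgiven after defecting is about 0.9, since it is estimated entirely from exploration. If that agent then defected on purpose because of this estimate, it would face the low forgiveness rate for deliberate defections instead.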

I hear there is a way to fiddle with the foundations of probability theory so that conditional probabilities are taken as basic and ordinary probabilities are defined in terms of them. Maybe this would solve the problem?

"I hear there is a way to fiddle with the foundations of probability theory so that conditional probabilities are taken as basic and ordinary probabilities are defined in terms of them. Maybe this would solve the problem?"

This does help somewhat. See here. But, in order to get good answers from that, you need to already know enough about the structure of the situation.

"Maybe I'm late to the party, in which case sorry about that & I look forward to hearing why I'm wrong, but I'm not convinced that epsilon-exploration is a satisfactory way to ensure that conditional probabilities are well-defined. Here's why:"

I agree, but I also think there are some things pointing in the direction of "there's something interesting going on with epsilon exploration". Specifically, there's a pretty strong analogy between epsilon exploration and modal UDT: MUDT is like the limit as you send exploration probability to zero, so it never actually happens but it still happens in nonstandard models. However, that only seems to work when you know the structure of the situation logically. When you have to learn it, you have to actually explore sometimes to get it right.

To the extent that MUDT looks like a deep result about counterfactual reasoning, I take this as a point in favor of epsilon exploration telling us something about the deep structure of counterfactual reasoning.

Anyway, see here for some more recent thoughts of mine. (But I didn't discuss the question of epsilon exploration as much as I could have.)

Did you need an abstract copy of the action to represent Newcomb's problem? No. In the causal network, the predictor is influenced by a previous state of the agent and by the state of their house, and either sends the letter or not. That is all CDT uses.

[Cross-posted from IAFF.]

I edited my previous post to note that I’m now much less optimistic about the direction I was going in. This post is to further elaborate the issue and my current position.

Counterfactual reasoning is something we don’t understand very well, and which has so many free parameters that it seems to explain just about any solution to a decision problem which one might want to get based on intuition. So, it would be nice to eliminate it from our ontology – to reduce the cases in which it truly captures something important to machinery which we understand, and write off the other cases as “counterfactual-of-the-gaps” in need of some other solution than counterfactuals.

My approach to this involved showing that, in many cases, EDT learns to act like CDT because its knowledge of its own typical behavior screens off the action from the correlations which are generally thought to make EDT cooperate in one-shot prisoner’s dilemma with similar agents, one-box in Newcomb’s problem, and so on. This is essentially a version of the tickle defense. I also pointed out that the same kind of self-knowledge constraint is needed to deal with some counterexamples to CDT; so, CDT can’t be justified as a way of dealing with cases of failure of self-knowledge in general. Instead, CDT seems to improve the situation in some cases of self-knowledge failure, while EDT does better in other such cases.

This suggests a view in which the self-knowledge constraint is a rationality constraint, so the tickle defense is thought of as being true for rational agents, and CDT=EDT under these conditions of rationality. I suggested that problems for which this was not true had to somehow violate the ability of the agent to perform experiments in the world; i.e., the decision problem would have to be set up in such a way as to prevent the agent from decorrelating its actions from things in the environment which are not causally downstream of its actions. This seems in some sense unfair, as the environment is preventing the agent from correctly learning the causal relationships through experimentation. I called this condition the law of logical causality when it first occurred to me, and mixed-strategy implementability in the setup where I proved conditions for CDT=EDT.

In XOR Blackmail with a perfect predictor, however, mixed-strategy implementability is violated in a way which does not intuitively seem unfair. As a result, knowledge of what sort of thing you do in XOR Blackmail is not sufficient to decorrelate your actions from things which you have no control over. Constraining to the epsilon-exploration case, so that conditional probabilities are well-defined, it seems like what happens is that the epsilon-exploration bit correlates the action you take with the disaster (thanks to the XOR which determines if the letter is sent). On the other hand, it seems as if CDT should be able to get the right answer.

However, I’m unable to come up with a causal Bayes net which seems to faithfully represent the problem, so that I can properly compare how CDT and EDT reason about it in the same representation. It seems like the letter has to be both a parent and a child of the action. I thought I could represent things properly by having a copy of the action node, representing the simulation of the agent which the predictor uses to predict; but, I don’t see how to represent the perfect correlation between the copy and the real action without effectively severing the other parents of the real action.

Anyone have ideas about how to represent XOR Blackmail in a causal network?

Edit:

Here we go. I was confused by the fact that CDT can’t reason as if its action makes the letter not get sent. The following causal graph works well enough:

Variables:

A: The action. True if money is sent to the blackmailer.
A′: A copy of the action, representing the abstract mathematical fact of what the agent does if it sees the letter.
L: Whether the letter is sent or not.
D: The rare disaster.
U: The utility.

Causal connections:

A: Has A′ and L as parents, with the following function: If the letter is sent, copy A′. Otherwise, false.
A′: No parents.
L: A′ and D as parents, with the XOR function determining L.
D: No parents.
U: A and D as parents, with the utility function as stated in the original XOR post.

Assume epsilon-exploration to ensure that the conditional probabilities are well-defined. Even if EDT knows its own policy, it sees itself as having control over the disaster. CDT, on the other hand, sees no such connection, so it refuses to send the money.
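As a rough check on this graph, the sketch below enumerates it exactly and compares the two decision rules after the letter has been observed. The disaster prior, the exploration rate, and the payoffs are illustrative assumptions, not values taken from the original XOR post. `a_copy` stands for the abstract copy of the action, and `p_pay_policy` is the probability that the copy says "pay", which equals epsilon for a policy of refusing.

```python
from itertools import product

# Illustrative assumptions, not values from the original XOR post.
P_DISASTER = 0.01          # prior probability of the rare disaster D
EPSILON = 0.05             # exploration rate
DISASTER_COST = 1_000_000  # assumed loss if the disaster happens
PAYMENT = 1_000            # assumed cost of sending money


def joint(p_pay_policy):
    """Yield (prob, a_copy, d, letter, a) for every assignment of the root nodes."""
    for a_copy, d in product([True, False], repeat=2):
        prob = (p_pay_policy if a_copy else 1 - p_pay_policy) * (
            P_DISASTER if d else 1 - P_DISASTER
        )
        letter = a_copy != d              # L = A' XOR D
        a = a_copy if letter else False   # A copies A' only if the letter is sent
        yield prob, a_copy, d, letter, a


def utility(d, a):
    return -DISASTER_COST * d - PAYMENT * a


def edt_value(action, p_pay_policy):
    """E[U | L = true, A = action]: condition on the action as evidence."""
    worlds = [(p, d) for p, _, d, letter, a in joint(p_pay_policy) if letter and a == action]
    total = sum(p for p, _ in worlds)
    return sum(p * utility(d, action) for p, d in worlds) / total


def cdt_value(action, p_pay_policy):
    """E[U | L = true, do(A = action)]: D keeps its distribution given the letter."""
    worlds = [(p, d) for p, _, d, letter, _ in joint(p_pay_policy) if letter]
    total = sum(p for p, _ in worlds)
    return sum(p * utility(d, action) for p, d in worlds) / total


for name, value in [("EDT", edt_value), ("CDT", cdt_value)]:
    print(name, "pay:", value(True, EPSILON), "refuse:", value(False, EPSILON))
```

In this sketch, conditioning on paying together with the letter rules out the disaster, so the EDT calculation prefers to pay. The intervention leaves the probability of the disaster untouched, so the CDT calculation differs between the two actions only by the payment and prefers to refuse.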