This seems like a natural question, but I haven't been able to find this topic discussed online.

We are in a world with two continents on opposite sides of the globe. Each continent contains a single nation with nuclear weapons. Each side has one person with a button in front of them that will launch nuclear weapons at the other continent, wiping it out without harming the side that launched. Both countries can perfectly spy on each other, and each has access to a narrow AI that predicts human behavior with near-perfect accuracy (and has not made a mistake yet after many trials).

For some reason, you were put in front of the button on one side. One day, you see that the other side has pressed their button and their weapons are on the way over. Pressing your own button now would wipe out humanity permanently.

  1. Is it fair to say that an adamant Causal Decision Theorist would not press the button and an adamant Evidential Decision Theorist would?
  2. What would you do? Is there more context needed to make this decision?
JBlack · Feb 26, 2024

No, it's not fair to say either of those. In this scenario CDT and EDT prescribe the same action, because the outcomes of the possible actions are known with certainty. The decision depends only upon which of these outcomes the person in front of the button prefers, and that isn't stated. The AI is irrelevant to the decision for both theories.
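To make this concrete, here is a minimal sketch (the utility values are invented assumptions; the scenario never states them): when each action leads to exactly one known outcome, CDT's causal expected utility and EDT's conditional expected utility are the same number, so both reduce to picking the preferred outcome.

```python
# Minimal sketch: under certainty, CDT and EDT collapse to the same comparison.
# The utility numbers below are purely illustrative; the scenario never gives them.
utilities = {
    "only the other side survives": -100,    # you don't retaliate
    "nobody survives": -1000,                # you retaliate; humanity ends
}

# Each action leads to exactly one outcome with probability 1.
outcome_of = {
    "do not press": "only the other side survives",
    "press": "nobody survives",
}

def best_action():
    # With degenerate (certain) outcome distributions, the causal and
    # evidential expected utilities of an action are the same number.
    return max(outcome_of, key=lambda a: utilities[outcome_of[a]])

print(best_action())  # -> "do not press", given these assumed utilities
```

Flip the two utility numbers and both theories say to press: the choice turns entirely on the unstated preferences, not on the decision theory.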

There are other decision theories in which the decision may depend upon counterfactuals and the behaviour of the AI in addition to the known outcomes of the actions. Maybe you meant to ask about one of those?

Thanks for your answer; this explains why I was not able to find any related discussion. I read this article recently: https://www.lesswrong.com/posts/c3wWnvgzdbRhNnNbQ/timeless-decision-theory-problems-i-can-t-solve and misremembered it as a defense of evidential decision theory, when it actually defends a different decision theory altogether.

So from a timeless decision theory perspective, is it correct to say that one would press the button? And from both EDT and CDT perspectives, one would not press the button (assuming they prefer the other country surviving to no country surviving)?

JBlack · 2mo
For FDT and "timeless" decision theories and similar, it still depends upon the actual preferences of the agent, but the problem is much messier. The agents on each side can be in one of at least three epistemic positions:

  1. They have not observed a launch, and their AI does not predict one;
  2. They have not observed a launch, but their AI predicts one;
  3. They have observed a launch, after their AI predicted it.

These three account for essentially all of the probability mass. In principle there is a fourth, incredible outlier:

  4. They have observed a launch, and their AI did not predict it.

However, you are stated to be in scenario (4), so something of very low credence has occurred. You should notice you are confused, and consider alternative hypotheses! Maybe your hyper-reliable AI has failed, or your (hopefully reliable!) launch detection system has failed, or their AI has failed, or their launch systems triggered accidentally, or you are in a training exercise without having been told, or ... etc. Ordinary decision and game theory based on only the mainline scenarios will not help you here.
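As a toy illustration of "notice you are confused" (every prior and likelihood below is invented for illustration; nothing in the scenario pins them down), here is a small Bayesian update over a few such hypotheses after observing a launch your AI never predicted:

```python
# Toy Bayesian update after observing a launch that your AI did not predict.
# All priors and likelihoods are invented for illustration only.
hypotheses = {
    # hypothesis: (prior, P(see "launch, but no AI warning" | hypothesis))
    "genuine surprise launch (AI erred)": (1e-6, 1.0),
    "your detection system failed":       (1e-3, 1.0),
    "unannounced training exercise":      (1e-3, 1.0),
    "mundane day, nothing happened":      (0.998, 1e-9),
}

evidence = sum(prior * lik for prior, lik in hypotheses.values())
posterior = {h: prior * lik / evidence for h, (prior, lik) in hypotheses.items()}

for h, p in sorted(posterior.items(), key=lambda kv: -kv[1]):
    print(f"{h}: {p:.2e}")
# With these made-up numbers, "detection failure" and "training exercise"
# each beat "genuine surprise launch" by about three orders of magnitude,
# which is exactly why the mainline analysis stops applying here.
```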
CronoDAS · 2mo
The person asking the question has not told us if the AIs have predicted the launch or not.
JBlack · 2mo
It's easily inferred from the fact that no earlier decision point was mentioned.
notfnofn · 2mo
I didn't get around to providing more clarity. I'll do that now:

  1. Both parties would press the button if it were clear that the other party would not press it in retaliation. This way they do not have to worry about being wiped off the map.
  2. The two parties would both prefer a world in which only the other party survives to a world without any humanity.

We know that the other party will press the button if and only if they predict with extremely high confidence that we will not retaliate. Our position is the same.
JBlack · 2mo
It is important how much they prefer each outcome: all these decision theories work with utilities, not just with preference orderings over outcomes. You could derive utilities from preference orderings over lotteries, e.g. would they prefer a 50:50 lottery between winning and extinction over the status quo? What about 10:90, or 90:10?

There are also unknown probabilities, such as the chance of each AI failing to predict the other side correctly, the chance of a false positive or false negative in launch detection, the chance that someone is deliberately feeding false information to some participants, the chance of a launch without the button being pressed or of a failure to launch after pressing it, the chance that this is just a simulation, and so on. Even if they're small probabilities, they are critical to this scenario.
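As an aside on deriving utilities from lotteries, here is a minimal sketch of the standard von Neumann–Morgenstern construction (the indifference point used is a made-up assumption): fix the scale at the two extreme outcomes, and the probability at which the agent is indifferent between the status quo and a win-vs-extinction lottery is then the status quo's utility on that scale.

```python
# Sketch of von Neumann-Morgenstern utility elicitation from lotteries.
# Fix the utility scale at the two extreme outcomes:
U_WIN, U_EXTINCTION = 1.0, 0.0

def utility_from_indifference(p_win: float) -> float:
    """If the agent is indifferent between the status quo and a lottery
    giving 'win' with probability p_win (extinction otherwise), then
    u(status quo) = p_win * U_WIN + (1 - p_win) * U_EXTINCTION."""
    return p_win * U_WIN + (1 - p_win) * U_EXTINCTION

# Made-up elicitation: the agent rejects 50:50, accepts 90:10, and reports
# indifference around 80:20 -- so on this scale u(status quo) = 0.8.
print(utility_from_indifference(0.80))
```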
notfnofn · 2mo
For the second paragraph: we're assuming this AI has not made a mistake in predicting human behavior after many, many trials in different scenarios; no exact probability is given. We're also assuming perfect levels of observation, so we know that they pressed the button, that bombs are heading over, and any observable context behind the decision (such as false information).

The first paragraph contains an idea I hadn't considered, and it might be central to the whole thing. I'll ponder it more.
JBlack · 2mo
The main reason I mention it is that the scenario posits that the other side has already launched, and now it's time for you to make a decision. The trouble is that if the AI had not made a mistake, you would have had an earlier chance to make a decision: when the AI told you that it predicted they were going to launch. Obviously your AI has failed, and maybe theirs did too.

For EDT and CDT it makes no difference and you should not press the button, since (as per the additional information that all observations are perfect) the outcomes of your actions are certain.

It does make a difference to theories such as FDT, but in a very unclear way. Under FDT the optimal action to take after the AI warns you that they are going to launch is to do nothing. When you subsequently confirm (with perfect reliability, as per the scenario) that they have launched, FDT says that you should launch. However, you're not in either of those positions: you didn't get a chance to make an earlier decision, so your AI must have failed. If you knew that this happened due to enemy action, FDT says that you should launch. If you are certain that it did not, then you should not. If you are uncertain, then it depends upon what probability distribution you have over hypotheses, what your exact utilities are, and what other information you may be able to gather before you can no longer launch.
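To illustrate that last dependence, here is a crude sketch (all numbers are invented assumptions, and this is a simple expected-utility mixture over the two hypotheses, not a full FDT computation): launch or hold depending on your credence that enemy action caused your AI's failure, with the flip point determined by the assumed utilities.

```python
# Crude sketch: mixing the two per-hypothesis recommendations by your credence
# q_enemy that enemy action broke your AI. All utilities are invented numbers.
# Per the comment: launching is better if the failure was enemy action,
# holding is better if it was an innocent failure.
U_LAUNCH_IF_ENEMY, U_HOLD_IF_ENEMY = -100.0, -500.0          # launch beats hold
U_LAUNCH_IF_INNOCENT, U_HOLD_IF_INNOCENT = -1000.0, -100.0   # hold beats launch

def should_launch(q_enemy: float) -> bool:
    """Launch iff its expected utility beats holding, given credence q_enemy."""
    eu_launch = q_enemy * U_LAUNCH_IF_ENEMY + (1 - q_enemy) * U_LAUNCH_IF_INNOCENT
    eu_hold = q_enemy * U_HOLD_IF_ENEMY + (1 - q_enemy) * U_HOLD_IF_INNOCENT
    return eu_launch > eu_hold

# The credence at which the decision flips, from the assumed utilities:
gain_if_enemy = U_LAUNCH_IF_ENEMY - U_HOLD_IF_ENEMY             # +400
loss_if_innocent = U_HOLD_IF_INNOCENT - U_LAUNCH_IF_INNOCENT    # +900
threshold = loss_if_innocent / (gain_if_enemy + loss_if_innocent)

print(f"launch only if q_enemy > {threshold:.3f}")  # ~0.692 with these numbers
print(should_launch(0.5))                           # -> False here
```

Different assumed utilities move the threshold, which is the point of the comment: the answer turns on your distribution over hypotheses and your exact utilities.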