Reference Post: Trivial Decision Theory Problem — AI Alignment Forum

A trivial decision problem is one in which the agent has only a single option available. In that case, the most natural answer to the question "What action should we take?" is "The only action that we can take!" We will call this the Triviality Perspective.

A particularly interesting example is Transparent Newcomb's Problem. If you accept the premise of a perfect predictor, then seeing $1 million in the transparent box implies that you were predicted to one-box, which in turn implies that you will one-box. So the Triviality Perspective claims that you should one-box, but also that this is an incredibly boring claim that provides little insight into decision theory.
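To make the triviality concrete, here is a minimal sketch that enumerates every (prediction, action) pair and keeps only those consistent with a perfect predictor and with what the agent observes. The string encoding of actions, the `$1,000,000` big box and the helper names are hypothetical illustrations of the standard setup, not part of the original post.

```python
from itertools import product

ACTIONS = ("one-box", "two-box")


def big_box_contents(prediction: str) -> int:
    # Standard setup (assumed): the big box holds $1,000,000 iff one-boxing was predicted.
    return 1_000_000 if prediction == "one-box" else 0


def consistent_actions(observed_big_box: int) -> set[str]:
    """Return every action consistent with a perfect predictor and the observation."""
    consistent = set()
    for prediction, action in product(ACTIONS, repeat=2):
        perfect = prediction == action  # a perfect predictor never errs
        matches_observation = big_box_contents(prediction) == observed_big_box
        if perfect and matches_observation:
            consistent.add(action)
    return consistent


if __name__ == "__main__":
    print(consistent_actions(1_000_000))  # {'one-box'}
```

Only one action survives the filter, which is exactly the sense in which the problem, read literally, offers no real choice.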

In general, any decision theory problem with a perfectly defined universe and a perfectly defined agent will be trivial: once both are fully specified, the problem statement already fixes which action the agent takes. Evidential decision theory treats the fact of the matter about which action we select the same as any other fact in the problem statement, and hence arguably embraces the Triviality Perspective.

Alternatively, the Triviality Perspective can be seen as an overly restrictive and literal interpretation of the problem. We could interpret "What action should we take?" as asking not about the set of actions that are consistent with the problem statement, but about a set of counterfactual worlds, each corresponding to a different action. We will call this the Counterfactual Perspective. From this perspective, the problem is only trivial before we have augmented it with counterfactuals.

Here are some examples. In Causal Decision Theory, we construct counterfactuals by setting the value of the action node in the causal graph to whatever we want and removing any inbound links to that node. In Functional Decision Theory, we imagine that a particular program outputs a value that it does not actually output, and then update the other programs that subjunctively depend on that program's output. The Erasure Approach reinterprets the problem by removing an assumption, so that multiple counterfactuals become consistent with the problem statement.
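The Causal Decision Theory construction can be sketched as a toy structural causal model. This is a minimal illustration under stated assumptions: the node names, the one-boxing `disposition`, and the choice of the standard (opaque) Newcomb setup are all hypothetical. The `do` helper replaces the action node's mechanism with a constant and gives it no parents, which is the "set the value and remove inbound links" step described above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Node:
    parents: Tuple[str, ...]
    mechanism: Callable[..., object]  # maps parent values to this node's value


def evaluate(model: Dict[str, Node], order: Tuple[str, ...]) -> Dict[str, object]:
    """Compute every node, in topological order, from its parents' values."""
    values: Dict[str, object] = {}
    for name in order:
        node = model[name]
        values[name] = node.mechanism(*(values[p] for p in node.parents))
    return values


def do(model: Dict[str, Node], name: str, value: object) -> Dict[str, Node]:
    """CDT-style intervention: fix `name` to `value` and drop its inbound links."""
    intervened = dict(model)  # the original model is left untouched
    intervened[name] = Node(parents=(), mechanism=lambda: value)
    return intervened


# Hypothetical toy Newcomb model: the agent's disposition drives both the
# (perfect) prediction and the action.
NEWCOMB: Dict[str, Node] = {
    "disposition": Node((), lambda: "one-box"),
    "prediction": Node(("disposition",), lambda d: d),
    "big_box": Node(("prediction",), lambda p: 1_000_000 if p == "one-box" else 0),
    "action": Node(("disposition",), lambda d: d),
    "payoff": Node(("action", "big_box"),
                   lambda a, b: b + (1_000 if a == "two-box" else 0)),
}
ORDER = ("disposition", "prediction", "big_box", "action", "payoff")

if __name__ == "__main__":
    print("factual payoff:", evaluate(NEWCOMB, ORDER)["payoff"])  # 1000000
    for a in ("one-box", "two-box"):
        # one-box -> 1000000, two-box -> 1001000
        print(a, evaluate(do(NEWCOMB, "action", a), ORDER)["payoff"])
```

Because the intervention cuts the link from `disposition` to `action` but leaves `prediction` and `big_box` alone, the counterfactual worlds hold the box contents fixed in this toy model, which is why the CDT-style comparison favours two-boxing here.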

Combining perspectives

It is actually possible to be sympathetic to both the Triviality Perspective and the Counterfactual Perspective. Instead of being seen as opposing perspectives, they can be seen as two different lenses for viewing the same situation, so long as we don't try to mix them at the same time. We will call this the Dual Perspective.

One area where combining both perspectives could be useful is when considering fatalistic arguments. Suppose there is a student who has a crucial exam in a week. They have the option to study or to go to the beach. Now the student reasons that it was determined at the start of time whether or not they were going to pass the exam, and that nothing they do can change that. Therefore they decide to go to the beach. What is wrong with this reasoning?

One resolution would be to say that when we limit ourselves to considering the factual, the Triviality Perspective applies: the student can only pick one option and therefore can only obtain one outcome. On the other hand, when we allow ourselves to augment the situation with counterfactuals, we might say that the Counterfactual Perspective applies and there are multiple possible choices and multiple outcomes. Here we apply the first perspective when discussing what actually occurs in the world, and the second when analysing decisions (see The Prediction Problem for an application of this to Newcomb's problem).
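The two lenses in the exam example can be written as two separate functions over the same toy model. This is a hypothetical sketch: it assumes that studying leads to passing and the beach leads to failing, and it hard-codes a deterministic student who picks the beach, as in the story.

```python
ACTIONS = ("study", "beach")


def exam_outcome(action: str) -> str:
    # Hypothetical assumption: studying is what leads to passing.
    return "pass" if action == "study" else "fail"


def student_policy() -> str:
    # A fully specified (deterministic) student whose choice is settled in advance.
    return "beach"


def factual_view() -> dict[str, str]:
    """Triviality Perspective: only the action actually taken, and its outcome."""
    action = student_policy()
    return {action: exam_outcome(action)}


def counterfactual_view() -> dict[str, str]:
    """Counterfactual Perspective: one augmented world per available action."""
    return {action: exam_outcome(action) for action in ACTIONS}


if __name__ == "__main__":
    print(factual_view())         # {'beach': 'fail'}
    print(counterfactual_view())  # {'study': 'pass', 'beach': 'fail'}
```

The factual view has exactly one entry, matching the Triviality Perspective, while the counterfactual view has one entry per action, which is what a decision needs in order to compare options. Keeping them as separate calls mirrors the Dual Perspective's rule of not mixing the two lenses.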

(Sometimes it is useful to have a short post that contains a clear definition of a single concept for linking to, even if it doesn't contain any fundamentally new content. I'm still uncertain about the norms for the Alignment Forum, so please let me know if you think this isn't the best place to post this.)

This post was written with the support of the AI Safety Research Program