When wishful thinking works

AI ALIGNMENT FORUM
AF

This idea is due to Scott Garrabrant.

Suppose you have propositions $φ_{1}, \ldots, φ_{n}$, and you want to form beliefs about whether they are true; specifically, you want to form a joint probability distribution $P$ over the events $φ_{1}, \ldots, φ_{n}$. But there’s a catch: these propositions might refer to the joint probability distribution you come up with. If $φ_{1}$ is the claim that $P(φ_{1}) < .5$, then you have no way to assign probabilities in a well-calibrated way. But suppose these propositions depend continuously on the probabilities you assign to them. For instance, $φ_{1}$ could be defined so that its “true” probability is $1 - P(φ_{1})$, where $P$ means the probability that you assigned. In this case, you can be well-calibrated by letting $P(φ_{1}) = .5$. More generally, let $f$ be the function from the space of joint probability distributions over $φ_{1}, \ldots, φ_{n}$ to itself that sends each probability distribution $μ$ to the true probability distribution that would result if you believed $μ$. A well-calibrated assignment is exactly a fixed point of $f$. Since $f$ is continuous and the space of joint distributions is a compact convex set (a simplex), Brouwer’s fixed point theorem guarantees that there will always be a way to assign probabilities in a well-calibrated way.
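
As a minimal sketch of the one-proposition case, assume beliefs are summarized by a single number $p = P(φ_{1})$ and that $f$ is a continuous map from $[0, 1]$ to itself. The helper name `find_fixed_point` and the use of bisection are illustrative assumptions, not part of the original argument; bisection applies here because $f(p) - p$ is nonnegative at $0$ and nonpositive at $1$.

```python
def find_fixed_point(f, tol=1e-12):
    """Find one p in [0, 1] with f(p) = p, for continuous f: [0, 1] -> [0, 1].

    Hypothetical helper: bisects on g(p) = f(p) - p, which satisfies
    g(0) >= 0 and g(1) <= 0 whenever f maps [0, 1] into itself.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) - mid >= 0:
            lo = mid   # a fixed point lies in [mid, hi]
        else:
            hi = mid   # a fixed point lies in [lo, mid]
    return (lo + hi) / 2


# The example from the text: the true probability of phi_1 is 1 - P(phi_1).
p_star = find_fixed_point(lambda p: 1 - p)
print(round(p_star, 6))
```

This only finds one fixed point, and only in one dimension; with several propositions the existence guarantee still holds, but finding a fixed point generally takes other methods.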

But $f$ could have multiple fixed points. Which one is right? You get to pick; whichever fixed point you decide to believe ends up being correct, since they are fixed points of the function determining the true probabilities from your beliefs. Cases in which there are multiple such fixed points are cases in which you actually can make something be true by believing it. So you may as well believe the fixed point according to which you have the highest expected utility.

As an example, suppose you’re suffering from an ailment that can be cured by placebo, and the placebo works even if you know it’s just a placebo, provided you believe that the placebo will work. When given a pill that you know is a placebo, you may as well believe that it will cure you, since then you’ll be right, and get better.
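
To make the choice among fixed points concrete, here is a small sketch with a single proposition, "the placebo cures me". The response curve `placebo_response` is an invented assumption (a smooth S-curve whose fixed points are 0, 0.5, and 1), as are the helper `fixed_points` and the decision to take utility to be the probability of being cured.

```python
def placebo_response(p):
    """Assumed true probability of a cure when you believe it with probability p."""
    return p * p * (3 - 2 * p)


def fixed_points(f, n=1000, tol=1e-12):
    """Approximate the fixed points of f on [0, 1] by scanning g(p) = f(p) - p."""
    grid = [i / n for i in range(n + 1)]
    found = []
    for a, b in zip(grid, grid[1:]):
        ga, gb = f(a) - a, f(b) - b
        if abs(ga) <= tol:
            found.append(a)              # grid point is itself (nearly) fixed
        elif ga * gb < 0 and abs(gb) > tol:
            lo, hi = a, b                # sign change: refine by bisection
            for _ in range(60):
                mid = (lo + hi) / 2
                if (f(lo) - lo) * (f(mid) - mid) <= 0:
                    hi = mid
                else:
                    lo = mid
            found.append((lo + hi) / 2)
    if abs(f(1.0) - 1.0) <= tol:
        found.append(1.0)
    return found


candidates = fixed_points(placebo_response)
# Utility is assumed to be the probability of a cure, which at a fixed point is p itself.
best = max(candidates, key=placebo_response)
print(candidates, best)
```

Under these assumptions, all three candidate beliefs are self-fulfilling, and the one worth adopting is full confidence that the placebo will work.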

Related to the question of what to believe is the question of what actions to take. The traditional answer is to take the action which has the highest expected utility. Another possible answer is to act the way that you believe you will act. If we do this, then $f$ will have lots of fixed points: for any probability distribution over actions we could take, if we believe that we will take actions according to those probabilities, then we will be correct. And picking the fixed point that maximizes expected utility recovers the original rule of picking the action that maximizes expected utility.
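
A quick numerical check of the last claim, under the assumption that utility depends only on the action actually taken: if $f$ is the identity on distributions over actions, expected utility is linear in the believed distribution, so no mixture beats the point mass on the best action. The utilities and the random sampling below are made up for illustration.

```python
import random

utilities = [1.0, 3.0, 2.0]  # hypothetical utilities of three actions


def expected_utility(dist):
    """Expected utility when actions are taken with probabilities `dist`."""
    return sum(p * u for p, u in zip(dist, utilities))


def random_distribution(k, rng):
    """Sample a probability distribution over k actions (normalized random weights)."""
    weights = [rng.random() for _ in range(k)]
    total = sum(weights)
    return [w / total for w in weights]


rng = random.Random(0)
samples = [random_distribution(len(utilities), rng) for _ in range(10_000)]

# Every sampled distribution is a fixed point (believing it makes it true),
# and none does better than putting all probability on the best action.
best_action = max(range(len(utilities)), key=lambda a: utilities[a])
point_mass = [1.0 if a == best_action else 0.0 for a in range(len(utilities))]
best_sampled = max(expected_utility(d) for d in samples)
print(best_action, expected_utility(point_mass), best_sampled <= expected_utility(point_mass))
```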

A possible objection is to ask why we would restrict to fixed points, instead of just choosing to believe whatever probability distribution $μ$ maximizes the expected utility of $f(μ)$ (which we might expect to often, though not necessarily always, be a fixed point, since having accurate beliefs is useful). A possible answer to this objection is that it isn’t possible to choose to believe a non-fixed point because of what you expect the consequences of that choice to be. Since you are choosing based on $f(μ)$, you are implicitly acting as if $f(μ)$ were your true beliefs, in which case the true probability distribution would be $f(f(μ))$, and $f(μ)$ having high expected utility would not be useful.

If $f$ is not required to be continuous, then we can still almost find a fixed point by taking the closure of the graph of $f$, and then taking the convex hull of each fiber. By Kakutani’s fixed point theorem, the resulting multi-valued function has a fixed point. If the agent is only assumed to know its own utility function up to an error that is either infinitesimal (as in Definability of Truth in Probabilistic Logic) or small (as in Logical Induction), then adding a small random error (unknown to the agent) to a fixed point of the Kakutani closure of $f$ can give you a narrow probability distribution over probability distributions $μ$ that are almost fixed by $f$. We can then pick whichever of these pseudo-fixed points has the highest expected utility, as in the continuous case.
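
A one-proposition illustration of the discontinuous case, where $φ_{1}$ is the claim that $P(φ_{1}) < .5$. Here $f$ jumps from 1 to 0 at $.5$ and has no fixed point, but the closed, convexified graph contains the whole interval $[0, 1]$ above $.5$, so $.5$ is a fixed point of the multi-valued map. The noise model below (a symmetric grid of small perturbations, standing in for an error the agent cannot observe) is an assumption chosen to keep the sketch deterministic, and it shows only one possible reading of "almost fixed": agreement on average over the noise.

```python
def f(p):
    """True probability of 'P(phi_1) < 0.5' when the assigned probability is p."""
    return 1.0 if p < 0.5 else 0.0


def symmetric_noise(delta, n):
    """n evenly spaced perturbations in (-delta, delta), symmetric about 0 (n even)."""
    return [delta * (2 * k + 1 - n) / n for k in range(n)]


kakutani_point = 0.5
beliefs = [kakutani_point + eps for eps in symmetric_noise(delta=1e-3, n=1000)]

mean_belief = sum(beliefs) / len(beliefs)
mean_truth = sum(f(p) for p in beliefs) / len(beliefs)

# Under f itself each perturbed belief maps to 0 or 1, but averaged over the
# unknown noise the believed and true probabilities nearly agree.
print(mean_belief, mean_truth)
```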

This helps make sense of playing mixed-strategy Nash equilibria in games. In (non-game-theoretic) decision theory, it is often assumed that the outcome just depends on what action you actually take, and you take whichever action leads to the highest expected utility. Under this framework, there is no reason you would want to randomize your action. But under the assumption that strategies are common knowledge, changes in your beliefs about your own actions will be reflected in other players’ beliefs about your actions, which influence their actions, so believing that you will randomize can lead to a better outcome than believing that you will take any single action.
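
Matching pennies gives a small worked case. Assume you win 1 when the coins match and lose 1 otherwise, and that your opponent knows the probability $p$ with which you believe you will play heads and best-responds to it. The payoffs and the grid search are illustrative assumptions.

```python
def payoff_against_best_response(p):
    """Your expected payoff in matching pennies when you play heads with
    probability p and the opponent, knowing p, picks the reply worst for you."""
    if_opponent_heads = p * 1 + (1 - p) * (-1)   # you match only by playing heads
    if_opponent_tails = p * (-1) + (1 - p) * 1   # you match only by playing tails
    return min(if_opponent_heads, if_opponent_tails)


grid = [i / 100 for i in range(101)]
best_p = max(grid, key=payoff_against_best_response)
print(best_p, payoff_against_best_response(best_p))  # prints 0.5 0.0
```

Because the game is zero-sum, the opponent's best response is the reply that is worst for you. Believing you will play either pure action yields -1, while believing you will randomize evenly yields 0.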

To a certain extent, this can also help make sense of how to pick good Nash equilibria instead of bad ones. In a game in which one player is playing a best response, and the other player knows this and is picking the best fixed point, the result will be the Nash equilibrium that is best for the latter player. If both players are picking the best fixed point, then it’s unclear exactly what happens, since you’d need to know how to evaluate the counterfactuals in which one player changes their strategy. But you’d at least expect to end up in Pareto-optimal Nash equilibria.
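
A sketch of the case with one best-responding player, restricted to pure strategies for simplicity (an assumption; the text does not restrict to pure strategies). The stag hunt payoffs are invented. Player 2 plays a best response; player 1 considers only profiles that are fixed points, meaning their own action is also a best response, and picks the one they like most.

```python
ACTIONS = ["stag", "hare"]
# PAYOFFS[(a1, a2)] = (payoff to player 1, payoff to player 2); hypothetical values.
PAYOFFS = {
    ("stag", "stag"): (4, 4),
    ("stag", "hare"): (0, 3),
    ("hare", "stag"): (3, 0),
    ("hare", "hare"): (3, 3),
}


def is_pure_nash(a1, a2):
    """True if neither player gains by unilaterally switching actions."""
    u1, u2 = PAYOFFS[(a1, a2)]
    p1_ok = all(PAYOFFS[(alt, a2)][0] <= u1 for alt in ACTIONS)
    p2_ok = all(PAYOFFS[(a1, alt)][1] <= u2 for alt in ACTIONS)
    return p1_ok and p2_ok


equilibria = [(a1, a2) for a1 in ACTIONS for a2 in ACTIONS if is_pure_nash(a1, a2)]
# Player 1 picks the best fixed point, i.e. the equilibrium with their highest payoff.
chosen = max(equilibria, key=lambda profile: PAYOFFS[profile][0])
print(equilibria, chosen)
```

With these payoffs both (stag, stag) and (hare, hare) are equilibria, and the player choosing among fixed points selects (stag, stag).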
