Smoking Lesion Steelman III: Revenge of the Tickle Defense

AI ALIGNMENT FORUM
AF

I slightly improve the theory I put forward last time, locate it in the literature, and discuss the conditions under which this approach unifies CDT and EDT.

Improvements

Where I left off last time, I proposed that decisions should obey the constraint

$a_{\text{chosen}} = \operatorname{argmax}_{a} \, E_{P(\cdot \mid A = a_{\text{chosen}})}(U \mid do(A = a))$

I.e., after evidentially conditioning on the chosen action, we should still find that causally evaluating the different actions leads to the same choice. (My notation for this continues to be rather awkward, sorry.) This is supposed to combine the advantages of CDT and EDT, since we make maximal use of the information revealed by our choice of action, but avoid confusing evidence with influence in doing so.

Really, I should have instead said that the chosen action must be among the maximal choices, so that tie-breaking behavior doesn't matter.
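
Written out in the same notation, the tie-tolerant version of the constraint would replace the equality with set membership. This is only a restatement of the sentence above, not a new condition:

$a_{\text{chosen}} \in \operatorname{argmax}_{a} \, E_{P(\cdot \mid A = a_{\text{chosen}})}(U \mid do(A = a))$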

This constraint may be satisfied by several actions; I referred to this as a challenge of "choosing an equilibrium" by analogy to game-theoretic equilibria. I neglected to consider that it may also be satisfied by no actions.
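
To make the "several actions" and "no actions" cases concrete, here is a minimal Python sketch of checking the constraint by brute force. Everything in it is an illustrative assumption rather than part of the original proposal: the function name `ratifiable_actions`, the representation of beliefs as a dictionary over states, and the toy game in which a predictor guesses the agent's action with 90% accuracy.

```python
from typing import Callable, Dict, Hashable, List

Action = Hashable
State = Hashable


def ratifiable_actions(
    actions: List[Action],
    posterior: Callable[[Action], Dict[State, float]],
    causal_utility: Callable[[Action, State], float],
    tol: float = 1e-9,
) -> List[Action]:
    """Return the actions that stay causally optimal after conditioning on being chosen."""
    result = []
    for chosen in actions:
        beliefs = posterior(chosen)  # hypothetical stand-in for P(state | A = chosen)
        # Causal expected utility of every alternative, evaluated under those beliefs.
        values = {
            a: sum(p * causal_utility(a, s) for s, p in beliefs.items())
            for a in actions
        }
        # "Among the maximal choices": ties count as satisfying the constraint.
        if values[chosen] >= max(values.values()) - tol:
            result.append(chosen)
    return result


ACTIONS = ["heads", "tails"]


def predictor_posterior(chosen):
    # Assumed toy model: the predictor's guess matches the chosen action 90% of the time.
    other = "tails" if chosen == "heads" else "heads"
    return {chosen: 0.9, other: 0.1}


def want_match(action, guess):
    return 1.0 if action == guess else -1.0


def want_mismatch(action, guess):
    return -want_match(action, guess)


several = ratifiable_actions(ACTIONS, predictor_posterior, want_match)
none = ratifiable_actions(ACTIONS, predictor_posterior, want_mismatch)
```

In this toy setup, `several` contains both actions (either choice confirms itself), while `none` is empty (conditioning on either choice makes the other one look better).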

Choosing Between Multiple Options

I mentioned a concern that choosing between equilibria using expected value might sometimes select equilibria which hide information in order to look better. This doesn't actually happen in any of the examples I discussed, because being consistent with CDT-style choice prevents the information-hiding actions from being equilibria. However, we can easily make an example where it does happen. In the suicidal smoking lesion problem, we might additionally stipulate that the smoke-lovers intrinsically prefer to be non-smoke-lovers, placing -1000 utility on smoke-loving. Then, an equilibrium which leaves a significant chance of not being a smoke-lover looks much better in expected value. So, this method of equilibrium selection would choose the equilibrium where the agent doesn't smoke (and CDT doesn't know enough to switch to smoking, since it looks possible that the agent could smoke and not die).

This makes the solution seem clear to me: we require CDT at a higher level. The error is in choosing equilibria as if the modification of information states actually modified our attributes; but, alas, the self-hating smoke-lover is a smoke-lover whether they smoke or not.

Choosing equilibria via CDT just means selecting an action by CDT, out of those actions which could be equilibria. A question arises as to what information state we choose from. We can again make the argument that we need to choose "using all information" (including an evidential update on which equilibrium is chosen), bringing back the question of equilibrium selection in full force. But, this adds no information. Apparently there is no good answer to the equilibrium-selection problem, except that it is equivalent to the action-selection problem, which we solve by choosing from within some equilibrium.

In any case, this line of thinking made me realize that my "CEDT" is just the ratifiability condition. Ratifiability was first introduced by Jeffrey in The Logic of Decision. Skyrms showed in Ratifiability and the Logic of Decision that a reasonable interpretation of ratifiability (perhaps the only reasonable interpretation) makes Jeffrey's EDT select actions which are CDT-optimal when CDT chooses via a probability distribution which has been updated on knowledge of which action is to be selected.

We might imagine, as Skyrms imagined, that the selection of equilibrium proceeds via some kind of convergent process which starts at a knowledge state which is ignorant of the final choice, and ends up in a knowledge state which is in equilibrium. Skyrms raises interesting issues of the Dutch-bookability of such a process.

Choosing in Absence of Equilibria

On the other hand, there are some decision problems which lack ratifiable choices. One example is matching pennies, when we expect the opponent to have some skill in predicting us. If we update on taking either action, CDT then wishes to take the opposite one.

Now, I don't mean for my opinion on the EDT vs CDT debate to solve all problems correctly; I still think something like updatelessness is required. But, it does seem like we can naturally extend the spirit of ratifiability to get the right answer in these cases.

Rather than choosing pure actions, we choose mixed strategies, and then pick randomly with the equilibrium probabilities. This guarantees the existence of equilibria, as usual in game theory, since the best-response function is a Kakutani function of the probabilities. This gives the expected solution, choosing randomly with 50-50 odds, in matching pennies.
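
The 50-50 result can be checked numerically with a small sketch. The modeling choices here are assumptions made for illustration: the predictor is taken to know the mixed strategy but not the random bits, so its guess is heads with the same probability `p_heads` that we play heads, and payoffs are +1 for a mismatch and -1 for a match. The grid search is a crude stand-in for the fixed-point argument, not a proof.

```python
def causal_values(p_heads: float) -> dict:
    """Causal expected value of each pure action under the assumed predictor model."""
    # Assumption: the predictor guesses heads with probability p_heads,
    # independently of our random bits. We get +1 on a mismatch, -1 on a match.
    return {
        "heads": (1 - p_heads) * 1.0 + p_heads * (-1.0),
        "tails": p_heads * 1.0 + (1 - p_heads) * (-1.0),
    }


def is_equilibrium(p_heads: float, tol: float = 1e-9) -> bool:
    """A mixed strategy is in equilibrium if every action it actually plays is a best response."""
    values = causal_values(p_heads)
    best = max(values.values())
    probs = {"heads": p_heads, "tails": 1 - p_heads}
    return all(values[a] >= best - tol for a, p in probs.items() if p > tol)


# Scan mixed strategies in steps of 0.01 and keep the self-consistent ones.
grid = [i / 100 for i in range(101)]
equilibria = [p for p in grid if is_equilibrium(p)]
```

Under these assumptions the only grid point that survives is `0.5`: any lean toward heads makes tails the strict best response, and vice versa.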

This might seem to violate the original idea -- aren't we supposed to evidentially condition on the actual output of our decision process, ie, the action we take? Aren't we failing to make use of all the information available to us, if we don't? In Regret and Instability in Causal Decision Theory, James Joyce argues in favor of mixed equilibria like this, partly on the basis of making a distinction between what CDT requires you to believe vs what CDT requires you to do. (He favors a view in which CDT was always intended to yield equilibria like this, and anything else just isn't CDT.)

Conditions for CDT=EDT

The main point in favor of mixed equilibria is that the stronger version requiring full knowledge of actions would be inconsistent, since ratifiable actions don't always exist. However, I think there may be something of deeper importance in Joyce's way of thinking. The "decision", as such, is the formation of the epistemic state about the action; it is this which is required to be in equilibrium. The action occurs after the decision, and may be anything which is consistent with the decision.

To see why this may be important, let's try to put the same self-knowledge condition on EDT. Suppose we are using a version of EDT which epsilon-explores. However, we impose the requirement that EDT's prior knows what policy EDT selects -- IE, it knows what it will do up to the randomness involved in exploration. EDT can still condition on alternate actions, thanks to the epsilon probability remaining for alternate actions. However, knowing its own policy in this way will remove most of the correlations which allow it to make EDT-like decisions. It won't cooperate with a copy of itself in Prisoner's Dilemma if the copy uses different random bits; it knows the policy which the copy will use, and all remaining wiggle room is in randomness, so it no longer sees its action as correlated with the other player's. It will two-box in a version of Newcomb's problem where Omega can't predict the random bits. And so on.
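
The Prisoner's Dilemma claim can be illustrated with a short sketch. The payoff numbers, the function name `edt_values`, and the `shared_bits` switch are all hypothetical choices for the example; the only idea taken from the text is that the prior already knows the policy, so the copy's action can depend on ours only through the exploration bits.

```python
# Hypothetical payoffs to "me", indexed by (my action, copy's action).
PAYOFF = {("C", "C"): 3.0, ("C", "D"): 0.0, ("D", "C"): 5.0, ("D", "D"): 1.0}


def edt_values(policy: str, epsilon: float, shared_bits: bool) -> dict:
    """Conditional expected utility E[U | my action], given a known policy."""
    other = "D" if policy == "C" else "C"
    # Each player follows the policy with probability 1 - epsilon, else explores.
    marginal = {policy: 1 - epsilon, other: epsilon}
    values = {}
    for mine in ("C", "D"):
        if shared_bits:
            # Same random bits: the copy's action always equals mine.
            copy_dist = {mine: 1.0}
        else:
            # Independent bits: my action carries no news about the copy's action.
            copy_dist = marginal
        values[mine] = sum(p * PAYOFF[(mine, theirs)] for theirs, p in copy_dist.items())
    return values


independent = edt_values("C", epsilon=0.05, shared_bits=False)
shared = edt_values("C", epsilon=0.05, shared_bits=True)
```

With independent bits, defecting has the higher conditional expectation even though the known policy is to cooperate; with shared bits, the correlation returns and cooperating comes out ahead.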

Roughly speaking, we've conditioned on the "output of my decision algorithm" node in the causal graph, which was the hidden parent making CDT different from EDT.

This shouldn't be too surprising, given the history of the ratifiability condition. Although I was invoking ratifiability to get CDT to do the right thing in cases where EDT seemed to be doing better, Jeffrey originally proposed ratifiability as a corrective measure for making EDT act like CDT in the cases where they disagreed.

However, this does not always make CDT=EDT. If your exploration bits can be predicted, then EDT and CDT still advise different actions. The "hard mode" of matching pennies is Death in Damascus: Death can predict your every move, so exploration or no, you meet your demise. Under the ratifiable equilibrium, CDT still proposes to flee half the time (as if frantically randomizing in the hope that Death will fail). EDT doesn't do this. In Newcomb with a perfect predictor, CDT two-boxes and EDT one-boxes. And so on. (Note that we still need to make EDT epsilon-explore in these cases, in order to keep the EDT conditional expected utilities well-defined.)
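
For the perfect-predictor Newcomb case, a small sketch shows why the two theories still come apart. The dollar amounts are the conventional ones and are an assumption here, as are the function names; `p_full` stands for whatever probability CDT assigns to the opaque box being full.

```python
BIG, SMALL = 1_000_000.0, 1_000.0  # conventional Newcomb amounts (assumed)


def payoff(action: str, box_full: bool) -> float:
    """Money received: the opaque box if full, plus the small box when two-boxing."""
    return (BIG if box_full else 0.0) + (SMALL if action == "two-box" else 0.0)


def edt_value(action: str) -> float:
    # Perfect predictor, exploration bits included: the box is full
    # exactly when the action actually taken is one-boxing.
    return payoff(action, box_full=(action == "one-box"))


def cdt_value(action: str, p_full: float) -> float:
    # CDT holds its belief about the box fixed while intervening on the action.
    return p_full * payoff(action, True) + (1 - p_full) * payoff(action, False)


edt_prefers = max(("one-box", "two-box"), key=edt_value)
cdt_prefers = max(("one-box", "two-box"), key=lambda a: cdt_value(a, p_full=0.95))
```

In this sketch `edt_prefers` is one-boxing, while `cdt_prefers` is two-boxing for any value of `p_full`, since the extra small box is worth the same amount whatever CDT believes about the opaque box.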

This leads me to propose the following rule for agents obeying ratifiability:

Law of Logical Causality: If conditioning on any event changes the probability an agent assigns to its own action, that event must be treated as causally downstream.

This is still a bit rough, but constraints on the structure of counterfactuals are hard to come by, so I'll take what I can get.

For example, this means that you must treat Death as causally downstream of your decision in Death in Damascus. You must treat Omega as causally downstream of you in the perfect-predictor version of Newcomb. And so on.

It does not mean that you must treat all things correlated with your decision as downstream. In Smoking Lesion Steelman, the lesion is still upstream of your decision. The Law of Logical Causality does not touch it, because an agent obeying ratifiability will know its own decision perfectly, screening off the action from such influences. But, this depends on my special version of Smoking Lesion. Perhaps in a different version of Smoking Lesion, where the increased tendency to smoke is more like a trembling hand after the decision is made, the influence is downstream rather than upstream.
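
One rough way to read the antecedent of the Law of Logical Causality is as a dependence test on the agent's joint beliefs about an event and its own action. The sketch below is only that reading, under assumptions of my own: the joint distribution is given as a dictionary, the function name is hypothetical, and the lesion probabilities are made-up numbers. It says nothing about how the resulting causal graph should be built.

```python
def shifts_action_probability(joint: dict, tol: float = 1e-9) -> bool:
    """joint maps (event_value, action) -> probability.

    Returns True if conditioning on some event value changes P(action).
    """
    action_marginal = {}
    event_marginal = {}
    for (event, action), p in joint.items():
        action_marginal[action] = action_marginal.get(action, 0.0) + p
        event_marginal[event] = event_marginal.get(event, 0.0) + p
    for (event, action), p in joint.items():
        if event_marginal[event] > tol:
            conditional = p / event_marginal[event]
            if abs(conditional - action_marginal[action]) > tol:
                return True
    return False


# Death in Damascus under the 50-50 mixed strategy, with a perfect predictor:
# Death is always in the city the agent ends up in.
death = {("death_in_damascus", "stay"): 0.5, ("death_in_aleppo", "flee"): 0.5}

# Smoking Lesion Steelman with an agent who already knows it will smoke
# (lesion probabilities are illustrative).
lesion = {("lesion", "smoke"): 0.3, ("no_lesion", "smoke"): 0.7}

death_must_be_downstream = shifts_action_probability(death)
lesion_must_be_downstream = shifts_action_probability(lesion)
```

Here the test fires for Death's location but not for the lesion, matching the two verdicts described above.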

Hence the title of this post. Ratifiability, the very tool which CDT advocates used to fend off a series of counterexamples (including cases like Murder Lesion and the suicidal smoking lesion), can be taken by EDT advocates and turned into a very strong version of the tickle defense. This strong version of the tickle defense makes EDT act as CDT advises in all but the most extreme cases, where predictors are perfect. From this, we propose a constraint on the causal graphs which would make CDT do as EDT would advise.

The whole structure is far from watertight, and is badly in need of formalization. However, it feels like progress to me.
