## AI ALIGNMENT FORUMAF

Caspar Oesterheld

# Posts

Sorted by New

In memoryless Cartesian environments, every UDT policy is a CDT+SIA policy

Sorry for taking an eternity to reply (again).

On the first point: Good point! I've now finally fixed the SSA probabilities so that they sum up to 1, which really they should, to really have a version of EDT.

>prevents coordination between agents making different observations.

Yeah, coordination between different observations is definitely not optimal in this case. But I don't see an EDT way of doing it well. After all, there are cases where given one observation, you prefer one policy and given another observation you favor another policy. So I think you need the ex ante perspective to get consistent preferences over entire policies.

>(Oh, I ignored the splitting up of probabilities of trajectories into SSA probabilities and then adding them back up again, which may have some intuitive appeal but ends up being just a null operation. Does anyone see a significance to that part?)

The only significance is to get a version of EDT, which we would traditionally assume to have self-locating beliefs. From a purely mathematical point of view, I think it's nonsense.

Predictors exist: CDT going bonkers... forever

>Caspar Oesterheld and Vince Conitzer are also doing something like this

That paper can be found at https://users.cs.duke.edu/~ocaspar/CDTMoneyPump.pdf . And yes, it is structurally essentially the same as the problem in the post.

Pavlov Generalizes

Not super important but maybe worth mentioning in the context of generalizing Pavlov: the strategy Pavlov for the iterated PD can be seen as an extremely shortsighted version of the law of effect, which basically says: repeat actions that have worked well in the past (in similar situations). Of course, the LoE can be applied in a wide range of settings. For example, in their reinforcement learning textbook, Sutton and Barto write that LoE underlies all of (model-free) RL.

CDT=EDT=UDT

> I tried to understand Caspar’s EDT+SSA but was unable to figure it out. Can someone show how to apply it to an example like the AMD to help illustrate it?

Sorry about that! I'll try to explain it some more. Let's take the original AMD. Here, the agent only faces a single type of choice -- whether to EXIT or CONTINUE. Hence, in place of a policy we can just condition on when computing our SSA probabilities. Now, when using EDT+SSA, we assign probabilities to being a specific instance in a specific possible history of the world. For example, we assign probabilities of the form , which denotes the probability that given I choose to CONTINUE with probability , history (a.k.a. CONTINUE, EXIT) is actual and that I am the instance intersection (i.e., the first intersection). Since we're using SSA, these probabilities are computed as follows:

That is, we first compute the probability that the history itself is actual (given ). Then we multiply it by the probability that within that history I am the instance at , which is just 1 divided by the number of instances of myself in that history, i.e. 2.

Now, the expected value according to EDT + SSA given can be computed by just summing over all possible situations, i.e. over all combinations of a history and a position within that history and multiplying the probability of that situation with the utility given that situation:

And that's exactly the ex ante expected value (or UDT-expected value, I suppose) of continuing with probability . Hence, EDT+SSA's recommendation in AMD is the ex ante optimal policy (or UDT's recommendation, I suppose). This realization is not original to myself (though I came up with it independently in collaboration with Johannes Treutlein) -- the following papers make the same point:

• Rachael Briggs (2010): Putting a value on Beauty. In Tamar Szabo Gendler and John Hawthorne, editors, Oxford Studies in Epistemology: Volume 3, pages 3–34. Oxford University Press, 2010. http://joelvelasco.net/teaching/3865/briggs10-puttingavalueonbeauty.pdf
• Wolfgang Schwarz (2015): Lost memories and useless coins: revisiting the absentminded driver. In: Synthese. https://www.umsu.de/papers/driver-2011.pdf

My comment generalizes these results a bit to include cases in which the agent faces multiple different decisions.

Dutch-Booking CDT
Caspar Oesterheld is working on similar ideas.

For anyone who's interested, Abram here refers to my work with Vincent Conitzer which we write about here.

Reflexive Oracles and superrationality: prisoner's dilemma

My paper "Robust program equilibrium" (published in Theory and Decision) discusses essentially NicerBot (under the name ϵGroundedFairBot) and mentions Jessica's comment in footnote 3. More generally, the paper takes strategies from iterated games and transfers them into programs for the corresponding program game. As one example, tit for tat in the iterated prisoner's dilemma gives rise to NicerBot in the "open-source prisoner's dilemma".

In memoryless Cartesian environments, every UDT policy is a CDT+SIA policy

Since Briggs [1] shows that EDT+SSA and CDT+SIA are both ex-ante-optimal policies in some class of cases, one might wonder whether the result of this post transfers to EDT+SSA. I.e., in memoryless POMDPs, is every (ex ante) optimal policy also consistent with EDT+SSA in a similar sense. I think it is, as I will try to show below.

Given some existing policy , EDT+SSA recommends that upon receiving observation we should choose an action from (For notational simplicity, I'll assume that policies are deterministic, but, of course, actions may encode probability distributions.) Here, if and otherwise. is the SSA probability of being in state of the environment trajectory given the observation and the fact that one uses the policy .

The SSA probability is zero if and otherwise. Here, is the number of times occurs in . Note that this is the minimal reference class version of SSA, also known as the double-halfer rule (because it assigns 1/2 probability to tails in the Sleeping Beauty problem and sticks with 1/2 if it's told that it's Monday). is the (regular, non-anthropic) probability of the sequence of states , given that is played and is observed at least once. If (as in the sum above) is observed at least once in , we can rewrite this as Importantly, note that is constant in , i.e., the probability that you observe at least once cannot (in the present setting) depend on what you would do when you observe .

Inserting this into the above, we get where the first sum on the right-hand side is over all histories that give rise to observation at some point. Dividing by the number of agents with observation in a history and setting the policy for all agents at the same time cancel each other out, such that this equals Obviously, any optimal policy chooses in agreement with this. But the same disclaimers apply; if there are multiple observations, then multiple policies might satisfy the right-hand side of this equation and not all of these are optimal.

[1] Rachael Briggs (2010): Putting a value on Beauty. In Tamar Szabo Gendler and John Hawthorne, editors, Oxford Studies in Epistemology: Volume 3, pages 3–34. Oxford University Press, 2010. http://joelvelasco.net/teaching/3865/briggs10-puttingavalueonbeauty.pdf

In memoryless Cartesian environments, every UDT policy is a CDT+SIA policy

Caveat: The version of EDT provided above only takes dependences between instances of EDT making the same observation into account. Other dependences are possible because different decision situations may be completely "isomorphic"/symmetric even if the observations are different. It turns out that the result is not valid once one takes such dependences into account, as shown by Conitzer [2]. I propose a possible solution in https://casparoesterheld.com/2017/10/22/a-behaviorist-approach-to-building-phenomenological-bridges/ . Roughly speaking, my solution is to identify with all objects in the world that are perfectly correlated with you. However, the underlying motivation is unrelated to Conitzer's example.

[2] Vincent Conitzer: A Dutch Book against Sleeping Beauties Who Are Evidential Decision Theorists. Synthese, Volume 192, Issue 9, pp. 2887-2899, October 2015. https://arxiv.org/pdf/1705.03560.pdf