Counterfactual Induction

Oh, I see what the issue is. Propositional tautology given means , not . So yeah, when A is a boolean that is equivalent to via boolean logic alone, we can't use that A for the exact reason you said, but if A isn't equivalent to via boolean logic alone (although it may be possible to infer by other means), then the denominator isn't necessarily small.

Counterfactual Induction

Yup, a monoid, because and , so it acts as an identitity element, and we don't care about the order. Nice catch.

You're also correct about what propositional tautology given A means.

CDT Dutch Book

(lightly edited restatement of email comment)

Let's see what happens when we adapt this to the canonical instance of "no, really, counterfactuals aren't conditionals and should have different probabilities". The cosmic ray problem, where the agent has the choice between two paths, it slightly prefers taking the left path, but its conditional on taking the right path is a tiny slice of probability mass that's mostly composed of stuff like "I took the suboptimal action because I got hit by a cosmic ray".

There will be 0 utility for taking left path, -10 utility for taking the right path, and -1000 utility for a cosmic ray hit. The CDT counterfactual says 0 utility for taking left path, -10 utility for taking the right path, while the conditional says 0 utility for left path, -1010 utility for right path (because conditional on taking the right path, you were hit by a cosmic ray).

In order to get the dutch book to go through, we need to get the agent to take the right path, to exploit P(cosmic ray) changing between the decision time and afterwards. So the initial bet could be something like -1 utility now, +12 utility upon taking the right path and not being hit by a cosmic ray. But now since the optimal action is "take the right path along with the bet", the problem setup has been changed, and we can't conclude that the agent's conditional on taking the right path places high probability on getting hit by a cosmic ray (because now the right path is the optimal action), so we can't money-pump with the "+0.5 utility, -12 utility upon taking a cosmic ray hit" bet.

So this seems to dutch-book Death-in-Damascus, not CDTEDT cases in general.

Cooperative Oracles

It actually is a weakening. Because all changes can be interpreted as making some player worse off if we just use standard Pareto optimality, the second condition mean that *more* changes count as improvements, as you correctly state. The third condition cuts down on which changes count as improvements, but the combination of conditions 2 and 3 still has some changes being labeled as improvements that wouldn't be improvements under the old concept of Pareto Optimality.

The definition of an almost stratified Pareto optimum was adapted from this , and was developed specifically to address the infinite game in that post involving a non-well-founded chain of players, where nothing is a stratified Pareto optimum for *all* players. Something isn't stratified Pareto optimal in a vacuum, it's stratified Pareto optimal for a particular player. There's no oracle that's stratified Pareto optimal for all players, but if you take the closure of everyone's SPO sets first to produce a set of ASPO oracles for every player, and take the intersection of all those sets, there are points which are ASPO for everyone.

Beliefs at different timescales

My initial inclination is to introduce as the space of events on turn , and define and then you can express it as .

Beliefs at different timescales

The notation for the sum operator is unclear. I'd advise writing the sum as and using an subscript inside the sum so it's clearer what is being substituted where.

Asymptotic Decision Theory (Improved Writeup)

Wasn't there a fairness/continuity condition in the original ADT paper that if there were two "agents" that converged to always taking the same action, then the embedder would assign them the same value? (more specifically, if , then ) This would mean that it'd be impossible to have be low while is high, so the argument still goes through.

Although, after this whole line of discussion, I'm realizing that there are enough substantial differences between the original formulation of ADT and the thing I wrote up that I should probably clean up this post a bit and clarify more about what's different in the two formulations. Thanks for that.

Asymptotic Decision Theory (Improved Writeup)

in the ADT paper, the asymptotic dominance argument is about thelimitof the agent's action as epsilon goes to 0. This limit is not necessarily computable, so the embedder can't contain the agent, since it doesn't know epsilon. So the evil problem doesn't work.

Agreed that the evil problem doesn't work for the original ADT paper. In the original ADT paper, the agents are allowed to output distributions over moves. I didn't like this because it implicitly assumes that it's possible for the agent to perfectly randomize, and I think randomization is better modeled by a (deterministic) action that consults an environmental random-number generator, which may be correlated with other things.

What I meant was that, in the version of argmax that I set up, if is the two constant policies "take blank box" and "take shiny box", then for the embedder where the opponent runs argmax to select which box to fill, the argmax agent will converge to deterministically randomizing between the two policies, by the logical inductor assigning very similar expected utility to both options such that the inductor can't predict which action will be chosen. And this occurs because the inductor outputting more of "take the blank box" will have converge to a higher expected value (so argmax will learn to copy that), and the inductor outputting more of "take the shiny box" will have converge to a higher expected value (so argmax will learn to copy that).

The optimality proof might be valid. I didn't understand which specific step you thought was wrong.

So, the original statement in the paper was

It must then be the case that for every . Let be the first element of in . Since every class will be seperated by at least in the limit, will eventually be a distribution over just . And since for every , , by the definition of it must be the case that .

The issue with this is the last sentence. It's basically saying "since the two actions and get equal expected utility in the limit, the total variation distance between a distribution over the two actions, and one of the actions, limits to zero", which is false

And it is specifically disproved by the second counterexample, where there are two actions that both result in 1 utility, so they're both in the same equivalence class, but a probabilistic mixture between them (as converges to playing, for all ) gets less than 1 utility.

Consider the following embedder. According to this embedder, you will play chicken against ADT-epsilon who knows who you are. When ADT-epsilon considers this embedder, it will always pass the reality filter, since in fact ADT-epsilon is playing against ADT-epsilon. Furthermore, this embedder gives NeverSwerveBot a high utility. So ADT-epsilon expects a high utility from this embedder, through NeverSwerveBot, and it never swerves.

You'll have to be more specific about "who knows what you are". If it unpacks as "opponent only uses the embedder where it is up against [whatever policy you plugged in]", then NeverSwerveBot will have a high utility, but it will get knocked down by the reality filter, because if you converge to never swerving, will converge to 0, and the inductor will learn that so it will converge to assigning equal expected value to and, and converges to 1.

If it unpacks as "opponent is ADT-epsilon", and you converge to never swerving, then argmaxing will start duplicating the swerve strategy instead of going straight. In both cases, the argument fails.

I found a paper about this exact sort of thing. Escardo and Olivia call that type signature a "selection functional", and the type signature (A→B)→B is called a "quantification functional", and there's several interesting things you can do with them, like combining multiple selection functionals into one in a way that looks reminiscent of game theory. (ie, if ϵ has type signature (A→C)→A, and δ has type signature (B→C)→B, then ϵ⊗δ has type signature ((A×B)→C)→(A×B).