When EDT=CDT, ADT Does Well

Diffractor

Back in my first post about ADT, I conjectured that $A D T$ might achieve sublinear regret relative to argmaxing for the true embedder, if some extra condition was imposed.

The Problem:

In the form of $A D T$ stated in that post, the game of chicken is a counterexample to Conjecture 1. In short, the "delusion" embedder $F$ (where the opponent is an identical $A D T$ agent) has argmax converge to going straight (because it will, in expectation, yield $0.81$ utility, while other strategies yield less utility.) Further, if the agent uses the delusion embedder less than $10 %$ of the time, then $E (F (a m^{F}))$ will start being greater than $E (E (a m^{E}))$ (where $E$ is the embedder that plugs the chosen action into both your action and your opponent's action), so it will start using the delusion embedder more often. Therefore, $10 %$ of the time, the $A D T$ agent rams into itself because it uses the delusion embedder.

Now, the reason why this embedder can't be removed is interesting. We can consider the agent to be reasoning as follows: "I don't know whether I'll pick the delusion embedder or the copy embedder. There's a $90 %$ chance that an $A D T$ agent will pick the copy embedder $E$ , in which case $F (a m^{F})$ will score highly, because $a m^{F}$ will just go straight, and $a m^{E}$ picks swerving with $90 %$ probability. Also, $E (F (A D T))$ is around $0.729$ , because about $90 %$ of the time, my foe picks the copy embedder and so do I, and about $10 %$ of the time, my foe picks the delusion embedder and so do I, and we ram into each other."

However, note that $E (F (a m^{F}))$ (approximately $0.81$ ) systematically deviates from $E (F (A D T) | A D T = a m^{F})$ (approximately $0$ ). This is pretty much Abram's criterion for CDT and EDT diverging. Plugging an action into the embedder tends to give a different result than plugging yourself into the embedder and conditioning on you taking that action. The second quantity is a more accurate estimate of the utility you can expect to get. However, you can't just use it in place of $E (F (a m^{F}))$ because the conditional may give nonsensical answers if you almost certainly don't take the action, because conditionals on improbable events are less accurate. To rephrase this, using the conditional is vulnerable to the futarchy hack, where a trader with a lot of wealth bets on conditional contracts driving down the value of $F (A D T) | A D T = a m^{F}$ , so the $A D T$ agent doesn't use embedder $F$ , so $0$ dollars are lost on the conditional contracts, and the wealthy trader can repeat the same move forevermore and unilaterally prevent ever trying out embedder $F$ .

The Solution:

Is there some way to address this? Yes. As stated in the comments of the last post on ADT, it is possible to improve the reality filter by switching from $| E_{t} (E_{t} (A D T_{t})) - E_{t} (U_{t}) | < δ_{t}$ to (using $F_{t}^{*}$ as an abbreviation for $A D T_{t} = a m_{t}^{F}$ , because this means that embedder $F$ was chosen)

$\sum_{F \in E} P_{t} (F_{t}^{*}) \cdot | E_{t} (E_{t} (A D T_{t}) | F_{t}^{*}) - E_{t} (U_{t} | F_{t}^{*}) | < δ_{t}$

In short, the conditionals may be inaccurate, but if they are inaccurate, then $P_{t} (F_{t}^{*})$ must be low, so the expected values being inaccurate is compensated for by being multiplied by a number close to zero.

It is possible to use the same sort of trick on the divergence between plugging in an action and plugging in yourself and conditioning on an action, and require

$P_{t} (E_{t}^{*}) \cdot | E_{t} (E_{t} (a m_{t}^{E})) - E_{t} (E_{t} (A D T_{t}) | E_{t}^{*}) | < δ_{t}^{'}$

for the embedder to be considered. This is a $C D T = E D T$ filter. Note that this condition isn't a hard constraint on all embedders, it just applies to embedders that are chosen with non-negligible probability. If a particular embedder is definitely not taken, the multiplication by a near-zero term will make the embedder pass the filter, and $E_{t} (E_{t} (a m_{t}^{E}))$ (the $C D T$ utility) will be used as the utility for the embedder, since there's no conditional utility to defer to. However, if an embedder has non-negligible probability of being taken, and passes this filter, then $E_{t} (E_{t} (a m_{t}^{E})) ≃ E_{t} (E_{t} (A D T_{t}) | E_{t}^{*})$ , which is pretty much equivalent to using the $E D T$ utility.

With the improved reality-filter, and the $C D T = E D T$ filter, we can arrive at a proof of Conjecture 1, modulo some extra conditions.

Required Conditions:

First, we must assume that the hypothesis space contains a "true environment" $E$ , which is defined as follows. First, it passes the reality-filter (ie, the reality-filter converges to $1$ ). Second, it passes the $C D T = E D T$ filter (that converges to $1$ ). Note that if the series of utilities seen is acquired by feeding $E$ with a uniformly random bitstring on each turn, it will pass the reality-filter by an adaptation of Theorem 1 from the last post, by Kelly-betting on the exploitable difference between $E_{t} (E_{t} (A D T_{t}) | F_{t}^{*})$ and $E_{t} (U_{t} | F_{t}^{*})$ , where the timely appearance of $U_{t}$ and the actual action in the deductive process lets the conditional bets be resolved quickly. The second condition isn't acquired by default, but it appears that in the known toy examples games where $C D T \neq E D T$ , there is another environment that is equally accurate where the choice of action influences the opponent and $C D T = E D T$ in that environment.

Second, it's possible that the environment could be a perverse environment with clustering behavior, so $\frac{1}{t} \cdot \sum_{i \leq t} E_{i} (E_{i} (a m_{i}^{E})) - E_{i} (a m_{i}^{E})$ doesn't limit to $0$ , because feedback on what $E (a m^{E})$ is may be quite sparse. Therefore, we have to assume that the true environment (whatever it may happen to be) is sufficiently "nice", in the sense that the time-average of the difference between the expectation of $E (a m^{E})$ and $E (a m^{E})$ itself limits to zero.

The third condition is very similar, that $\frac{1}{t} \sum_{i \leq t} E_{i} (U_{i}) - U_{i}$ limits to zero. It's more plausible than the first assumption, because (an interval containing) the true value of $U$ appears in the deductive process in a timely manner by assumption, but I haven't found a way to implement Kelly-betting to exploit this, because the traders don't necessarily have access to the deductive process, or know their own worst-case value. It should be noted that the traders not knowing their own worst-case value is the property that necessitates emulating the entire logical inductor to implement budgeting, in the key $ϵ$ -ROI lemma of the logical induction paper. Therefore, it makes sense to stipulate that traders are given their own worst-case value as an advice string, to streamline budgeting, and if the logical inductor is augmented in this way, this condition is attained for free via a Kelly-betting argument.

Final Proof:

We will get a necessary and simple lemma out of the way first.

Lemma 1: When there is a finite set of sequences of events $E$ , which are mutually exclusive, and a sequence of LUV's $U_{t}$ , then $E_{t} (U_{t}) ≃_{t} \sum_{F \in E} P_{t} (F_{t}^{*}) \cdot E_{t} (U_{t} | F_{t}^{*})$

Proof: By the definition of $E_{t} (U_{t} | F_{t}^{*})$ , this is equivalent to the statement that $E_{t} (U_{t}) ≃_{t} \sum_{F \in E} E_{t} (U_{t} \cdot [F_{t}^{*}])$ , where the brackets are the Iverson brackets, which is just the indicator function that is $1$ if the statement is true, and $0$ if the statement is false.

Now, notice that the bounded combination sequences $\sum_{0 \leq i < t} \frac{1}{t} ┌ U_{t} > \frac{i}{t} ┐$ and $\sum_{0 \leq i < t} \sum_{F \in E} \frac{1}{t} ┌ U_{t} \cdot [F_{t}^{*}] > \frac{i}{t} ┐$ have provably equal values for all $t$ , so by Theorem 4.5.4 in the logical induction paper, Affine Provability Induction, the prices on these sequences will limit to being equal. By the definition of expected value, the prices on these sequences are the prices of $E_{t} (U_{t})$ and $\sum_{F \in E} E_{t} (U_{t} \cdot [F_{t}^{*}])$ , respectively.

Now to prove sublinear regret.

Theorem 1: If there is an embedder $E$ in the hypothesis class of an $A D T$ agent which is up against an environment $U$ such that

1: $\frac{1}{t} \sum_{i \leq t} E_{t} (U_{t}) - U_{t} ≃_{t} 0$ (utility is learned)

2: $\frac{1}{t} \sum_{i \leq t} E_{t} (E_{t} (a m_{t}^{E})) - E_{t} (a m_{t}^{E}) ≃_{t} 0$ (argmax behavior of $E$ is learned)

3: $i n d_{δ_{t}} (\sum_{F \in E} P_{t} (F_{t}^{*}) \cdot | E_{t} (E_{t} (A D T_{t}) | F_{t}^{*}) - E_{t} (U_{t} | F_{t}^{*}) | < δ_{t}) ≃_{t} 1$

( $E$ passes the reality filter)

4: $i n d_{δ_{t}^{'}} (P_{t} (E_{t}^{*}) \cdot | E_{t} (E_{t} (a m_{t}^{E})) - E_{t} (E_{t} (A D T_{t}) | E_{t}^{*}) | < δ_{t}^{'}) ≃_{t} 1$

( $E$ passes the $E D T = C D T$ filter),

then the $A D T$ agent on the true environment $U$ has sublinear regret relative to the argmax agent on $E$ , ie, ${limsup}_{t \to \infty} \frac{1}{t} \sum_{i \leq t} E_{t} (a m_{t}^{E}) - U_{t} \leq 0$ .

Proof: To begin with, by applying assumptions 1 and 2, we only need to prove that ${limsup}_{t \to \infty} \frac{1}{t} \sum_{i \leq t} E_{t} (E_{t} (a m_{t}^{E})) - E_{t} (U_{t}) \leq 0$ . In fact, we will prove a stronger result, that $E_{t} (U_{t}) \geq_{t} E_{t} (E_{t} (a m_{t}^{E}))$ (which immediately implies the desired result).

Assume this is false, so infinitely often $E_{t} (E_{t} (a m_{t}^{E})) - ϵ > E_{t} (U_{t})$ for some $ϵ$ . Also, let $D_{t}^{ϵ}$ be the set of embedders with greater than $\frac{ϵ}{3 | E |}$ probability of being chosen on turn $t$ . By Lemma 1,

$E_{t} (U_{t}) ≃_{t} \sum F \in E P_{t} (F_{t}^{*}) \cdot E_{t} (U_{t} | F_{t}^{*})$

$\sum F \in E P_{t} (F_{t}^{*}) \cdot E_{t} (U_{t} | F_{t}^{*}) \geq \sum F \in D_{t}^{ϵ} P_{t} (F_{t}^{*}) \cdot E_{t} (U_{t} | F_{t}^{*})$

In case 1, $\exists F \in D_{t}^{ϵ}$ that clearly fails the filters (ie, one of the two filters has the quantity inside of the filter be above $3 δ_{t}$ or $3 δ_{t}^{'}$ , as the case may be). Also $E_{t} (E_{t} (a m_{t}^{E})) > \frac{ϵ}{3}$ . Then in this case, it is possible for a trader to bet against $F$ being selected, because clearly failing a filter can be picked out by a continuous indicator function. $F$ won't be selected because its value is $0$ , due to failing both filters, and $E$ passes both filters by conditions 3 and 4, and claims that it can attain value bounded away from $0$ . This bet allows the trader to pick up a free $\frac{ϵ}{3 | E |}$ dollars or more (due to the probability of $F$ being selected being bounded away from $0$ due to being in $D_{t}^{ϵ}$ ). Because this is a logical inductor, this case can only occur finitely many times.

In case 2, $\exists F \in D_{t}^{ϵ}$ that clearly fails a filter, also $E_{t} (E_{t} (a m_{t}^{E})) \leq \frac{ϵ}{3}$ . In that case the regret is at most $\frac{ϵ}{3}$ . Therefore, all but finitely many of the times where $E_{t} (E_{t} (a m_{t}^{E})) - ϵ > E_{t} (U_{t})$ occur in case 3.

Now, let $D_{t}^{^{'} ϵ} \subseteq D_{t}^{ϵ}$ be the set of plausible environments that don't clearly fail a filter. In case 3, these two sets are equal. By the reality-filters not being clearly failed, and the reality-filters enforcing the approximate equality of $E_{t} (U_{t} | F_{t}^{*})$ and $E_{t} (F_{t} (A D T_{t}) | F_{t}^{*})$ , for all $F \in D_{t}^{^{'} ϵ}$ and $δ_{t}$ limiting to $0$ , we get that

$\sum F \in D_{t}^{^{'} ϵ} P_{t} (F_{t}^{*}) \cdot E_{t} (U_{t} | F_{t}^{*}) ≃_{t} \sum F \in D_{t}^{^{'} ϵ} P_{t} (F_{t}^{*}) \cdot E_{t} (F_{t} (A D T_{t}) | F_{t}^{*})$

Then, by the $C D T = E D T$ filters not being clearly failed, which enforce the approximate equality of $E_{t} (F_{t} (A D T_{t}) | F_{t}^{*})$ and $E_{t} (F_{t} (a m_{t}^{E}))$ , we get that

$\sum F \in D_{t}^{^{'} ϵ} P_{t} (F_{t}^{*}) \cdot E_{t} (F_{t} (A D T_{t}) | F_{t}^{*}) ≃_{t} \sum F \in D_{t}^{^{'} ϵ} P_{t} (F_{t}^{*}) \cdot E_{t} (F_{t} (a m_{t}^{F}))$

Now, by conditions 3 and 4, the filters for $E$ must limit to $1$ , so on all but finitely many turns, for all $F \in D_{t}^{^{'} ϵ}$ , $E_{t} (F_{t} (a m_{t}^{F})) > E_{t} (E_{t} (a m_{t}^{E})) - \frac{ϵ}{3}$ (otherwise it is possible to bet against $F$ being chosen and get free money by the argument from case 1). Thus, we get that

$\sum F \in D_{t}^{^{'} ϵ} P_{t} (F_{t}^{*}) \cdot E_{t} (F_{t} (a m_{t}^{F})) \geq_{t} \sum F \in D_{t}^{^{'} ϵ} P_{t} (F_{t}^{*}) \cdot E_{t} (E_{t} (a m_{t}^{E})) - \frac{ϵ}{3}$

Because we are in case 3 where $D_{t}^{^{'} ϵ} = D_{t}^{ϵ}$ , and the probabilities of the embedders being selected approximately add up to $1$ , and the total probability mass outside of $D_{t}^{ϵ}$ is at most $\frac{ϵ}{3}$ by the definition of $D_{t}^{ϵ}$ , we get that

$\sum F \in D_{t}^{^{'} ϵ} P_{t} (F_{t}^{*}) \cdot E_{t} (E_{t} (a m_{t}^{E})) - \frac{ϵ}{3} \geq_{t} E_{t} (E_{t} (a m_{t}^{E})) - \frac{2 ϵ}{3}$

And putting it all together, we get that $E_{t} (U_{t}) \geq_{t} E_{t} (E_{t} (a m_{t}^{E})) - \frac{2 ϵ}{3}$ , which contradicts the assumption that $E_{t} (U_{t}) < E_{t} (E_{t} (a m_{t}^{E})) - ϵ$ infinitely often.

Therefore, $E_{t} (U_{t}) \geq_{t} E_{t} (E_{t} (a m_{t}^{E}))$ holds, which implies the theorem by assumptions 1 and 2.

Additional Discussion and Generalizing to Expanding Hypothesis Spaces:

There's an interesting feature of this. In the proof, only part of the reality-filter was actually used, and the part that was used can be combined with the $C D T = E D T$ filter to yield a single filter that suffices to prove sublinear regret on the true environment, which is:

$P_{t} (E_{t}^{*}) \cdot | E_{t} (E_{t} (a m_{t}^{E})) - E_{t} (U_{t} | E_{t}^{*}) | < δ_{t}^{'}$

However, this doesn't enforce that embedders accurately predict what happens when other embedders are selected by $A D T$ , so it glosses over important information which can be used to rule out an embedder.

The obvious next question is whether it's possible to generalize the proof to a space of hypotheses that expands over time. We might hope to get a result that, for all computable environments that are "sufficiently nice", there is an $A D T$ agent with a sufficiently slow-growing hypothesis space that will eventually learn to act in accordance with the true environment. This runs into several obstacles.

Obstacle 1: The proof of Lemma 1 fails because the batch of statements of the form $\sum_{0 \leq i < t} \sum_{F \in E} \frac{1}{t} ┌ U_{t} \cdot [F_{t}^{*}] > \frac{i}{t} ┐$ stops being a bounded combination sequence (because, all in all, the combination is composed of $| E |$ shares in total, which isn't bounded when $E$ grows over time).

Obstacle 2: The argument for Case 1 occuring finitely often (that failure allows you to pump free money out and if this recurs infinitely often you get infinite money) fails, because the amount of money you can pump out is $\frac{ϵ}{3 | E |}$ , but when $| E |$ grows over time, the money you can pump out lessens over time, in a way that may force convergence. I suspect it's possible to address this by shooting for a proof that even when the hypothesis space grows over time, it's possible to exploit the market unless the total probability mass on embedders that clearly fail their filters limits to zero.

Obstacle 3: The asymptotic equality linking $\sum_{F \in D_{t}^{^{'} ϵ}} P_{t} (F_{t}^{*}) \cdot E_{t} (U_{t} | F_{t}^{*})$ and $\sum_{F \in D_{t}^{^{'} ϵ}} P_{t} (F_{t}^{*}) \cdot E_{t} (F_{t} (a m_{t}^{F}))$ occurs because the difference between those two is bounded by $| E | (δ_{t} + δ_{t}^{'})$ , which limits to zero because $δ_{t}$ and $δ_{t}^{'}$ limit to zero. When $| E |$ grows over time, it requires $δ_{t}$ and $δ_{t}^{'}$ to shrink faster to get this to work out. So if you're adding one new hypothesis each turn, then $δ_{t}$ and $δ_{t}^{'}$ must shrink faster than $\frac{1}{t}$ . And if $δ_{t}$ and $δ_{t}^{'}$ shrink too fast, then the true environment can't pass the reality filter either. So this means $| E |$ can't grow too fast.

Obstacle 4: This appears to be the largest obstacle. It's really hard to guarantee that the true environment passes the reality-filter. Even with a finite hypothesis space, the reality-filter "score" does limit to zero (although I haven't checked the proof particularly well), but going for any result along the lines of "this rate of $δ_{t}$ approaching zero/level of strictness for the reality-filter is passed by the true environment" causes the proof to fail. It might even be uncomputably slow for all I know. As far as I can tell, this problem gets even worse when the size of the hypothesis class grows over time, and it might cause a failure of the original proof of the reality-filter "score" limiting to zero. It seems like the hard part is being shoved into the true environment being able to pass the filters in the first place. And limits on how fast $δ_{t}$ can converge to zero translate directly into limits on how fast $| E |$ can grow by the third obstacle.

[-]abramdemski7y20

I don't currently know of any example where this achieves better performance than LIDT. The raw action-expectations of the logical inductor, used as an embedder, will always pass the reality filter and the CDT=EDT filter. Any embedder differing from those expectations would have to differ on the expected utility of an action other than the one it recommended, in order to pass the CDT=EDT filter. So the expected utility of alternate embedders, according to the those embedders themselves, can only be lower than that of the LIDT-like embedder.

There seems to be a deep connection between the CDT=EDT assumption and LIDT-like reasoning. In the Bayes-net setting where I prove CDT=EDT, I assume self-knowledge of mixed strategy (aka mixed-strategy ratifiability) and I assume one can implement randomized strategies without interference (mixed-strategy implementability). LIDT has the first property, since a logical inductor will learn a calibrated estimate of the action probabilities in a given situation. In problems where the environment is entangled with the randomization the agent uses, like troll bridge, LIDT can do quite poorly. Part of the original attraction of the CDT=EDT philosophy for me was the way it seems to capture how logical induction naturally wants to think about things.

I don't think loss relative to the argmax agent for the true environment is a very good optimality notion. It is only helpful in so far as the argmax strategy on the true environment is the best you can do. Other agents may perform better, in general. For example, argmax behaviour will 2-box in Newcomb so long as the predictor can't predict the agent's exploration. (Say, if the predictor is predicting you with your same logical inductor.) Loss relative to other agents more generally (like in the original ADT write-up) seems more relevant.

AI ALIGNMENT FORUM
AF