Logical Inductors Converge to Correlated Equilibria (Kinda)

AI ALIGNMENT FORUM
AF

Logical Inductors Converge to Correlated Equilibria (Kinda) — AI Alignment Forum

Logical inductors of "similar strength", playing against each other in a repeated game, will converge to correlated equilibria of the one-shot game, for the same reason that players that react to the past plays of their opponent converge to correlated equilibria. In fact, this proof is essentially just the proof from Calibrated Learning and Correlated Equilibrium by Forster (1997), adapted to a logical inductor setting.

However, there are two complications. The first complication is large differences in power between the logical inductors. To be specific, imagine a logical inductor playing matching pennies against another logical inductor of great strength that can simulate what the highest-valued action will be (though not the exploration steps). Also, the outcome {heads,heads}is assigned $ϵ$ utility, while {tails,tails}is 0 utility. This is essentially a Death-in-Damascus scenario. The unique Nash equilibria is randomizing at approximately 50/50 odds, but in repeated plays of this game with density-zero exploration, the best move is to just play heads to get the $ϵ$ utility, because the inductor can't fool its opponent except on days where it explores, which happen with asymptotic frequency 0. So sufficiently large differences in power between the inductors can lead to results that aren't Nash or correlated equilibria.

As such, in order to guarantee correlated equilibria, we will need to make a "no subjective dependencies" assumption. More precisely, in the limit, conditioning on your own move can't shift your probabilities of your opponent's move. The $ϵ$ -matching pennies against a powerful predictor game fails this, because the expected utility of the agent (conditional on it playing "tails") must be $ϵ$ (if it was less than $ϵ$ , the utility would go up because of exploration steps where it fools the opponent, so the limiting behavior is playing "tails" with probability limiting to 0, and $ϵ$ of those plays of "tails" are ones where it succesfully fools the opponent due to exploration). So, conditional on playing "tails", the probability of the opponent playing "tails" is about $1 - ϵ$ . Conditional on playing "heads", the probability of the opponent playing "tails" is 0.

The second complication is that all Nash equilibria are correlated equilibria, so maybe logical inductors converge to Nash equilibria in most cases. In general, I don't expect this, because it's possible for a game to have a unique Nash equilibria s.t. playing best-responses to the prediction of the opponent doesn't converge to that equilibrium, because the equilibrium is unstable (small divergences from the equilibrium mean that the incentive gradients point further away from the equilibrium in all directions). My next direction after this will be to attempt to adapt one of these proofs to the logical inductor setting.

For the moment, an (admittedly rather contrived) example of the game of Chicken shows that it's possible to converge to a correlated equilibrium that isn't a Nash equilibrium, and there are intuitive reasons to expect this to happen generally. It's still a Nash equilibrium of the infinitely repeated game, however.

On to the proof.

#Notation

$P_{h}$ is player $h$ . There are finitely many players. (the letters $i$ and $j$ are taken already)

$S_{h}$ is the finite set of actions for $P_{h}$ . We will assume that all players only have finitely many actions available.

$S := \prod_{h \in H} S_{h}$ , it is the set of joint outcomes.

$i j \in S$ is a specific outcome, given by $P_{1}$ selecting action $j$ , and everyone else in the game selecting outcome $i$ . It will be convenient to view the game as a 2-player game, with player 1 vs. player "everyone else", whose moves are indexed by $i$ .

$a_{j} \in S_{1}$ is the j'th action for player 1.

$P : S \to [0, 1]$ is some assignment of prices to all possible outcomes of the game, interpreted as a $n \times m$ matrix, as in the previous post.

$P := [0, 1]^{S}$ . It is the space of price matrices. A point in this space is denoted by $p$ , while $P$ is the corresponding matrix.

$P^{Δ} \subset P$ is the set of all price matrices that actually correspond to a probability distribution. The prices of the logical inductor will converge to this set.

$≃_{t}$ denotes two equations limiting to each other as the variable $t$ limits to infinity.

$i n d (i j, k)$ is an indicator variable that's 1 when outcome $i j$ occurs on turn $k$ , and 0 otherwise.

$P^{t}$ is the matrix that corresponds to the empirical frequency, up to time $t$ , of the various joint outcomes. $P_{i j}^{t} := \frac{1}{t} \sum k \leq t i n d (i j, k)$

$| P_{j}^{t} |$ is the empirical frequency up to time $t$ of player $i$ taking action $a_{j}$ , obtained by summing over row $j$ in the matrix $P^{t}$ .

$C$ is a "fuzzy cell". It is a fuzzy subset of $P$ , that is implemented by some expressible feature that takes the prices of the sentences in $S$ as input, and outputs a number in [0,1].

$α_{C} (k)$ is the number given by taking the prices of the logical inductor at time $k$ , and feeding them into the expressible feature that corresponds to the fuzzy cell $C$ . It yields the degree of membership in the fuzzy set C, of the point that corresponds to the prices. By a slight abuse of notation, $α_{C} (p)$ can also be defined similarly.

$⋄$ is a set of fuzzy cells $C$ over $P$ , subject to six restrictions.

There are finitely many fuzzy cells.
All cells are convex.
$\forall p \in P : \sum_{C \in ⋄} α_{C} (p) = 1$
There are two special cells, defined by the sum over all entries of $P$ being less than $1 - ϵ$ , and greater than $1 + ϵ$ , as implemented by continuous indicator functions.
All the other cells must have a "central point" that lies in $P^{Δ}$ .
All these other cells must have all points in their support be within $ϵ$ of that "central point".

(This may seem a bit arbitrary, but it's because logical inductors can't reference a set characterized by a discontinuous indicator function, so we had to use fuzzy sets. Condition 2 is needed to set up a lemma, condition 3 is convenient to separate out and ignore times where the logical inductor probabilities haven't approximately converged yet, and conditions 4 and 5 show up near the end of the proof.)

$B_{a_{j}}$ is the set of fuzzy cells in $⋄$ (except for the two which characterize incoherent probabilities) which contain points (i.e. $α_{C} (p)$ is nonzero) for which the action $a_{j}$ is an evidential best response.

$B_{a_{j}}^{S I}$ is defined as above, except that the cells are subject to the extra restriction that they contain points $p$ which correspond to matrices $P$ that are subjectively independent of player 1. This just means that all the row vectors $P_{j}$ lie in a 1-d subspace, so the probability distribution over what everyone else does is independent of what player 1 does. For these matrices, the causal and evidential notions of best response coincide.

$B_{a_{j}}^{*}$ is the set of points $q$ in $[0, 1]^{\prod_{h \in H, h \neq 1} S_{h}}$ for which $q$ is a probability distribution, and $a_{j}$ is a causal best response. This set is convex for all actions. All points in $B_{a_{j}}^{S I}$ have a corresponding point in $B_{a_{j}}^{*}$ , obtained by taking an arbitrary row vector and renormalizing it to 1.

#Proof

To begin with, the set of points ${P^{t}}_{t \in N^{+}}$ for all $t$ (these are the empirical frequencies) is confined to the bounded space of all possible joint probability distributions $P^{Δ}$ , so by Bolzano-Weierstrass, a convergent subsequence exists.

Consider such a subsequence that converges to some limiting point/joint distribution $P$ . That is,

$\sum i j \in S | P_{i j}^{t_{n}} - P_{i j} | ≃_{t} 0$

Now, fix some action $a_{j}$ such that $| P_{j} | \neq 0$ , that will, from this point on, be assumed to be taken in the specific outcome $i j$ .

To restate the definition of $P_{i j}^{t_{n}}$ ...

$P_{i j}^{t_{n}} = \frac{1}{t_{n}} \sum k \leq t_{n} i n d (i j, k)$

Because the sum over all cells of $α_{C} (k)$ is 1, for all $k$ ,

$= \frac{1}{t_{n}} \sum k \leq t_{n} \sum C \in ⋄ α_{C} (k) \cdot i n d (i j, k)$

Now, the sums can be interchanged to yield

$= \sum C \in ⋄ \frac{1}{t_{n}} \sum k \leq t_{n} α_{C} (k) \cdot i n d (i j, k)$

The two special fuzzy sets that pick out sufficiently incoherent probabilities have finitely many points in them, and by Lemma 6 from the last post, all cells outside of $B_{a_{j}}$ have the inner term limiting to 0.

$≃_{t_{n}} \sum C \in B_{a_{j}} \frac{1}{t_{n}} \sum k \leq t_{n} α_{C} (k) \cdot i n d (i j, k)$

Let $N (C, t_{n})$ be the number of times where the probabilities were in $C$ , $\sum_{k \leq t_{n}} α_{C} (k)$ , and let $ρ (C, i j, t_{n})$ be the fraction of the time up to $t_{n}$ where the probabilities were in $C$ and the outcome $i j$ happened.

$ρ (C, i j, t_{n}) \cdot N (C, t_{n}) = \sum k \leq t_{n} α_{C} (k) \cdot i n d (i j, k)$

Putting all these equalities and asymptotic equalities together, we get

$P_{i j}^{t_{n}} ≃_{t_{n}} \frac{1}{t_{n}} \sum C \in B_{a_{j}} ρ (C, i j, t_{n}) \cdot N (C, t_{n})$

Now, let $c_{i j}$ be the $i j$ 'th entry in the matrix corresponding to the center of the cell C. This is associating a cell with a canonical probability distribution over everyone's joint distribution, and reading out the probability of $i j$ happening.

$\frac{1}{t_{n}} \sum C \in B_{a_{j}} ρ (C, i j, t_{n}) \cdot N (C, t_{n}) = \frac{1}{t_{n}} \sum C \in B_{a_{j}} c_{i j} \cdot N (C, t_{n}) + \frac{1}{t_{n}} \sum C \in B_{a_{j}} (ρ (C, i j, t_{n}) - c_{i j}) \cdot N (C, t_{n})$

Remember that $i j$ is our joint outcome, and remains the same, but the cell that we're reading that coordinate out from varies within the sum.

The empirical fraction of $i j$ instances that occur when the probabilities are in $C$ must limit to be within $ϵ$ of $c_{i j}$ , by convexity of $C$ , calibration of logical inductors, and the cell itself having size $ϵ$ . Thus, by fixing an epsilon/cell size ahead of time, we get that

$P_{i j}^{t_{n}} ≃_{t_{n}} \frac{1}{t_{n}} \sum C \in B_{a_{j}} c_{i j} \cdot N (C, t_{n}) \pm ϵ$

In the limit as $t_{n}$ goes to infinity, because $a_{j}$ is taken with nonzero frequency in the limit,

$\frac{P_{i j}}{| P_{j} |} ≃_{t_{n}} \frac{P_{i j}^{t_{n}}}{| P_{j}^{t_{n}} |}$

Furthermore, as previously established,

$\frac{P_{i j}^{t_{n}}}{| P_{j}^{t_{n}} |} ≃_{t_{n}} \frac{\frac{1}{t_{n}} \sum_{C \in B_{a_{j}}} c_{i j} \cdot N (C, t_{n}) \pm ϵ}{\sum_{i} (\frac{1}{t_{n}} \sum_{C \in B_{a_{j}}} c_{i j} \cdot N (C, t_{n}) \pm ϵ)}$

In the limit of infinitely many cells and $ϵ$ shrinking to 0,

$\frac{P_{i j}}{| P_{j} |} ≃_{t_{n}} \frac{\sum_{C \in B_{a_{j}}} c_{i j} \cdot N (C, t_{n})}{\sum_{C \in B_{a_{j}}} N (C, t_{n}) \sum_{i} c_{i j}}$

Now, in general, we can't go much further than this (I think). However, if we consider the special case where player 1 has roughly equal or greater computational power than all others so all cells outside of $B_{a_{j}}^{S I}$ have an asymptotic probability of 0 (ie, in the limit, the probability distribution over everyone else's moves is independent of player 1's moves), then we can conclude that the limiting point $P$ is a correlated equilibria.The reasoning goes as follows:

Consider the $n$ -length vector where the $i$ 'th entry is given by $\frac{P_{i j}}{| P_{j} |}$ . This vector corresponds to the limiting distribution $P$ over everyone's moves, conditional on the inductor taking move $j$ . Because of what $\frac{P_{i j}}{| P_{j} |}$ equals in the limit, and because of all cells outside of $B_{a_{j}}^{S I}$ limiting to 0, this vector is approximated, in the limit, by the vector

$\sum C \in B_{a_{j}}^{S I} c_{j} \cdot \frac{N (C, t_{n})}{\sum_{C \in B_{a_{j}}^{S I}} N (C, t_{n}) \cdot | c_{j} |}$

As previously mentioned, because all these cells are in $B_{a_{j}}^{S I}$ , and (in the limit as $ϵ$ goes to 0), their center points are also subjectively independent from player 1, each cell center point is associated with a canonical vector in $B_{a_{j}}^{*}$ , specifically, $\frac{c_{j}}{| c_{j} |}$ (we can consider the cells as never having a center point for which an action has probability 0, so this is always well-defined).

Now, we can consider some weighting of all of these vectors, where the weight on the cell vector $\frac{c_{j}}{| c_{j} |}$ is

$\frac{| c_{j} | \cdot N (C, t_{n})}{\sum_{C \in B_{a_{j}}^{S I}} | c_{j} | \cdot N (C, t_{n})}$

It is fairly easy to verify that this sums up to one across all $C \in B_{a_{j}}^{S I}$ . Because it is a weighting of a bunch of vectors in $B_{a_{j}}^{*}$ , which is a convex set, the resulting vector must be in $B_{a_{j}}^{*}$ , the set of distributions over the actions of others for which $a_{j}$ is a causal best response. The $| c_{j} |$ cancels out of the top and bottom, leaving the vector

$\sum C \in B_{a_{j}}^{S I} c_{j} \cdot \frac{N (C, t_{n})}{\sum_{C \in B_{a_{j}}^{S I}} N (C, t_{n}) | c_{j} |}$

which, as previously established, is the vector giving the probability distribution over everyone else's moves conditional on player 1 making the move $a_{j}$ .

Therefore, conditional on player 1 making move $a_{j}$ , the distribution over opponent moves was such that $a_{j}$ was a causal best response.

This same argument can be repeated for all players to conclude that they play a correlated equilibrium if they are all bad enough at predicting each other that all players think they can change their own action without affecting the probability distribution over everyone else's actions.

#Correlated or Nash?

Admittedly, since all Nash equilibria are correlated equilibria, this isn't quite strong enough. As previously mentioned, there are games where it is possible to show that a wide range of strategies do not converge to a Nash equilibrium, and I'll work on that case soon.

However, there is an (admittedly sketchy) argument that shows that logical inductors plausibly converge to correlated equilibria in general.

Consider two logical inductors going up against each other in a game of Chicken. Logical inductor A has a trader with a very large starting budget that buys conditional contracts forcing down the expected utility of "swerve", but only on even-numbered days. Logical inductor B is the same, but for odd-numbered days.

Eventually, both logical inductors will learn that the other inductor is very likely to go straight on the $P$ -generable subsequence of even/odd days, and converge to swerving on those days. This probably continues in a self-enforcing way, even after the traders run out of budget, because inductor A exploring into going straight on odd-numbered days experiences ruin, and vice-versa, so they both learn to get out of the way when the other one is going straight. This is a correlated equilibria, because the long-run frequencies over all joint outcomes don't converge to any of the three Nash equilibria.

However, it is a Nash equilibria of the infinitely repeated game, by the Folk Theorem.

Admittedly there's no proof, but it seems very plausible that this behavior of superstitiously learning to defend/swerve out of the way on $P$ -generable sequences happens in general when logical inductors play each other, and once established and learned, these patterns are stable.