My take on higher-order game theory

Hey Nisan. Check the following passage from Domain Theory (Samson Abramsky and Achim Jung). This might be helpful for equipping $Δ$ with an appropriate domain structure. (You mention [JP89] yourself.)

We should also mention the various attempts to define a probabilistic version of the powerdomain construction, see [SD80, Mai85, Gra88, JP89, Jon90].
[SD80] N. Saheb-Djahromi. CPO’s of measures for nondeterminism. Theoretical Computer Science, 12:19–37, 1980.
[Mai85] M. Main. Free constructions of powerdomains. In A. Melton, editor, Mathematical Foundations of Programming Semantics, volume 239 of Lecture Notes in Computer Science, pages 162–183. Springer Verlag, 1985.
[Gra88] S. Graham. Closure properties of a probabilistic powerdomain construction. In M. Main, A. Melton, M. Mislove, and D. Schmidt, editors, Mathematical Foundations of Programming Language Semantics, volume 298 of Lecture Notes in Computer Science, pages 213–233. Springer Verlag, 1988.
[JP89] C. Jones and G. Plotkin. A probabilistic powerdomain of evaluations. In Proceedings of the 4th Annual Symposium on Logic in Computer Science, pages 186–195. IEEE Computer Society Press, 1989.
[Jon90] C. Jones. Probabilistic Non-Determinism. PhD thesis, University of Edinburgh, Edinburgh, 1990. Also published as Technical Report No. CST63-90.

During my own incursion into agent foundations and game theory, I also bumped into this exact obstacle, namely that there is no obvious way to equip $Δ$ with a least-fixed-point constructor ${Fix}_{X}^{Δ} : (X \to Δ X) \to Δ X$. In contrast, we can equip $P$ with an LFP constructor ${Fix}_{X}^{P} : (X \to P X) \to P X, \; g \mapsto \{x \in X : x \in g(x)\}$.

One trick is to define ${Fix}_{X}^{Δ}(g)$ to be the distribution $π \in Δ X$ which maximises the entropy $H(π)$ subject to the constraint $g^{Δ}(π) = π$.

A maximum-entropy distribution $π^{*}$ exists (assuming $X$ is finite, so that $Δ X$ is a simplex and $H$ is continuous), because:
- For $g : X \to Δ X$, let $g^{Δ} : Δ X \to Δ X$ be the lift via the $Δ$ monad, and let $G = \{π \in Δ X : g^{Δ}(π) = π\}$ be the set of fixed points of $g^{Δ}$.
- $Δ X$ is compact and Hausdorff, and $g^{Δ} : Δ X \to Δ X$ is continuous, so $G$ is closed in $Δ X$ and hence compact. $G$ is also nonempty, since $g^{Δ}$ is a continuous self-map of the simplex $Δ X$ and so has a fixed point by Brouwer's fixed point theorem.
- $H : Δ X \to \mathbb{R}$ is continuous and $G \subseteq Δ X$ is nonempty and compact, so $H$ attains a maximum $π^{*}$ on $G$.
Moreover, $π^{*}$ is unique, because:
- $G$ is convex: since $g^{Δ}$ is affine, if $π_{1} = g^{Δ}(π_{1})$ and $π_{2} = g^{Δ}(π_{2})$ then $λ_{1} π_{1} + λ_{2} π_{2} = g^{Δ}(λ_{1} π_{1} + λ_{2} π_{2})$ for all $λ_{1}, λ_{2} \geq 0$ with $λ_{1} + λ_{2} = 1$.
- $H : Δ X \to \mathbb{R}$ is strictly concave, i.e. $H(λ_{1} π_{1} + λ_{2} π_{2}) \geq λ_{1} H(π_{1}) + λ_{2} H(π_{2})$ for all $λ_{1}, λ_{2} \geq 0$ with $λ_{1} + λ_{2} = 1$, and the inequality is strict if $π_{1} \neq π_{2}$ and $λ_{1}, λ_{2} > 0$.
- Hence if $π_{1}^{*}, π_{2}^{*} \in G$ both attain the maximum entropy and $π_{1}^{*} \neq π_{2}^{*}$, then the midpoint $\tfrac{1}{2} π_{1}^{*} + \tfrac{1}{2} π_{2}^{*}$ lies in $G$ and $H(\tfrac{1}{2} π_{1}^{*} + \tfrac{1}{2} π_{2}^{*}) > \tfrac{1}{2} H(π_{1}^{*}) + \tfrac{1}{2} H(π_{2}^{*}) = H(π^{*})$, a contradiction.
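For finite $X$ the fixed-point set $G$ has a simple shape, which makes the maximum-entropy fixed point easy to compute. Writing $g$ as a row-stochastic matrix $P$ with $P[x][y] = g(x)(y)$, the fixed points of $g^{Δ}$ are the distributions with $π P = π$. These form the convex hull of the stationary distributions $μ_k$ of the closed communicating classes $C_k$, which have disjoint supports. So $H(\sum_k λ_k μ_k) = H(λ) + \sum_k λ_k H(μ_k)$, which is maximised at $λ_k \propto e^{H(μ_k)}$. Below is a minimal Python sketch under those assumptions (finite $X$, $g$ given as a matrix). The function names are hypothetical, and power iteration on the lazy chain is just one convenient way to get each $μ_k$; the iteration count is arbitrary.

```python
import math

def reachable(P, x):
    """States reachable from x (x included) via positive-probability moves."""
    seen, stack = {x}, [x]
    while stack:
        y = stack.pop()
        for z, p in enumerate(P[y]):
            if p > 0 and z not in seen:
                seen.add(z)
                stack.append(z)
    return seen

def closed_classes(P):
    """Closed communicating classes of the chain with row-stochastic matrix P."""
    reach = [reachable(P, x) for x in range(len(P))]
    classes = []
    for x in range(len(P)):
        # x is recurrent iff every state it can reach can reach x back
        if all(x in reach[y] for y in reach[x]):
            cls = frozenset(reach[x])
            if cls not in classes:
                classes.append(cls)
    return classes

def stationary_on(P, cls, iters=10_000):
    """Stationary distribution supported on one closed class, by power
    iteration on the lazy chain (P + I) / 2 (aperiodic, same fixed points)."""
    n = len(P)
    pi = [1.0 / len(cls) if x in cls else 0.0 for x in range(n)]
    for _ in range(iters):
        new = [0.5 * p for p in pi]
        for x in cls:
            for y, p in enumerate(P[x]):
                new[y] += 0.5 * pi[x] * p
        pi = new
    return pi

def entropy(pi):
    return -sum(p * math.log(p) for p in pi if p > 0)

def max_entropy_fixed_point(P):
    """argmax of H over {pi : pi P = pi}, using lambda_k proportional to exp(H(mu_k))."""
    mus = [stationary_on(P, cls) for cls in closed_classes(P)]
    weights = [math.exp(entropy(mu)) for mu in mus]
    total = sum(weights)
    return [sum(w * mu[x] for w, mu in zip(weights, mus)) / total
            for x in range(len(P))]

if __name__ == "__main__":
    # The H/T example: H -> 0.5 H + 0.5 T, T -> T
    print(max_entropy_fixed_point([[0.5, 0.5], [0.0, 1.0]]))  # [0.0, 1.0]
    # A 2-cycle plus an absorbing state: approximately [1/3, 1/3, 1/3]
    print(max_entropy_fixed_point([[0, 1, 0], [1, 0, 0], [0, 0, 1]]))
```

The closed form is what makes this cheap: instead of a generic constrained optimiser, the problem reduces to one softmax over per-class entropies.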

The justification here is the Principle of Maximum Entropy:

Given a set of constraints on a probability distribution, then the “best” distribution that fits the data will be the one of maximum entropy.

More generally, we should define ${Fix}_{X}^{Δ}(g)$ to be the distribution $π \in Δ X$ which minimises the relative entropy $D_{KL}(π \,\|\, π_{0})$ subject to the constraint $π = g^{Δ}(π)$, where $π_{0}$ is some uninformative prior such as the Solomonoff prior. The previous result is the special case where $π_{0}$ is the uniform prior, since then $D_{KL}(π \,\|\, π_{0}) = \log |X| - H(π)$. The proof generalises by noting that $D_{KL}(- \,\|\, π_{0}) : Δ X \to \mathbb{R}$ is continuous and strictly convex. See the Principle of Minimum Discrimination Information.
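For finite $X$ and a prior $π_0$ with full support, the minimiser has a closed form. This is a sketch under that assumption only; it says nothing about infinite $X$ or the Solomonoff prior. Viewing $g$ as a finite Markov chain, its fixed points are exactly the mixtures $π = \sum_k λ_k μ_k$, where $μ_k$ is the unique stationary distribution on the $k$-th closed communicating class $C_k$, and these classes are disjoint. Because the supports are disjoint,

$$
D_{KL}\Big(\sum_k λ_k μ_k \,\Big\|\, π_0\Big) = \sum_k λ_k \log λ_k + \sum_k λ_k d_k, \qquad d_k = \sum_{x \in C_k} μ_k(x) \log \frac{μ_k(x)}{π_0(x)}.
$$

Setting the derivative $\log λ_k + 1 + d_k$ equal to a common Lagrange multiplier gives the minimiser over the simplex, $λ_k \propto e^{-d_k}$. For uniform $π_0$ we get $d_k = \log |X| - H(μ_k)$, so $λ_k \propto e^{H(μ_k)}$, recovering the maximum-entropy fixed point.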

Ideally, we'd like ${Fix}^{P}$ and ${Fix}^{Δ}$ to "coincide" modulo the map $Supp : Δ X \to P X$, i.e. $Supp({Fix}^{Δ}(g)) = {Fix}^{P}(Supp \circ g)$ for all $g : X \to Δ X$. Unfortunately, this isn't the case: for $X = \{H, T\}$, if $g : H \mapsto 0.5 \cdot | H ⟩ + 0.5 \cdot | T ⟩, \; T \mapsto | T ⟩$, then ${Fix}^{P}(Supp \circ g) = \{H, T\}$ but $Supp({Fix}^{Δ}(g)) = \{T\}$.
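A quick check of this counterexample. Write $π = p \cdot | H ⟩ + (1 - p) \cdot | T ⟩$. Then

$$
g^{Δ}(π) = p \left(0.5 \cdot | H ⟩ + 0.5 \cdot | T ⟩\right) + (1 - p) \cdot | T ⟩ = 0.5 p \cdot | H ⟩ + (1 - 0.5 p) \cdot | T ⟩,
$$

so $g^{Δ}(π) = π$ forces $p = 0.5 p$, i.e. $p = 0$. The fixed-point set is therefore $G = \{| T ⟩\}$, and ${Fix}^{Δ}(g) = | T ⟩$ whatever rule (maximum entropy or otherwise) is used to select from $G$. Meanwhile $H \in (Supp \circ g)(H) = \{H, T\}$ and $T \in (Supp \circ g)(T) = \{T\}$, so both points are fixed in the $P$ sense.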

Alternatively, we could consider the convex sets of distributions over $X$ .

Let $C(X)$ denote the set of convex sets of distributions over $X$. There is an ordering $\leq$ on $C(X)$ where $A \leq B ⟺ A \supseteq B$. We have an LFP operator ${Fix}_{X}^{C} : (X \to C X) \to C X$ via $g \mapsto \bigcup \{S \in C X : g^{C}(S) = S\}$, where $g^{C} : C X \to C X, \; S \mapsto \{\sum_{i = 1}^{n} α_{i} \cdot π_{i} \mid π_{i} \in g(x_{i}), \sum_{i = 1}^{n} α_{i} | x_{i} ⟩ \in S\}$ is the lift of $g : X \to C X$ via the $C$ monad.
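For finite $X$ and convex sets given as convex hulls of finitely many distributions, the lift $g^{C}$ can be computed on vertices. Because $\sum_x μ(x) π_x$ is affine in $μ$ and in each $π_x$ separately, $g^{C}(\mathrm{conv}\, V)$ is the convex hull of the points $\sum_x v(x) \, w_x$, where $v \in V$ and each $w_x$ is a vertex of $g(x)$. Below is a minimal Python sketch under those assumptions. The vertex representation and the name `lift_C` are hypothetical, and the output may include redundant, non-extreme points.

```python
from itertools import product

def lift_C(g, S_vertices, X):
    """Generating points of g^C(S), where S = conv(S_vertices) and g[x] lists
    vertices of the convex set g(x). Distributions are tuples indexed like X."""
    n = len(X)
    out = []
    for v in S_vertices:
        support = [i for i in range(n) if v[i] > 0]
        # pick one vertex w of g(x_i) for every x_i in the support of v
        for choice in product(*(g[X[i]] for i in support)):
            point = [0.0] * n
            for i, w in zip(support, choice):
                for j in range(n):
                    point[j] += v[i] * w[j]
            point = tuple(point)
            if point not in out:
                out.append(point)
    return out

if __name__ == "__main__":
    X = ("H", "T")
    g = {"H": [(0.5, 0.5)], "T": [(0.0, 1.0)]}  # the coin example, singleton sets
    S = [(1.0, 0.0), (0.0, 1.0)]                # S = all of ΔX
    for _ in range(3):
        S = lift_C(g, S, X)
    print(S)  # [(0.125, 0.875), (0.0, 1.0)]
```

On the coin example, iterating the lift from all of $Δ X$ shrinks the set towards $\{| T ⟩\}$. After $m$ steps it is the segment from $2^{-m} \cdot | H ⟩ + (1 - 2^{-m}) \cdot | T ⟩$ to $| T ⟩$.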

Actually, we want the set of nonempty sets of actions, so an agent can express indifference between actions.
Really we want $c_{n}$ and $d_{n}$ to be maximal elements of the poset $Δ A_{n}$ such that $c_{n} ⊒ a_{n + 1}(d_{n})$ and $d_{n} ⊒ b_{n + 1}(c_{n})$. See the section on technical details.
Actually, you can't compute an argmax over a function of a continuous variable. The best you can do is a distribution that's biased towards higher-payoff outcomes, like quantilization or best-of-$k$ sampling. I'll write more about this in the future.
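The last footnote's two alternatives to an exact argmax can be sketched for a finite base distribution over actions. This is a minimal Python sketch, assuming the base distribution is a dict from actions to probabilities and `utility` is any real-valued function; the function names are hypothetical. `quantilize` keeps the top-$q$ probability mass of actions ranked by utility and renormalises. `best_of_k` draws $k$ samples and keeps the best one, and `best_of_k_dist` gives its exact output distribution when utilities are distinct.

```python
import random
from typing import Callable, Dict, Hashable

Dist = Dict[Hashable, float]

def quantilize(base: Dist, utility: Callable, q: float) -> Dist:
    """q-quantilizer: restrict `base` to its top-q mass by utility, renormalise."""
    if not 0 < q <= 1:
        raise ValueError("q must lie in (0, 1]")
    remaining, out = q, {}
    for a in sorted(base, key=utility, reverse=True):
        if remaining <= 0:
            break
        take = min(base[a], remaining)  # the boundary action may be only partly kept
        if take > 0:
            out[a] = take / q
        remaining -= take
    return out

def best_of_k(sample: Callable, utility: Callable, k: int, rng=None):
    """Draw k actions via sample(rng) and return the highest-utility one."""
    return max((sample(rng) for _ in range(k)), key=utility)

def best_of_k_dist(base: Dist, utility: Callable, k: int) -> Dist:
    """Exact output distribution of best_of_k, assuming distinct utilities:
    P(best = a) = F(a)^k - F_below(a)^k, with F the base CDF in utility order."""
    out, below = {}, 0.0
    for a in sorted(base, key=utility):
        at_or_below = below + base[a]
        out[a] = at_or_below ** k - below ** k
        below = at_or_below
    return out

if __name__ == "__main__":
    base = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}
    u = {"a": 1, "b": 2, "c": 3, "d": 4}.get
    print(quantilize(base, u, 0.5))  # {'d': 0.5, 'c': 0.5}
    print(best_of_k_dist(base, u, 2))
    print(best_of_k(lambda r: r.choice(list(base)), u, k=8, rng=random.Random(0)))
```

Both interpolate between the base distribution and the argmax: $q \to 0$ or $k \to \infty$ approaches the argmax, while $q = 1$ or $k = 1$ returns the base distribution unchanged.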

Cleo Nardo (2y)

Nisan (2y)

Thanks! For convex sets of distributions: If you weaken the definition of fixed point to , then the set $\{S \in C X : g^{C}(S) = S\}$ has a least element which really is a least fixed point.

romeostevensit (4y)

Tangential, but did you ever happen to read "Statistical physics of human cooperation"?

Nisan (4y)

No, I just took a look. The spin glass stuff looks interesting!

romeostevensit (4y)

Are we talking about the same thing?

https://www.sciencedirect.com/science/article/am/pii/S0370157317301424

Yep, I skimmed it by looking at the colorful plots that look like Ising models and reading the captions. Those are always fun.

Charlie Steiner (4y)

I have a question about this entirely divorced from practical considerations. Can we play silly ordinal games here?

If you assume that the other agent will take the infinite-order policy, but then naively maximize your expected value rather than unrolling the whole game-playing procedure, this is sort of like . So I guess my question is, if you take this kind of dumb agent (that still has to compute the infinite agent) as your baseline and then re-build an infinite tower of agents (playing other agents of the same level) on top of it, does it reconverge to $A_{\infty}$ or does it converge to some weird $A_{ω 2}$ ?

I think you're saying , right? In that case, since $A_{0}$ embeds into $A_{ω}$ , we'd have $A_{ω + 1}$ embedding into $A_{ω}$ . So not really a step up.

If you want to play ordinal games, you could drop the requirement that agents are computable / Scott-continuous. Then you get the whole ordinal hierarchy. But then we aren't guaranteed equilibria in games between agents of the same order.

I suppose you could have a hybrid approach: Order $ω + 1$ is allowed to be discontinuous in its order- $ω$ beliefs, but higher orders have to be continuous? Maybe that would get you to $ω 2$ .

AI ALIGNMENT FORUM
AF


Multiple levels of strategic thinking

Fixed points

The space of all agents

Playing a game

Some agents

Summary

Infinity

Correlated strategies

Technical details