In classical game theory, we characterise agents by a utility function and assume that agents choose options which cause maximal utility. This is a pretty good model, but it has some conceptual and empirical limitations which are particularly troublesome for AI safety.

Higher-order game theory (HOGT) is an attempt to rebuild game theory without appealing to either utility functions or maximisation. I think Higher-Order Game Theory is cool so I'm writing a sequence on it.

I'll try to summarise the relevant bits of the literature, present my own minor extensions, and apply HOGT to problems in AI safety

You're reading the first post! Let's get into it.

The role of argmax

For each set X, let argmaxX:(X→R)→P(X) be the familiar function which receives a function u:X→R and produces the set of element which maximise u. A function like argmaxX is sometimes called a higher-order function or functional, because it receives another function as input.

As you all surely know, argmax plays a central role in classical game theory. Typically we interpret the set X as the agent's options,^{[2]} and the function u:X→R as the agent's task, which assigns a payoffu(x)∈R to each option x∈X. We say an option x∈X is optimal to the agent for the task u:X→R whenever x∈argmaxX(u). Classical game theory is governed by the assumption that agents choose optimal options in whatever task they face, where optimality strictly means utility-maximisation.

Definition 1 (provisional): Let X be any set of options. A task is any function u:X→R. An option x∈X is optimal for a task u:X→R if and only if x∈argmaxX(u).

Due to the presence of the powerset operatorP in argmaxX:(X→R)→P(X), this model of the agent is possibilistic — for each task u:X→R, our model says which options are possibly chosen by agent. The model doesn't say which options are probably chosen by the agent — for that we'd need a function (X→R)→Δ(X). Nor does the model say which options are actually chosen by the agent — for that we'd need a function (X→R)→X.^{[3]}

How many options are optimal for the task?

Example

Typically there'll be a unique optimal option.

argmaxC(λz∈C(1−|z|2))={0}

Perhaps multiple options will be optimal.

argmaxR(sin)={2πk+π2|k∈Z}

Perhaps no options are optimal, i.e. every option is strictly dominated by another.

argmaxR(exp)=∅

Perhaps every option is optimal, i.e. the task is a constant function.

argmaxZ(λk∈Z.sin(π⋅k))=Z

Exercise 1: Find a set X such that argmaxX(u)=∅ for every function u:X→R.

Generalising the functional

The function argmaxX is a particular way to turn tasks into sets of options, i.e. it has the type-signature (X→R)→P(X). But there are many functions with the same type-signature (see the table below), so a natural question to ask is... What if we replace argmaxX in classical game theory with an arbitrary functional ψ:(X→R)→P(X)?

What we get is higher-order game theory.^{[4]} Surprisingly, we can recover many game-theoretic concepts in this more general setting. We can typically recover the original classical concepts from the more general higher-order concepts by restricting our attention to either ψ=argmaxX or ψ=argminX.

So let's revise our definition —

Definition 2 (provisional): Let X be any set of options. An optimiser is any functional ψ:(X→R)→P(X). A ψ-task is any function u:X→R. An option x∈X is ψ-optimal for a task u:X→R if and only if x∈ψ(u).

When clear from context, I'll just say task and optimal.

In higher-order game theory, we model the agents options with a set X and model their task with a function u:X→R. But (unlike in classical game theory) we're free to model the agent's optimisation with any functional ψ:(X→R)→P(X). I hope to persuade you that this additional degree of freedom is actually quite handy.^{[5]}

Higher-order game theory is governed by the central assumption that agents choose ψ-optimal options in whatever ψ-tasks they face, where ψ is our model of the agent's optimisation. If we observe the agent choosing an option x∈ψ(u) then that would be consistent with our model, and any observation of a choice x∉ψ(u) would falsify our model.^{[6]}

Anyway, here is a table of some functionals and their game-theoretic interpretation —

ψ:(X→R)→P(X)

Remarks

argmin=λu:X→R.{x∈X|∀x′∈X,u(x)≤u(x′)}

This agent will choose an option x∈X which minimises u. In classical game theory, this type of optimiser is typically used to model the adversary to the agent modelled by argmax.

satisfices=λu:X→R.{x∈X|u(x)≥u(s)}

This agent will choose an option x∈X which dominates some fixed option s∈X. The option s is called the anchor point. It might represent the "default" option, or the "do nothing" option, or the "human-approved" option.

According to Herbert Simon, satisficing is an accurate model of human and institutional decision-making.

Utility-satisficers are a kind of mild optimiser, which might be a safer way to build AI than a full-blown utility-maximiser.

argmax-ϵ-slack=λu:X→R.{x∈X|∀x′∈X,u(x)+ϵ≥u(x′)}

This agent will chooses an option x∈X which maximises the function u up to some fixed slack ϵ>0. Such agents behave like argmax except that their utility u(x)∈R is measured with finite precision.

better-than-average=λu:X→R.{x∈X|u(x)≥Ex′∼πu(x′)}

This agent will choose an option x∈X which scores better than the average option, given that the option space is equipped with a distribution π∈Δ(X).

rockS=λu:X→R.S

This agent will choose an option in a fixed subset S⊆X, regardless of the task, e.g. DefectBot and CooperateBot.^{[7]}

Using rockS, we can model a non-agents as a special (degenerate) case of agents.

quantϵ=λu:X→R.{x∈X:Px′∼π[u(x′)≥u(x)]≤ϵ}

This agent will choose an option x∈X in the top ϵ quantile, given that the option space is equipped with a distribution π∈Δ(X).

This is the possibilistic version of Jessica Taylor's quantiliser. Her original probabilistic version is a function (X→R)→ΔX, and so wouldn't count as an optimiser according to Definition 2.^{[8]}

satisficeS=λu:X→R.{x∈X|∀s∈S,u(s)≤u(x)}

This agent will choose an option x∈X which dominates every anchor pointin S⊆X. As special cases, when S=X we get argmaxX, and when S is a singleton set we get Simon's satisficer mentioned before. When the set of anchor points is smaller, then the resulting optimiser is less selective, i.e. more options will be optimal.

threshα=λu:X→R.{x∈X|u(x)≥α}

Exercise 2

Exercise 3

This agent will choose an option in the largest equivalence class, where two options are equivalent if they result in the same payoff.

Generalising the payoff space.

Now let's generalise the payoff space to any set R, not only R. We will think of the elements of R as payoffs in a general sense, relaxing the assumption that the payoffs assigned to the options are real numbers. The function u:X→R describes which payoff u(x)∈R would result from the agent choosing the option x∈X.

Definition 3 (provisional): Let X be any set of options and R be any set of payoffs. An optimiser is any functional ψ:(X→R)→P(X). A ψ-task is any function u:X→R. An option x∈X is ψ-optimal for a task u:X→R if and only if x∈ψ(u).

This is the final version to this definition today.

This is significantly more expressive! When we are tasked with modelling a game-theoretic situation, we are can pick any set R to model the agent's payoffs!^{[9]}

I'll use the notation JP(X,R) to denote the set of functionals (X→R)→P(X), e.g. argmaxZ∈JP(Z,R).

Anyway, here is a table of some functionals and their game-theoretic interpretation —

Payoff space

Remarks

R=[0,1]

An optimiser ψ∈JP(X,[0,1]) only needs to be well-defined for bounded utility functions u:X→[0,1].

R=N

An optimiser ψ∈JP(X,N) only needs to be well-defined forutility functions u:X→N.

R=R∪{−∞,+∞}

An optimiser ψ∈JP(X,R∪{−∞,+∞}) must be well-defined for infinatary utility functions u:X→R∪{−∞,+∞}.

For example, u(x) might be the expected total winnings of a gambler employing the gambling strategy x∈X. The gambler themselves are modelled by an optimiser ψ:(X→R∪{−∞,+∞})→P(X). This functional ψ is characterised by its attitude towards infinite expected profit/loss.

An optimiser ψ∈JP(X,ΔR) will choose options given a stochastic task u:Γ→ΔR. This is the type-signature of risk-averse and risk-seeking maximisers studied in behavioural microeconomics. The field of Portfolio Theory is (largely) the comparison of rival optimisers in JP(X,ΔR).

The Levi-Civita field contains infinitesimals like ϵ,ϵ2,2ϵ+ϵ2,√ϵ , as well as infinite values like ϵ−1,ϵ−2,ϵ1/3+ϵ−1/3+2. In infinite ethics, we encounter tasks u:X→Levi-Civita Field, and we can model the infinitary ethicist by an optimiser ψ∈JP(X,Levi-Civita Field).

Exercise 4: Solve infinite ethics.

R=Rn

An optimiser ψ∈JP(X,Rn) can model different multi-objective optimisers. For example, there's an optimiser which, given multi-objective task u:X→Rn, returns those options which are maximal according to the lexicographic ordering, and there's another optimiser which uses the product ordering instead.^{[10]}

Later in this post, we'll encounter an optimiser in JP(X1×⋯×Xn,R) which returns the nash equilibria of n-player games, where a task for this optimiser is an n-by-npayoff matrix u:X1×⋯×Xn→R.

The field of cooperative bargaining is concerned with different optimisers JP(X+1,Rn). A bargaining task f:(X+1)→Rn parameterises both the feasibility set F={f(x)∈Rn:x∈X} and the disagreement point d=f(⋆)∈Rn where 1={⋆} is the singleton set.

Any preorder (R,≤)

Suppose that R is any set equipped with a preorder ≤.^{[11]} Then a function u:X→R will induce a preorder ≤u on X via x≤ux′⟺u(x)≤u(x′).

Let argmaxX∈JP(X,R) be the optimiser which chooses the maximal points of ≤u, i.e. options which aren't strictly dominated by any other options.

If ≤ isn't total (i.e. the agent has incomplete preferences over the payoffs) then the resulting optimiser argmaxX is less selective (i.e. more options are optimal). In the extreme case, where no options are comparable, then argmaxX might choose any option.^{[12]}

Exercise 5: Which optimisers ψ∈JP(X,R) defined in the previous table can be generalised to any preorder (R,≤)?

R=X, where X is the option space of the agent.

Occasionally the same set X will serve as both the option space and the payoff space. In this case, a task u:X→X represents some transformation of the underlying option space.

There's an optimiser fix:JP(X,X), which choices options which are fixed-points of u:X→X. That is, fix=λu:X→X.{x∈X|u(x)=x}. We can use fix to model a conservative agent who chooses options which remain untransformed by the task. Note that this optimiser is not consequentialist, because the optimality of an option is not determined by its payoff alone. For example, 0∈fixR(sin) but π∉fixR(sin), despite the fact that sin(0)=sin(π).

Subjective vs objective optimisers

It's standard practice, when modelling agents and their environments, to use payoff spaces like R, Rn, Δ(R), etc, but I think this can be misleading.

Consider the following situation —

A robot is choosing an option from a set X. There's a function f:X→W+ such that, were the robot to choose the option x∈X, then the world would end up in state f(x)∈W+, where W+ is something like the set of all configurations of the future light-cone.

You know the robot is maximising over all their options, but you aren't sure what the robot is maximising for exactly — perhaps for paperclips, perhaps for happy humans.

Now, let p:W+→R be the function which counts the number of paperclips in a light-cone, and let h:W+→R be the function which counts the number of happy humans.

Here's what classical game theory says about your predicament —

The payoff space is R. You know that the robot applies the optimiser argmaxX:(X→R)→P(X), but you don't know whether the robot faces the task (p∘f):X→R or the task (h∘f):X→R, and hence you don't know whether the robot will choose an option x∈argmaxX(p∘f) or x∈argmaxX(h∘f).

I call this a subjective account, because the robot's task depends on the robot's preferences. Were the robot to have difference preferences, then they would've faced a different task, and because you don't know the robot's preferences you don't know their task.

However, by exploiting the expressivity of higher-order game theory, we can offer an objective account which rivals the subjective account. In the objective account, the task that the robot faces doesn't depend on the robots preferences —

The payoff space is W+ itself. You know that the robot faces the task f:X→W+ but you don't know whether the robot applies the optimiser argmaxX(p∘−):(X→W+)→P(X

Written during the SERI MATS program under the joint mentorship ofJohn Wentworth,Nicholas Kees, andJanus.## Preface

In classical game theory, we characterise agents by a utility function and assume that agents choose options which cause maximal utility. This is a pretty good model, but it has some conceptual and empirical limitations which are particularly troublesome for AI safety.

Higher-order game theory (HOGT) is an attempt to rebuild game theory without appealing to either utility functions or maximisation. I think Higher-Order Game Theory is cool so I'm writing a sequence on it.

I'll try to summarise the relevant bits of the literature, present my own minor extensions, and apply HOGT to problems in AI safety

You're reading the first post! Let's get into it.

## The role of argmax

For each set X, let argmaxX:(X→R)→P(X) be the familiar function which receives a function u:X→R and produces the set of element which maximise u. A function like argmaxX is sometimes called a

higher-order functionorfunctional, because it receives another function as input.Explicitly, argmaxX=λu:X→R.{x∈X|∀x′∈X,u(x)≥u(x′)}.

^{[1]}As you all surely know, argmax plays a central role in classical game theory. Typically we interpret the set X as the agent's

options,^{[2]}and the function u:X→R as the agent'stask,which assigns apayoffu(x)∈R to each option x∈X. We say an option x∈X isoptimalto the agent for the task u:X→R whenever x∈argmaxX(u). Classical game theory is governed by the assumption that agents choose optimal options in whatever task they face, where optimality strictly means utility-maximisation.Due to the presence of the powerset operator P in argmaxX:(X→R)→P(X), this model of the agent is

possibilistic— for each task u:X→R, our model says which options arepossiblychosen by agent. The model doesn't say which options areprobablychosen by the agent — for that we'd need a function (X→R)→Δ(X). Nor does the model say which options areactuallychosen by the agent — for that we'd need a function (X→R)→X.^{[3]}Exercise 1:Find a set X such that argmaxX(u)=∅ for every function u:X→R.## Generalising the functional

The function argmaxX is a particular way to turn tasks into sets of options, i.e. it has the type-signature (X→R)→P(X). But there are many functions with the same type-signature (see the table below), so a natural question to ask is... What if we replace argmaxX in classical game theory with an arbitrary functional ψ:(X→R)→P(X)?

What we get is

higher-order game theory.^{[4]}Surprisingly, we can recover many game-theoretic concepts in this more general setting. We can typically recover the originalclassicalconcepts from the more generalhigher-orderconcepts by restricting our attention to either ψ=argmaxX or ψ=argminX.So let's revise our definition —

In higher-order game theory, we model the agents options with a set X and model their task with a function u:X→R. But (unlike in classical game theory) we're free to model the agent's optimisation with any functional ψ:(X→R)→P(X). I hope to persuade you that this additional degree of freedom is actually quite handy.

^{[5]}Higher-order game theory is governed by the central assumption that agents choose ψ-optimal options in whatever ψ-tasks they face, where ψ is our model of the agent's optimisation. If we observe the agent choosing an option x∈ψ(u) then that would be consistent with our model, and any observation of a choice x∉ψ(u) would falsify our model.

^{[6]}Anyway, here is a table of some functionals and their game-theoretic interpretation —

adversaryto the agent modelled by argmax.This agent will choose an option x∈X which dominates some fixed option s∈X. The option s is called the

anchor point. It might represent the "default" option, or the "do nothing" option, or the "human-approved" option.According to Herbert Simon, satisficing is an accurate model of human and institutional decision-making.

Utility-satisficers are a kind of mild optimiser, which might be a safer way to build AI than a full-blown utility-maximiser.

finite precision.This agent will choose an option in a fixed subset S⊆X, regardless of the task, e.g. DefectBot and CooperateBot.

^{[7]}Using rockS, we can model a non-agents as a special (degenerate) case of agents.

This agent will choose an option x∈X in the top ϵ quantile, given that the option space is equipped with a distribution π∈Δ(X).

This is the possibilistic version of Jessica Taylor's quantiliser. Her original probabilistic version is a function (X→R)→ΔX, and so

wouldn'tcount as an optimiser according to Definition 2.^{[8]}less selective,i.e. more options will be optimal.Exercise 2Exercise 3## Generalising the payoff space.

Now let's generalise the payoff space to any set R, not only R. We will think of the elements of R as

payoffsin a general sense, relaxing the assumption that the payoffs assigned to the options are real numbers. The function u:X→R describes which payoff u(x)∈R would result from the agent choosing the option x∈X.This is significantly more expressive! When we are tasked with modelling a game-theoretic situation, we are can pick any set R to model the agent's payoffs!

^{[9]}I'll use the notation JP(X,R) to denote the set of functionals (X→R)→P(X), e.g. argmaxZ∈JP(Z,R).

Anyway, here is a table of some functionals and their game-theoretic interpretation —

boundedutility functions u:X→[0,1].An optimiser ψ∈JP(X,R∪{−∞,+∞}) must be well-defined for

infinataryutility functions u:X→R∪{−∞,+∞}.For example, u(x) might be the expected total winnings of a gambler employing the gambling strategy x∈X. The gambler themselves are modelled by an optimiser ψ:(X→R∪{−∞,+∞})→P(X). This functional ψ is characterised by its attitude towards infinite expected profit/loss.

stochastictask u:Γ→ΔR. This is the type-signature of risk-averse and risk-seeking maximisers studied in behavioural microeconomics. The field of Portfolio Theory is (largely) the comparison of rival optimisers in JP(X,ΔR).The Levi-Civita field contains infinitesimals like ϵ,ϵ2,2ϵ+ϵ2,√ϵ , as well as infinite values like ϵ−1,ϵ−2,ϵ1/3+ϵ−1/3+2. In infinite ethics, we encounter tasks u:X→Levi-Civita Field, and we can model the infinitary ethicist by an optimiser ψ∈JP(X,Levi-Civita Field).

Exercise 4:Solve infinite ethics.An optimiser ψ∈JP(X,Rn) can model different multi-objective optimisers. For example, there's an optimiser which, given multi-objective task u:X→Rn, returns those options which are maximal according to the lexicographic ordering, and there's another optimiser which uses the product ordering instead.

^{[10]}Later in this post, we'll encounter an optimiser in JP(X1×⋯×Xn,R) which returns the nash equilibria of n-player games, where a task for this optimiser is an n-by-npayoff matrix u:X1×⋯×Xn→R.

The field of cooperative bargaining is concerned with different optimisers JP(X+1,Rn). A bargaining task f:(X+1)→Rn parameterises both the feasibility set F={f(x)∈Rn:x∈X} and the disagreement point d=f(⋆)∈Rn where 1={⋆} is the singleton set.

Suppose that R is any set equipped with a preorder ≤.

^{[11]}Then a function u:X→R will induce a preorder ≤u on X via x≤ux′⟺u(x)≤u(x′).Let argmaxX∈JP(X,R) be the optimiser which chooses the maximal points of ≤u, i.e. options which aren't strictly dominated by any other options.

Explicitly λu:X→R.{x∈X:∀x′∈X.u(x)≤u(x′)⟹u(x′)≤u(x)}

If ≤ isn't total (i.e. the agent has incomplete preferences over the payoffs) then the resulting optimiser argmaxX is less selective (i.e. more options are optimal). In the extreme case, where no options are comparable, then argmaxX might choose any option.

^{[12]}Exercise 5:Which optimisers ψ∈JP(X,R) defined in the previous table can be generalised to any preorder (R,≤)?Occasionally the same set X will serve as both the option space and the payoff space. In this case, a task u:X→X represents some transformation of the underlying option space.

There's an optimiser fix:JP(X,X), which choices options which are fixed-points of u:X→X. That is, fix=λu:X→X.{x∈X|u(x)=x}. We can use fix to model a

conservative agentwho chooses options which remain untransformed by the task. Note that this optimiser isnotconsequentialist, because the optimality of an option is not determined by its payoff alone. For example, 0∈fixR(sin) but π∉fixR(sin), despite the fact that sin(0)=sin(π).## Subjective vs objective optimisers

It's standard practice, when modelling agents and their environments, to use payoff spaces like R, Rn, Δ(R), etc, but I think this can be misleading.

Consider the following situation —

Now, let p:W+→R be the function which counts the number of paperclips in a light-cone, and let h:W+→R be the function which counts the number of happy humans.

Here's what classical game theory says about your predicament —

I call this a

subjective account, because the robot's task depends on the robot's preferences. Were the robot to have difference preferences, then they would've faced a different task, and because you don't know the robot's preferences you don't know their task.However, by exploiting the expressivity of higher-order game theory, we can offer an

objective accountwhich rivals the subjective account. In the objective account, the task that the robot faces doesn't depend on the robots preferences —