AI ALIGNMENT FORUM
AF

Game Theory without Argmax [Part 1] — AI Alignment Forum

Written during the SERI MATS program under the joint mentorship of John Wentworth, Nicholas Kees, and Janus.


Preface

In classical game theory, we characterise agents by a utility function and assume that agents choose options which cause maximal utility. This is a pretty good model, but it has some conceptual and empirical limitations which are particularly troublesome for AI safety.

Higher-order game theory (HOGT) is an attempt to rebuild game theory without appealing to either utility functions or maximisation. I think Higher-Order Game Theory is cool so I'm writing a sequence on it.

[Part 1]
[Part 2]
[Part 3]
...
[Part n]

I'll try to summarise the relevant bits of the literature, present my own minor extensions, and apply HOGT to problems in AI safety.

You're reading the first post! Let's get into it.

The role of argmax

For each set $X$, let ${argmax}_{X} : (X \to R) \to P (X)$ be the familiar function which receives a function $u : X \to R$ and produces the set of elements which maximise $u$. A function like ${argmax}_{X}$ is sometimes called a higher-order function or functional, because it receives another function as input.

Explicitly, ${argmax}_{X} = λ^{u : X \to R} . \{x \in X \mid \forall x^{'} \in X, u (x) \geq u (x^{'})\}$.^[1]

As you all surely know, $argmax$ plays a central role in classical game theory. Typically we interpret the set $X$ as the agent's options,^[2] and the function $u : X \to R$ as the agent's task, which assigns a payoff $u (x) \in R$ to each option $x \in X$ . We say an option $x \in X$ is optimal to the agent for the task $u : X \to R$ whenever $x \in {argmax}_{X} (u)$ . Classical game theory is governed by the assumption that agents choose optimal options in whatever task they face, where optimality strictly means utility-maximisation.

Definition 1 (provisional): Let $X$ be any set of options. A task is any function $u : X \to R$ . An option $x \in X$ is optimal for a task $u : X \to R$ if and only if $x \in {argmax}_{X} (u)$ .

Due to the presence of the powerset operator $P$ in ${argmax}_{X} : (X \to R) \to P (X)$, this model of the agent is possibilistic: for each task $u : X \to R$, our model says which options are possibly chosen by the agent. The model doesn't say which options are probably chosen by the agent; for that we'd need a function $(X \to R) \to Δ (X)$. Nor does the model say which option is actually chosen by the agent; for that we'd need a function $(X \to R) \to X$.^[3]
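
To make the type-signature concrete, here is a minimal sketch of $argmax$ as a higher-order function, assuming the option set is finite and given as a Python iterable. The names `argmax`, `unique`, `several` and `everything` are illustrative, not from any library.

```python
from typing import Callable, Hashable, Iterable, Set, TypeVar

X = TypeVar("X", bound=Hashable)


def argmax(options: Iterable[X], u: Callable[[X], float]) -> Set[X]:
    """Return the set of all options maximising u (possibilistic: a set, not one choice)."""
    opts = list(options)
    return {x for x in opts if all(u(x) >= u(y) for y in opts)}


# A unique optimal option: 1 - k^2 over {-2, ..., 2}.
unique = argmax(range(-2, 3), lambda k: 1 - k * k)

# Several optimal options: k^2 over {-2, ..., 2}.
several = argmax(range(-2, 3), lambda k: k * k)

# Every option is optimal: a constant task.
everything = argmax(range(-2, 3), lambda k: 0)
```

Note that the function returns a set rather than a single element, which is exactly the possibilistic reading above.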

How many options are optimal for the task?	Example
Typically there'll be a unique optimal option.	${argmax}_{C} (λ^{z \in C} . (1 - |z|^{2})) = \{0\}$
Perhaps multiple options will be optimal.	${argmax}_{R} (sin) = \{2 π k + \frac{π}{2} \mid k \in Z\}$
Perhaps no options are optimal, i.e. every option is strictly dominated by another.	${argmax}_{R} (exp) = \emptyset$
Perhaps every option is optimal, i.e. the task is a constant function.	${argmax}_{Z} (λ^{k \in Z} . sin (π \cdot k)) = Z$

Exercise 1: Find a set $X$ such that ${argmax}_{X} (u) = \emptyset$ for every function $u : X \to R$ .

Generalising the functional

The function ${argmax}_{X}$ is a particular way to turn tasks into sets of options, i.e. it has the type-signature $(X \to R) \to P (X)$ . But there are many functions with the same type-signature (see the table below), so a natural question to ask is... What if we replace ${argmax}_{X}$ in classical game theory with an arbitrary functional $ψ : (X \to R) \to P (X)$ ?

What we get is higher-order game theory.^[4] Surprisingly, we can recover many game-theoretic concepts in this more general setting. We can typically recover the original classical concepts from the more general higher-order concepts by restricting our attention to either $ψ = {argmax}_{X}$ or $ψ = {argmin}_{X}$ .

So let's revise our definition —

Definition 2 (provisional): Let $X$ be any set of options. An optimiser is any functional $ψ : (X \to R) \to P (X)$ . A $ψ$ -task is any function $u : X \to R$ . An option $x \in X$ is $ψ$ -optimal for a task $u : X \to R$ if and only if $x \in ψ (u)$ .
When clear from context, I'll just say task and optimal.

In higher-order game theory, we model the agent's options with a set $X$ and model their task with a function $u : X \to R$. But (unlike in classical game theory) we're free to model the agent's optimisation with any functional $ψ : (X \to R) \to P (X)$. I hope to persuade you that this additional degree of freedom is actually quite handy.^[5]

Higher-order game theory is governed by the central assumption that agents choose $ψ$ -optimal options in whatever $ψ$ -tasks they face, where $ψ$ is our model of the agent's optimisation. If we observe the agent choosing an option $x \in ψ (u)$ then that would be consistent with our model, and any observation of a choice $x \notin ψ (u)$ would falsify our model.^[6]

Anyway, here is a table of some functionals and their game-theoretic interpretation —

$ψ : (X \to R) \to P (X)$	Remarks
$argmin =$ $λ^{u : X \to R} . \{x \in X \mid \forall x^{'} \in X, u (x) \leq u (x^{'})\}$	This agent will choose an option $x \in X$ which minimises $u$. In classical game theory, this type of optimiser is typically used to model the adversary to the agent modelled by $argmax$.
${satisfice}_{s} =$ $λ^{u : X \to R} . \{x \in X \mid u (x) \geq u (s)\}$	This agent will choose an option $x \in X$ which dominates some fixed option $s \in X$. The option $s$ is called the anchor point. It might represent the "default" option, or the "do nothing" option, or the "human-approved" option. According to Herbert Simon, satisficing is an accurate model of human and institutional decision-making. Utility-satisficers are a kind of mild optimiser, which might be a safer way to build AI than a full-blown utility-maximiser.
$argmax- ϵ -slack =$ $λ^{u : X \to R} . \{x \in X \mid \forall x^{'} \in X, u (x) + ϵ \geq u (x^{'})\}$	This agent will choose an option $x \in X$ which maximises the function $u$ up to some fixed slack $ϵ > 0$. Such agents behave like $argmax$ except that their utility $u (x) \in R$ is measured with finite precision.
$better-than-average =$ $λ^{u : X \to R} . \{x \in X \mid u (x) \geq E_{x^{'} \sim π} u (x^{'})\}$	This agent will choose an option $x \in X$ which scores better than the average option, given that the option space is equipped with a distribution $π \in Δ (X)$.
${rock}_{S} = λ^{u : X \to R} . S$	This agent will choose an option in a fixed subset $S \subseteq X$, regardless of the task, e.g. DefectBot and CooperateBot.^[7] Using ${rock}_{S}$, we can model non-agents as a special (degenerate) case of agents.
${quant}_{ϵ} =$ $λ^{u : X \to R} . \{x \in X : P_{x^{'} \sim π} [u (x^{'}) \geq u (x)] \leq ϵ\}$	This agent will choose an option $x \in X$ in the top $ϵ$ quantile, given that the option space is equipped with a distribution $π \in Δ (X)$. This is the possibilistic version of Jessica Taylor's quantiliser. Her original probabilistic version is a function $(X \to R) \to Δ X$, and so wouldn't count as an optimiser according to Definition 2.^[8]
${satisfice}_{S} =$ $λ^{u : X \to R} . \{x \in X \mid \forall s \in S, u (s) \leq u (x)\}$	This agent will choose an option $x \in X$ which dominates every anchor point in $S \subseteq X$. As special cases, when $S = X$ we get ${argmax}_{X}$, and when $S$ is a singleton set we get Simon's satisficer mentioned before. When the set of anchor points is smaller, then the resulting optimiser is less selective, i.e. more options will be optimal.
${thresh}_{α} = λ^{u : X \to R} . \{x \in X \mid u (x) \geq α\}$	Exercise 2
Exercise 3	This agent will choose an option in the largest equivalence class, where two options are equivalent if they result in the same payoff.
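
Several rows of the table can be sketched directly as functions returning functionals. This is only an illustration under some assumptions: the option set is a finite Python list, the distribution $π$ is a dictionary of probabilities, and all function names are mine rather than from the literature. The rows left as exercises are deliberately omitted.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Set

Option = Hashable
Task = Callable[[Option], float]
Optimiser = Callable[[Task], Set[Option]]


def argmax(X: List[Option]) -> Optimiser:
    return lambda u: {x for x in X if all(u(x) >= u(y) for y in X)}


def argmin(X: List[Option]) -> Optimiser:
    return lambda u: {x for x in X if all(u(x) <= u(y) for y in X)}


def satisfice(X: List[Option], s: Option) -> Optimiser:
    # Options that do at least as well as the anchor point s.
    return lambda u: {x for x in X if u(x) >= u(s)}


def eps_slack(X: List[Option], eps: float) -> Optimiser:
    # Options within eps of the maximum.
    return lambda u: {x for x in X if all(u(x) + eps >= u(y) for y in X)}


def better_than_average(X: List[Option], pi: Dict[Option, float]) -> Optimiser:
    return lambda u: {x for x in X if u(x) >= sum(pi[y] * u(y) for y in X)}


def rock(S: Iterable[Option]) -> Optimiser:
    # Ignores the task entirely.
    fixed = set(S)
    return lambda u: set(fixed)


def quantilise(X: List[Option], pi: Dict[Option, float], eps: float) -> Optimiser:
    # Options x with P_{x' ~ pi}[u(x') >= u(x)] <= eps.
    return lambda u: {x for x in X if sum(pi[y] for y in X if u(y) >= u(x)) <= eps}


# A toy task (hypothetical payoffs) with a uniform distribution over options.
X = ["a", "b", "c", "d"]
payoff = {"a": 0, "b": 2, "c": 3, "d": 4}
u = payoff.__getitem__
pi = {x: 0.25 for x in X}
```

On this toy task, `argmax(X)(u)` picks only `"d"`, whereas the mild optimisers `eps_slack(X, 1)`, `better_than_average(X, pi)` and `quantilise(X, pi, 0.5)` all accept both `"c"` and `"d"`.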

Generalising the payoff space.

Now let's generalise the payoff space to any set $R$, not only the real numbers $\mathbb{R}$. We will think of the elements of $R$ as payoffs in a general sense, relaxing the assumption that the payoffs assigned to the options are real numbers. The function $u : X \to R$ describes which payoff $u (x) \in R$ would result from the agent choosing the option $x \in X$.

Definition 3 (provisional): Let $X$ be any set of options and $R$ be any set of payoffs. An optimiser is any functional $ψ : (X \to R) \to P (X)$ . A $ψ$ -task is any function $u : X \to R$ . An option $x \in X$ is $ψ$ -optimal for a task $u : X \to R$ if and only if $x \in ψ (u)$ .
This is the final version of this definition today.

This is significantly more expressive! When we are tasked with modelling a game-theoretic situation, we can pick any set $R$ to model the agent's payoffs!^[9]

I'll use the notation $J^{P} (X, R)$ to denote the set of functionals $(X \to R) \to P (X)$ , e.g. ${argmax}_{Z} \in J^{P} (Z, R)$ .

Anyway, here is a table of some payoff spaces and their game-theoretic interpretation:

Payoff space	Remarks
$R = [0, 1]$	An optimiser $ψ \in J^{P} (X, [0, 1])$ only needs to be well-defined for bounded utility functions $u : X \to [0, 1]$ .
$R = N$	An optimiser $ψ \in J^{P} (X, N)$ only needs to be well-defined for utility functions $u : X \to N$ .
$R = \mathbb{R} \cup \{- \infty, + \infty\}$	An optimiser $ψ \in J^{P} (X, \mathbb{R} \cup \{- \infty, + \infty\})$ must be well-defined for infinitary utility functions $u : X \to \mathbb{R} \cup \{- \infty, + \infty\}$. For example, $u (x)$ might be the expected total winnings of a gambler employing the gambling strategy $x \in X$. The gambler is modelled by an optimiser $ψ : (X \to \mathbb{R} \cup \{- \infty, + \infty\}) \to P (X)$. This functional $ψ$ is characterised by its attitude towards infinite expected profit/loss.
$R = Δ (\mathbb{R})$, where $Δ$ is the distribution monad on $Set$.	An optimiser $ψ \in J^{P} (X, Δ \mathbb{R})$ will choose options given a stochastic task $u : X \to Δ \mathbb{R}$. This is the type-signature of risk-averse and risk-seeking maximisers studied in behavioural microeconomics. The field of Portfolio Theory is (largely) the comparison of rival optimisers in $J^{P} (X, Δ \mathbb{R})$.
$R$ is the Levi-Civita Field, an extension of the reals with infinitesimals.	The Levi-Civita field contains infinitesimals like $ϵ, ϵ^{2}, 2 ϵ + ϵ^{2}, \sqrt{ϵ}$ , as well as infinite values like $ϵ^{- 1}, ϵ^{- 2}, ϵ^{1 / 3} + ϵ^{- 1 / 3} + 2$ . In infinite ethics, we encounter tasks $u : X \to Levi-Civita Field$ , and we can model the infinitary ethicist by an optimiser $ψ \in J^{P} (X, Levi-Civita Field)$ . Exercise 4: Solve infinite ethics.
$R = \mathbb{R}^{n}$	An optimiser $ψ \in J^{P} (X, \mathbb{R}^{n})$ can model different multi-objective optimisers. For example, there's an optimiser which, given a multi-objective task $u : X \to \mathbb{R}^{n}$, returns those options which are maximal according to the lexicographic ordering, and there's another optimiser which uses the product ordering instead.^[10] Later in this sequence, we'll encounter an optimiser in $J^{P} (X_{1} \times \dots \times X_{n}, \mathbb{R}^{n})$ which returns the Nash equilibria of $n$-player games, where a task for this optimiser is a payoff table $u : X_{1} \times \dots \times X_{n} \to \mathbb{R}^{n}$ assigning one payoff to each player. The field of cooperative bargaining is concerned with different optimisers in $J^{P} (X + 1, \mathbb{R}^{n})$. A bargaining task $f : (X + 1) \to \mathbb{R}^{n}$ parameterises both the feasibility set $F = \{f (x) \in \mathbb{R}^{n} : x \in X\}$ and the disagreement point $d = f (⋆) \in \mathbb{R}^{n}$, where $1 = \{⋆\}$ is the singleton set.
Any preorder $(R, \leq)$	Suppose that $R$ is any set equipped with a preorder $\leq$.^[11] Then a function $u : X \to R$ will induce a preorder $\leq_{u}$ on $X$ via $x \leq_{u} x^{'} ⟺ u (x) \leq u (x^{'})$. Let ${argmax}_{X} \in J^{P} (X, R)$ be the optimiser which chooses the maximal points of $\leq_{u}$, i.e. options which aren't strictly dominated by any other options. Explicitly, ${argmax}_{X} = λ^{u : X \to R} . \{x \in X : \forall x^{'} \in X . u (x) \leq u (x^{'}) ⟹ u (x^{'}) \leq u (x)\}$. If $\leq$ isn't total (i.e. the agent has incomplete preferences over the payoffs) then the resulting optimiser ${argmax}_{X}$ is less selective (i.e. more options are optimal). In the extreme case, where no options are comparable, ${argmax}_{X}$ might choose any option.^[12] Exercise 5: Which optimisers $ψ \in J^{P} (X, R)$ defined in the previous table can be generalised to any preorder $(R, \leq)$?
$R = X$, where $X$ is the option space of the agent.	Occasionally the same set $X$ will serve as both the option space and the payoff space. In this case, a task $u : X \to X$ represents some transformation of the underlying option space. There's an optimiser $fix \in J^{P} (X, X)$, which chooses options which are fixed-points of $u : X \to X$. That is, $fix = λ^{u : X \to X} . \{x \in X \mid u (x) = x\}$. We can use $fix$ to model a conservative agent who chooses options which remain untransformed by the task. Note that this optimiser is not consequentialist, because the optimality of an option is not determined by its payoff alone. For example, $0 \in {fix}_{R} (sin)$ but $π \notin {fix}_{R} (sin)$, despite the fact that $sin (0) = sin (π)$.
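
Two of the rows above are easy to sketch for finite option sets: the preorder version of $argmax$ (with the product and lexicographic orderings on tuples as example preorders) and the $fix$ optimiser. The helper names `maximal`, `product_leq`, `lex_leq` and `fix`, and the toy payoffs, are my own illustrative choices.

```python
from typing import Callable, Hashable, List, Set, Tuple

Payoff = Tuple[float, ...]


def product_leq(r: Payoff, s: Payoff) -> bool:
    # Product ordering: r <= s iff every coordinate of r is <= that of s.
    return all(a <= b for a, b in zip(r, s))


def lex_leq(r: Payoff, s: Payoff) -> bool:
    # Python compares tuples lexicographically.
    return r <= s


def maximal(X: List[Hashable], leq: Callable) -> Callable:
    """argmax for a preorder: keep x unless some x' strictly dominates it."""
    def psi(u: Callable) -> Set[Hashable]:
        return {
            x for x in X
            if all((not leq(u(x), u(y))) or leq(u(y), u(x)) for y in X)
        }
    return psi


def fix(X: List[Hashable]) -> Callable:
    """Options left untouched by a task u : X -> X."""
    return lambda u: {x for x in X if u(x) == x}


# A two-objective task (hypothetical payoffs).
options = ["a", "b", "c"]
payoffs = {"a": (1, 3), "b": (3, 1), "c": (0, 0)}
pareto = maximal(options, product_leq)(payoffs.__getitem__)
lexico = maximal(options, lex_leq)(payoffs.__getitem__)

# A self-map on {0, ..., 4}.
fixed_points = fix(list(range(5)))(lambda x: (x * x) % 5)
```

Under the product ordering `"a"` and `"b"` are incomparable, so both survive, whereas the total lexicographic ordering leaves only `"b"`. This illustrates the remark that incomplete preferences make the optimiser less selective.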

Subjective vs objective optimisers

It's standard practice, when modelling agents and their environments, to use payoff spaces like $R$ , $R^{n}$ , $Δ (R)$ , etc, but I think this can be misleading.

Consider the following situation —

A robot is choosing an option from a set $X$ . There's a function $f : X \to W^{+}$ such that, were the robot to choose the option $x \in X$ , then the world would end up in state $f (x) \in W^{+}$ , where $W^{+}$ is something like the set of all configurations of the future light-cone.
You know the robot is maximising over all their options, but you aren't sure what the robot is maximising for exactly — perhaps for paperclips, perhaps for happy humans.

Now, let $p : W^{+} \to R$ be the function which counts the number of paperclips in a light-cone, and let $h : W^{+} \to R$ be the function which counts the number of happy humans.

Here's what classical game theory says about your predicament —

The payoff space is $R$ . You know that the robot applies the optimiser ${argmax}_{X} : (X \to R) \to P (X)$ , but you don't know whether the robot faces the task $(p \circ f) : X \to R$ or the task $(h \circ f) : X \to R$ , and hence you don't know whether the robot will choose an option $x \in {argmax}_{X} (p \circ f)$ or $x \in {argmax}_{X} (h \circ f)$ .

I call this a subjective account, because the robot's task depends on the robot's preferences. Were the robot to have different preferences, then they would've faced a different task, and because you don't know the robot's preferences you don't know their task.

However, by exploiting the expressivity of higher-order game theory, we can offer an objective account which rivals the subjective account. In the objective account, the task that the robot faces doesn't depend on the robot's preferences:

The payoff space is $W^{+}$ itself. You know that the robot faces the task $f : X \to W^{+}$ but you don't know whether the robot applies the optimiser ${argmax}_{X} (p \circ -) : (X \to W^{+}) \to P (X)$ or the optimiser ${argmax}_{X} (h \circ -) : (X \to W^{+}) \to P (X)$ , and hence you don't know whether the robot will choose an option $x \in {argmax}_{X} (p \circ f)$ or $x \in {argmax}_{X} (h \circ f)$ .
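
Here is a small sketch of the two accounts side by side, assuming a toy world with three options and three world-states. All the option names, world-state names and counts below are made up for illustration.

```python
from typing import Callable, Dict, List, Set

X: List[str] = ["build_factory", "build_park", "do_nothing"]

# The objective task f : X -> W+ (which world-state results from each option).
f: Dict[str, str] = {
    "build_factory": "factory_world",
    "build_park": "park_world",
    "do_nothing": "status_quo",
}

# p counts paperclips and h counts happy humans in each world-state.
p: Dict[str, int] = {"factory_world": 100, "park_world": 0, "status_quo": 1}
h: Dict[str, int] = {"factory_world": 2, "park_world": 10, "status_quo": 5}


def argmax(u: Callable[[str], float]) -> Set[str]:
    return {x for x in X if all(u(x) >= u(y) for y in X)}


# Subjective account: one optimiser (argmax), two candidate tasks p.f and h.f.
subjective_p = argmax(lambda x: p[f[x]])
subjective_h = argmax(lambda x: h[f[x]])


# Objective account: one task f, two candidate optimisers argmax(v . -).
def objective_optimiser(v: Dict[str, int]) -> Callable[[Dict[str, str]], Set[str]]:
    return lambda task: argmax(lambda x: v[task[x]])


paperclipper = objective_optimiser(p)
humanist = objective_optimiser(h)
objective_p = paperclipper(f)
objective_h = humanist(f)
```

Both accounts predict the same choices. The only difference is whether the preferences `p` and `h` live inside the task or inside the optimiser.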

Notice that both accounts yield the same solution! Nonetheless, I think the objective account is nicer for four reasons. (Feel free to skip if you're convinced.)

Disclaimer: Admittedly, the distinction between subjective accounts (where payoff spaces are stuff like $R$, $R^{n}$, $R \cup \{- \infty, \infty\}$, $Z$, $Δ ([0, 1]^{2})$, etc.) and objective accounts (where payoff spaces are stuff like future light-cones, or brain states, or pixel configurations, etc.) is an informal (and somewhat metaphysical) distinction, but hopefully you can see what I'm pointing at.

(1) Carve nature at its joints.

The objective account, where $R = W^{+}$, bears a closer structural resemblance to the physical reality. The physical robot corresponds to the functional $ψ \in J^{P} (X, W^{+})$ and the physical environment corresponds to the function $u : X \to W^{+}$. Notably, all the information about the robot's idiosyncratic preferences is bundled up inside the functional $ψ$.

In contrast, in the subjective account, where $R = \mathbb{R}$, the functional $ψ \in J^{P} (X, \mathbb{R})$ contains almost no substantial information about the agent itself. It suggests (if read too literally) that all agents are basically indiscernible, and they behave differently because they face different environments.

(2) Moral antirealism.

The subjective account (again, read too literally) suggests that values are out there in the world, that the environment contains entities called utilities which all rational agents seek, that all conflict is disagreement, that correctness is a property of pebble heaps, that microeconomics is normative, and (most concerning of all) that the primary obstacle to building a safe superintelligence is writing down a utility function.

The objective account, I think, is more moral antirealist. It says, "The world contains only paperclips and happy humans, never utilities! The world contains only paperclip-maximisers and happy-human-maximisers, never utility-maximisers!"

(3) Experimental independence

In the objective account, the task $f : X \to W^{+}$ and the optimiser $ψ : (X \to W^{+}) \to P (X)$ have independent semantic meaning. At least in principle, I know how to find $f : X \to W^{+}$ independently of $ψ$ — namely by inspecting the physical dynamics of the robot's environment or inspecting the robot's world-model. And I know how to find $ψ : (X \to W^{+}) \to P (X)$ independently of $f$ — namely by placing the robot in different physical environments and observing their choices.

By contrast, in the subjective account, the task $f : X \to R$ and the optimiser $ψ : (X \to R) \to P (X)$ have no independent meaning; they merely exist to compress the optimality condition $ψ (f) \subseteq X$. What would it even mean for the robot to possess the utility function $u : X \to R$ without the presumption that they maximise utility? I've honestly no clue. And without the task $u : X \to R$, how would I determine the robot's optimiser $ψ : (X \to R) \to P (X)$ experimentally? Presumably I should vary the task $u : X \to R$, however I can't do this experimentally because $u : X \to R$ contains the robot's preferences, which is a variable outside my control.

Granted, for most historic applications of classical game theory, we do know the preferences of the agent: we already know that White wants to checkmate Black, the consumer wants cheaper goods, the statistician wants accurate predictions, etc. So it doesn't matter whether one sticks those preferences in the task or the optimiser. But in AI safety, a big chunk of our perplexity comes from the preferences of the agents. So it matters more that we stick those preferences in the right part of our model.

(4) No spooky reals.

The subjective account seems to rely on the elements of a mysterious set called " $R$ " which is extraneous to the phenomenon under consideration. By contrast, the objective account refers only to the sets $X$ and $W^{+}$ , where the elements of $X$ and $W^{+}$ are physical stuff intrinsic to the situation being modelled. Hence, higher-order game theory promises to dispense with $R$ from game theory, along with argmax and utility functions, ensuring the weirdness of $R$ doesn't contaminate our game theory.^[13]

This has a computational upshot as well.

Suppose that $X = \{x_{1}, \dots, x_{n}\}$ and $W^{+} = \{w_{1}, \dots, w_{m}\}$ are small finite sets. A task $f : X \to W^{+}$ can be implemented as a dictionary whose keys lie in $X$ and whose values lie in $W^{+}$, which uses $n \log m$ bits. The functional $ψ \in J^{P} (X, W^{+})$ can be implemented as a program which receives input of type $\mathrm{Dict}[X, W]$ and returns output of type $\mathrm{List}[X]$. Easy!

In the subjective account, by contrast, the task $f : X \to R$ requires infinitely many bits to specify, and the functional $ψ \in J^{P} (X, R)$ must somehow accept a representation of an arbitrary function $f : X \to R$. Oh no! This is especially troubling for embedded agency, where the agent's decision theory must run on a physical substrate.

Recovering utility functions

According to the objective account, what is fundamental about an agent is the functional $ψ : (X \to W^{+}) \to P (X)$ where $W^{+}$ is some objective payoff, and the claim that the agent has a utility function $v : W^{+} \to R$ is understood as the claim that $ψ$ can be approximately decomposed into ${argmax}_{X} (v \circ -)$ . Hence, the existence of a utility-decomposition of $ψ$ is an additional fact about the agent to be discovered, rather than an assumption that should be baked into the formalism itself.

Utility functions are an emergent property of the underlying functional.

One clue that utility functions are emergent properties is that they aren't unique! It's well-known that a utility function $v : W^{+} \to R$ for an agent is only well-defined modulo positive-affine transformation, i.e. there is no meaningful distinction between $v : W^{+} \to R$ and $v^{'} : W^{+} \to R$ whenever $v^{'} = α \cdot v + β$ for some $α \in R^{+}, β \in R$. This fact falls out immediately from the objective-first view, because ${argmax}_{X} (v \circ -)$ and ${argmax}_{X} (v^{'} \circ -)$ are equal functionals whenever $v^{'} = α \cdot v + β$.

Now, if we were dealing with $argmax- ϵ -slack$ or ${thresh}_{α}$ — instead of $argmax$ — then there would be a meaningful difference between some utility functions which are equivalent modulo positive-affine transformation.
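
A quick numerical check of this point, as a sketch only: to keep it short I take the task to be the identity on a three-element option set, so the utility function is applied to options directly. The particular numbers ($α = 10$, $β = 3$, $ϵ = 0.6$) are arbitrary.

```python
from typing import Callable, List, Set

X: List[int] = [0, 1, 2]


def argmax(u: Callable[[int], float]) -> Set[int]:
    return {x for x in X if all(u(x) >= u(y) for y in X)}


def eps_slack(u: Callable[[int], float], eps: float) -> Set[int]:
    return {x for x in X if all(u(x) + eps >= u(y) for y in X)}


def v(x: int) -> float:
    return x / 2  # payoffs 0.0, 0.5, 1.0


def v_affine(x: int) -> float:
    return 10 * v(x) + 3  # a positive-affine transform: payoffs 3, 8, 13


# argmax cannot tell v and v_affine apart...
same_for_argmax = argmax(v) == argmax(v_affine)

# ...but a fixed slack of 0.6 means something different on the rescaled payoffs.
slack_v = eps_slack(v, 0.6)
slack_v_affine = eps_slack(v_affine, 0.6)
```

With the original payoffs the slack admits options `1` and `2`. After rescaling, only option `2` survives, so the two utility functions are no longer interchangeable.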

Let's make this notion precise —

Definition 4: Let $ψ \in J^{P} (X, W)$ be an optimiser. We say that $v : W \to R$ is a (classical) utility function of $ψ$ if and only if $ψ = {argmax}_{X} (v \circ -)$ . In general, for any $Φ \in J^{P} (X, R)$ , we say that $v : W \to R$ is a $Φ$ -utility function of $ψ$ if and only if $ψ = Φ (v \circ -)$ .

Typically, $ψ$ is some objective optimiser and $Φ$ is some subjective optimiser. When $Φ = {argmax}_{X}$ then we obtain the classical utility functions of an objective optimiser $ψ$, and we may obtain non-classical utility functions of the same optimiser $ψ$ by considering (e.g.) $Φ = {satisfice}_{s} \in J^{P} (X, R)$ or $Φ = {better-than-average}_{π} \in J^{P} (X, R)$ or whatever.
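
When $X$ and $W$ are small finite sets, Definition 4 can be checked by brute force: enumerate every task $f : X \to W$ and compare $ψ (f)$ with $Φ (v \circ f)$. The sketch below assumes a hypothetical objective optimiser `psi` that prefers world-state `w3` to `w2` to `w1`. The names `is_phi_utility`, `v_good` and `v_bad` are illustrative.

```python
from itertools import product
from typing import Callable, Dict, Iterator, List, Set

X: List[str] = ["x1", "x2"]
W: List[str] = ["w1", "w2", "w3"]
Task = Dict[str, str]


def all_tasks() -> Iterator[Task]:
    """Enumerate every function X -> W as a dictionary."""
    for outcomes in product(W, repeat=len(X)):
        yield dict(zip(X, outcomes))


def argmax(u: Callable[[str], float]) -> Set[str]:
    return {x for x in X if all(u(x) >= u(y) for y in X)}


# A hypothetical objective optimiser: it prefers w3 to w2 to w1.
RANK = {"w1": 0, "w2": 1, "w3": 2}


def psi(f: Task) -> Set[str]:
    return {x for x in X if all(RANK[f[x]] >= RANK[f[y]] for y in X)}


def is_phi_utility(psi: Callable, phi: Callable, v: Dict[str, float]) -> bool:
    """Check psi = phi(v . -) on every task f : X -> W."""
    return all(psi(f) == phi(lambda x, f=f: v[f[x]]) for f in all_tasks())


v_good = {"w1": -5.0, "w2": 0.0, "w3": 7.0}  # respects the ranking
v_bad = {"w1": 1.0, "w2": 0.0, "w3": 7.0}    # swaps w1 and w2

good_is_classical = is_phi_utility(psi, argmax, v_good)
bad_is_classical = is_phi_utility(psi, argmax, v_bad)
```

Any strictly increasing relabelling of the ranking passes the check for $Φ = argmax$, which is one way to see that a classical utility function is not unique.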

Classical game theory is the study of optimisers with classical utility functions. There are some theoretical and empirical arguments for restricting only to such optimisers but these arguments are probably overrated. In any case, I suspect that unifying deep learning and classical game theory will require studying non-classical agents. Here's why — in the deep learning paradigm, we build agents by training a large neural network with stochastic gradient descent on tasks which fortify agentic-like behaviour. At initialisation, these neural networks aren't classical agents, and classicality emerges incrementally, probably after passing through phases of nonclassical agency. Therefore, if we want to account for the emergence of agency (classical or otherwise), then we need to account for the loss gradient over the entire space of optimisers $J^{P} (X, W^{+})$ , not merely over the subspace of $J^{P} (X, W^{+})$ corresponding to classical optimisers.

Some properties of optimisers

We can formalise various properties and operations of optimisation using an arbitrary functional $ψ \in J^{P} (X, R)$.

I've included the list of examples below for illustrative purposes only —

Property	Remarks
Totality	We'll say that an optimiser $ψ \in J^{P} (X, R)$ is total iff $ψ (u) \neq \emptyset$ for all $u : X \to R$ . Informally, this condition states that our model of the agent never "breaks down", i.e. regardless of the task the agent faces, there's always some optimal choice.
Selectivity	We'll say that an optimiser $ψ_{1}$ is (weakly) more selective than $ψ_{2}$ if and only if $\forall u : X \to R . ψ_{1} (u) \subseteq ψ_{2} (u)$ . This relation defines a partial order on $J^{P} (X, R)$ . For example, ${satisfice}_{A}$ is more selective than ${satisfice}_{B}$ whenever $B ⊊ A \subseteq X$ . Mild optimization, an approach for mitigating Goodhart's law, replaces $argmax$ with less selective optimisers.
Consequentialism	We'll say that an optimiser $ψ \in J^{P} (X, R)$ is consequentialist if $ψ = λ^{u : X \to R} . \{x \in X \mid u (x) \in q (u)\}$ for some $q : (X \to R) \to P (R)$.^[14] In other words, for any task $u : X \to R$, if $u (x) = u (x^{'})$ then $x \in ψ (u) ⟺ x^{'} \in ψ (u)$. This condition says that, once we know the agent's task, then the only thing relevant to the optimality of a particular choice is its payoff. For example, ${argmax}_{X}$ and ${better-than-average}_{π}$ are consequentialist, but ${rock}_{S}$ and $fix$ are not. This function $q$ says, for each task $u : X \to R$, which payoffs would be acceptable to the agent, so $q = max$ is the quantifier for $ψ = {argmax}_{X}$.
Context-independence	We'll say that a consequentialist optimiser $ψ = λ^{u : X \to R} . \{x \in X \mid u (x) \in q (u)\}$ is context-independent if $Image (u) = Image (u^{'}) ⟹ q (u) = q (u^{'})$. Context-independence is a stronger condition than consequentialism: it says that, once we know which payoffs are achievable in the agent's task, then the only thing relevant to the optimality of a particular option is its payoff. For example, ${argmax}_{X}$ is context-independent, but ${better-than-average}_{π}$ is not.
Filtered optimisation	Suppose that $ψ \in J^{P} (X, R)$ is an optimiser, and $X_{legal} \subseteq X$ is a subset of the options which are safe/valid/legal. Then we can define another optimiser $ψ^{'} = λ^{u : X \to R} . X_{legal} \cap ψ (u) \in J^{P} (X, R)$ who always chooses options in $X_{legal}$. This operation captures the notion of filtering options after the agent has applied the optimisation. For example, if $X = R$ and $ψ = {argmax}_{R}$ then $ψ (cos) = \{2 π k : k \in Z\}$, so if $X_{legal} = [0, 4 π]$ then $ψ^{'} (cos) = \{0, 2 π, 4 π\}$.
Constrained optimisation	The filtering operation doesn't capture what we typically mean by optimisation within side-constraints. For example, if we change $X_{legal}$ to $[π / 4, π / 2]$, then we would like the optimiser to choose $π / 4$ as this maximises $cos$ subject to the constraint $X_{legal}$. The filtered optimisation would produce the empty set, as none of the options optimal to ${argmax}_{R}$ are legal. Let's define constrained optimisation for an optimiser $ψ \in J^{P} (X, R)$. Suppose that there is an element $⊥ \in R$ such that $⊥ \in u (ψ (u)) ⟹ Im (u) \subseteq \{⊥\}$. Informally, this means that $ψ$ detests the payoff $⊥ \in R$ and would never choose an option resulting in $⊥$ unless it must. For example, ${argmax}_{X} \in J^{P} (X, R \cup \{- \infty, + \infty\})$ detests the payoff $- \infty$. We can define constrained optimisation as follows: $ψ^{'} = λ^{u : X \to R} . ψ (u^{'})$ where $u^{'} : X \to R, x \mapsto \begin{cases} u (x) & \text{if } x \in X_{legal} \\ ⊥ & \text{otherwise} \end{cases}$ Exercise 6: How might we define constrained optimisation if there is no such $⊥ \in R$?
Supervision	Maybe the set of legal options $X_{legal} \in P X$ depends on the task $u : X \to R$ itself. This dependency itself corresponds to an optimiser $ψ_{legal} \in J^{P} (X, R)$. We can define the filtered optimiser as $ψ^{'} = λ^{u : X \to R} . ψ_{legal} (u) \cap ψ (u)$. We can define the constrained optimiser $ψ^{'} = λ^{u : X \to R} . ψ (u^{'})$ where $u^{'} : X \to R, x \mapsto \begin{cases} u (x) & \text{if } x \in ψ_{legal} (u) \\ ⊥ & \text{otherwise} \end{cases}$ The optimiser $ψ^{'}$ behaves like one agent $ψ$ being supervised by another agent $ψ_{legal}$, where the mode of supervision is filtering and constraining respectively.
Unanimity	For a collection of optimisers $Z \subseteq J^{P} (X, R)$, we can define a single optimiser $λ^{u : X \to R} . ⋂ \{ψ (u) : ψ \in Z\}$ who will only choose an option if each constituent optimiser would also choose the option, forming a unanimous coalition of $Z$. Dually, we can define the optimiser $λ^{u : X \to R} . ⋃ \{ψ (u) : ψ \in Z\}$ forming a unilateral coalition.
Shards	Suppose that $f : (X \to R) \to J^{P} (X, R)$ assigns an optimiser $f (u) \in J^{P} (X, R)$ to each task $u : X \to R$. In terms of $f$, we can define the optimiser $ψ = λ^{u : X \to R} . f (u) (u) \in J^{P} (X, R)$ by diagonalising, i.e. $ψ$ will match the optimiser $f (u)$ in each task $u : X \to R$. This operation captures (somewhat) the notion of shards, or context-activated optimisation. For example, imagine an agent $ψ \in J^{P} (X, R)$ who "plays it safe" by satisficing unless they can achieve sufficiently high payoff, i.e. $f (u) = \begin{cases} {argmax}_{X} & \text{if } max (u) \geq 100 \\ {satisfice}_{s} & \text{otherwise} \end{cases}$ Or imagine an agent who desires cheese whenever they are sufficiently close to cheese, but otherwise desires to run around aimlessly.
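
Several of the operations in this table become one-liners when tasks are dictionaries over a small finite option set. The following is a rough sketch under that assumption, using $⊥ = - \infty$ for constrained optimisation. The helper names and the toy task are mine.

```python
import math
from itertools import product
from typing import Callable, Dict, Iterator, List, Sequence, Set

Task = Dict[str, float]
Optimiser = Callable[[Task], Set[str]]

X: List[str] = ["a", "b", "c"]


def argmax(u: Task) -> Set[str]:
    return {x for x in X if all(u[x] >= u[y] for y in X)}


def satisfice(s: str) -> Optimiser:
    return lambda u: {x for x in X if u[x] >= u[s]}


def all_tasks(payoffs: Sequence[float]) -> Iterator[Task]:
    for values in product(payoffs, repeat=len(X)):
        yield dict(zip(X, values))


def is_total(psi: Optimiser, payoffs: Sequence[float]) -> bool:
    """Totality: psi(u) is non-empty for every task into the given payoffs."""
    return all(len(psi(u)) > 0 for u in all_tasks(payoffs))


def more_selective(psi1: Optimiser, psi2: Optimiser, payoffs: Sequence[float]) -> bool:
    """Selectivity: psi1(u) is a subset of psi2(u) for every task."""
    return all(psi1(u) <= psi2(u) for u in all_tasks(payoffs))


def filtered(psi: Optimiser, legal: Set[str]) -> Optimiser:
    # Filter after optimising.
    return lambda u: legal & psi(u)


def constrained(psi: Optimiser, legal: Set[str], bottom: float = -math.inf) -> Optimiser:
    # Send illegal options to the detested payoff, then optimise.
    return lambda u: psi({x: (u[x] if x in legal else bottom) for x in X})


def unanimous(optimisers: List[Optimiser]) -> Optimiser:
    return lambda u: set.intersection(*(psi(u) for psi in optimisers))


def diagonalise(f: Callable[[Task], Optimiser]) -> Optimiser:
    # Shards: the task selects which optimiser is active.
    return lambda u: f(u)(u)


u: Task = {"a": 3, "b": 2, "c": 1}
legal = {"b", "c"}
shard = diagonalise(lambda t: argmax if max(t.values()) >= 100 else satisfice("b"))
```

On this task the best option `"a"` is illegal, so `filtered(argmax, legal)(u)` is empty while `constrained(argmax, legal)(u)` falls back to `"b"`, mirroring the $cos$ example in the table.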

Recap

In classical game theory, agents maximise their utility functions $u : X \to R$ , i.e. they might choose any option $x \in {argmax}_{X} (u)$ .
In higher-order game theory, we replace the utility function $u : X \to \mathbb{R}$ with an arbitrary function $u : X \to R$ into any payoff set $R$, called a "task", and replace ${argmax}_{X} : (X \to \mathbb{R}) \to P (X)$ with an arbitrary functional $ψ : (X \to R) \to P (X)$, called the "optimiser". Agents might choose any option $x \in ψ (u)$.
This additional expressivity lets us include mild optimisers and multi-objective optimisers which don't crudely maximise utility.
It also lets us include objective optimisers, which strive for particular physical configurations and so dispense with the concept of utility altogether.
We can recover utility functions as an emergent property of an objective optimiser, relative to any choice of subjective optimiser, not only to $argmax$ .
Finally, we can define some interesting properties and operations on the optimisers $J^{P} (X, R)$ which correspond (loosely) to things that AI safety researchers care about.

Next time...

The next post will answer the age-old question, "What happens when two optimisers $ψ_{A} \in J^{P} (A, R)$ and $ψ_{B} \in J^{P} (B, R)$ play the simultaneous game $g : A \times B \to R$ ?" We know what happens when $ψ_{A}$ and $ψ_{B}$ are both utility-maximisers: the possible option-profiles are pairs $(a, b) \in A \times B$ in Nash equilibrium.

Can we really generalise the Nash equilibrium to any pair of optimisers?
