# 17

Written during the SERI MATS program under the joint mentorship of John Wentworth, Nicholas Kees, and Janus.

# Preface

In classical game theory, we characterise agents by a utility function and assume that agents choose options which cause maximal utility. This is a pretty good model, but it has some conceptual and empirical limitations which are particularly troublesome for AI safety.

Higher-order game theory (HOGT) is an attempt to rebuild game theory without appealing to either utility functions or maximisation. I think Higher-Order Game Theory is cool so I'm writing a sequence on it.

I'll try to summarise the relevant bits of the literature, present my own minor extensions, and apply HOGT to problems in AI safety

You're reading the first post! Let's get into it.

# The role of argmax

For each set , let  be the familiar function which receives a function  and produces the set of element which maximise . A function like  is sometimes called a higher-order function or functional, because it receives another function as input.

Explicitly, .[1]

As you all surely know,  plays a central role in classical game theory. Typically we interpret the set  as the agent's options,[2] and the function  as the agent's task, which assigns a payoff  to each option . We say an option  is optimal to the agent for the task  whenever . Classical game theory is governed by the assumption that agents choose optimal options in whatever task they face, where optimality strictly means utility-maximisation.

Definition 1 (provisional): Let  be any set of options. A task is any function . An option  is optimal for a task  if and only if .

Due to the presence of the powerset operator  in , this model of the agent is possibilistic — for each task , our model says which options are possibly chosen by agent. The model doesn't say which options are probably chosen by the agent — for that we'd need a function . Nor does the model say which options are actually chosen by the agent — for that we'd need a function .[3]

Exercise 1: Find a set  such that  for every function .

# Generalising the functional

The function  is a particular way to turn tasks into sets of options, i.e. it has the type-signature . But there are many functions with the same type-signature (see the table below), so a natural question to ask is... What if we replace  in classical game theory with an arbitrary functional ?

What we get is higher-order game theory.[4] Surprisingly, we can recover many game-theoretic concepts in this more general setting. We can typically recover the original classical concepts from the more general higher-order concepts by restricting our attention to either  or .

So let's revise our definition —

Definition 2 (provisional): Let  be any set of options. An optimiser is any functional . A -task is any function . An option  is -optimal for a task  if and only if .

When clear from context, I'll just say task and optimal.

In higher-order game theory, we model the agents options with a set  and model their task with a function . But (unlike in classical game theory) we're free to model the agent's optimisation with any functional . I hope to persuade you that this additional degree of freedom is actually quite handy.[5]

Higher-order game theory is governed by the central assumption that agents choose -optimal options in whatever -tasks they face, where  is our model of the agent's optimisation. If we observe the agent choosing an option  then that would be consistent with our model, and any observation of a choice  would falsify our model.[6]

Anyway, here is a table of some functionals and their game-theoretic interpretation —

# Generalising the payoff space.

Now let's generalise the payoff space to any set , not only . We will think of the elements of  as payoffs in a general sense, relaxing the assumption that the payoffs assigned to the options are real numbers. The function  describes which payoff  would result from the agent choosing the option .

Definition 3 (provisional): Let  be any set of options and  be any set of payoffs. An optimiser is any functional . A -task is any function . An option  is -optimal for a task  if and only if .

This is the final version to this definition today.

This is significantly more expressive! When we are tasked with modelling a game-theoretic situation, we are can pick any set  to model the agent's payoffs![9]

I'll use the notation  to denote the set of functionals , e.g. .

Anyway, here is a table of some functionals and their game-theoretic interpretation —

# Subjective vs objective optimisers

It's standard practice, when modelling agents and their environments, to use payoff spaces like , etc, but I think this can be misleading.

Consider the following situation —

A robot is choosing an option from a set . There's a function  such that, were the robot to choose the option , then the world would end up in state , where  is something like the set of all configurations of the future light-cone.

You know the robot is maximising over all their options, but you aren't sure what the robot is maximising for exactly — perhaps for paperclips, perhaps for happy humans.

Now, let  be the function which counts the number of paperclips in a light-cone, and let  be the function which counts the number of happy humans.

The payoff space is . You know that the robot applies the optimiser , but you don't know whether the robot faces the task  or the task , and hence you don't know whether the robot will choose an option  or .

I call this a subjective account, because the robot's task depends on the robot's preferences. Were the robot to have difference preferences, then they would've faced a different task, and because you don't know the robot's preferences you don't know their task.

However, by exploiting the expressivity of higher-order game theory, we can offer an objective account which rivals the subjective account. In the objective account, the task that the robot faces doesn't depend on the robots preferences —

The payoff space is  itself. You know that the robot faces the task  but you don't know whether the robot applies the optimiser  or the optimiser , and hence you don't know whether the robot will choose an option  or .

Notice that both accounts yield the same solution! Nonetheless, I think the objective account is nicer for four reasons. (Feel free to skip if you're convinced.)

Disclaimer: Admittedly, the distinction between subjective accounts — where payoff spaces are stuff like , , , , , e.t.c. — and objective accounts — where payoff spaces are stuff like future light-cones, or brain states, or pixel configurations, e.t.c — is an informal (and somewhat metaphysical) distinction, but hopefully you can see what I'm pointing at.

## (1) Carve nature at its joints.

The objective account, where , bares a closer structural resemblance to the physical reality. The physical robot corresponds to the functional  and the physical environment corresponds to the function . Notably, all the information about the robot's idiosyncratic preferences is bundled up inside the functional .

In contrast, in the subjective account, where , the functional  contains almost no substantial information about the agent itself. It suggests (if read too literally) that all agents are basically indiscernible, and they behave differently because they face different environments.

## (2) Moral antirealism.

The subjective account (again, read too literally) suggests that values are out there in the world, that the environment contains entities called utilities which all rational agents seek, that all conflict is disagreement, that correctness is a property of pebble heaps, that microeconomics is normative, and (most concerning of all) that the primary obstacle to building a safe superintelligence is writing down a utility function.

The objective account, I think, is more moral antirealist. It says, "The world contains only paperclips and happy humans, never utilities! The world contains only paperclip-maximisers and happy-human-maximisers, never utility-maximisers!"

## (3) Experimental independence

In the objective account, the task  and the optimiser  have independent semantic meaning. At least in principle, I know how to find  independently of  — namely by inspecting the physical dynamics of the robot's environment or inspecting the robot's world-model. And I know how to find  independently of  — namely by placing the robot in different physical environments and observing their choices.

By contrast, in the subjective account, the task  and the optimiser  have no independent meaning — they are merely exist to compress the optimality condition . What would it even mean for the robot to possess the utility function  without the presumption that they maximise utility? I've honestly no clue. And without the task , how would I determine the robot's optimiser  experimentally? Presumably I should vary the task , however I can't do this experimentally because  contains the robot's preferences which is a variable outside my control.

Granted, for most historic applications of classical game theory, we do know the preferences of the agent — we already know that White wants to checkmate Black, and the consumer wants cheaper goods, and the statistician wants to accurate predictions, e.t.c — so it doesn't matter whether one sticks those preferences in the task or the optimiser. But in AI safety, a big chunk of our perplexity comes from the preferences of the agents. So it matters more that we stick those preferences in the right part of our model.

## (4) No spooky reals.

The subjective account seems to rely on the elements of a mysterious set called "" which is extraneous to the phenomenon under consideration. By contrast, the objective account refers only to the sets  and , where the elements of  and  are physical stuff intrinsic to the situation being modelled. Hence, higher-order game theory promises to dispense with  from game theory, along with argmax and utility functions, ensuring the weirdness of  doesn't contaminate our game theory.[13]

This has a computational upshot as well.

Supposes that  and  are small finite sets. A task  can be implemented as dictionary whose keys lie in  and whose values lie in , which uses  bits. The functional  can be implemented as a program which receives input of type  and returns output of type . Easy!

In the subjective account, by contrast, the task  requires infinite bits to specify, and the functional  must somehow accept a representation of an arbitrary function . Oh no! This is especially troubling for embedded agency, where the agent's decision theory must run on a physical substrate.

# Recovering utility functions

According to the objective account, what is fundamental about an agent is the functional  where  is some objective payoff, and the claim that the agent has a utility function  is understood as the claim that  can be approximately decomposed into . Hence, the existence of a utility-decomposition of  is an additional fact about the agent to be discovered, rather than an assumption that should be baked into the formalism itself.

Utility functions are an emergent property of the underlying functional.

One clue that utility functions are emergent properties is that they aren't unique! It's well-known that a utility function  for an agent is only well-defined modulo positive-affine transformation, i.e. there is no meaningful distinction between  and  whenever  for some . This fact falls immediately from the objective-first view, because  and  are equal functionals whenever

Now, if we were dealing with  or  — instead of  — then there would be a meaningful difference between some utility functions which are equivalent modulo positive-affine transformation.

Let's make this notion precise —

Definition 4: Let  be an optimiser. We say that  is a (classical) utility function of  if and only if . In general, for any , we say that  is a -utility function of  if and only if .

Typically,  is some objective optimiser and  is some subjective optimiser. When  then we obtain the classical utility functions of an objective optimiser , and we may obtain non-classical utility functions of the same optimiser  by considering (e.g.)  or  or whatever.

Classical game theory is the study of optimisers with classical utility functions. There are some theoretical and empirical arguments for restricting only to such optimisers but these arguments are probably overrated. In any case, I suspect that unifying deep learning and classical game theory will require studying non-classical agents. Here's why — in the deep learning paradigm, we build agents by training a large neural network with stochastic gradient descent on tasks which fortify agentic-like behaviour. At initialisation, these neural networks aren't classical agents, and classicality emerges incrementally, probably after passing through phases of nonclassical agency. Therefore, if we want to account for the emergence of agency (classical or otherwise), then we need to account for the loss gradient over the entire space of optimisers , not merely over the subspace of  corresponding to classical optimisers.

# Some properties of optimisers

We can define formalise various properties and operations of optimisation using arbitrary functional .

I've included the list of examples below for illustrative purposes only —

# Recap

• In classical game theory, agents maximise their utility functions , i.e. they might choose any option .
• In higher-order game theory, we replace the utility functions  with an arbitrary function , called a "task", and replace  with an arbitrary functional  called the "optimiser". They might choose any option .
• This additional expressivity lets us include mild optimisers and multi-objective optimisers which don't crudely maximise utility.
• It also lets us include objective optimisers, which strive for particular physical configurations, which dispenses with the concept of utility altogether.
• We can recover utility functions as an emergent property of an objective optimiser, relative to any choice of subjective optimiser, not only to .
• Finally, we can define some interesting properties and operations on the optimisers  which correspond (loosely) to things that AI safety researchers care about.

# Next time...

The next post will answer the age-old question, "What happens when two optimisers  and  play the simultaneous game ?" We know what happens when  and  are both utility-maximisers — the possible option-profiles are pairs  in nash equilibrium.

Can we really generalise the nash equilibrium to any pair of optimisers?

1. ^

The little  is from (simply-typed) lambda notion for introducing functions. If  is the familiar sine function, then  is the function  which sends each  to , so . We can also use lambda notation to define higher-order functions, e.g. if  then .

2. ^

Synonyms: actions, alternatives, behaviours, configurations, coordinates, hypotheses, moves, options, outputs, parameters, plays, policies, positions, strategies — or whatever else is being chosen due to the payoffs assigned to it.

3. ^

Scott Garrabrant has called   the the type-signature of agency, and you can interpret this sequence as an expansion on his remark.

In response to Garrabrant, MrMind raises the objection that no map  can be parametric in both  and  because there is no map . Fortunately, this objection doesn't apply here, because we're looking for functions , and there is a function .

4. ^

The term higher-order game theory is coined in [Hedges 2015], presumably because functions of the form  are often called higher-order functions. Throughout the article, I will only use the term in this sense.

Warning: The term higher-order game theory is also used in [Nisan 2021] to refer to "the study of agents thinking about agents thinking about agents..." — and [Diffractor 2021] calls this higher-level game theory. I will never use the term in this sense.

5. ^

Spoilers: As you'll see later, this is the definition of optimisation you need to ensure that the nash equilibrium between two optimisers is also an optimiser.

6. ^

Of course, if , then this model of the agent and their task would be falsified by any observed option, and can be eliminated a priori.

We can also interpret  as the statement that the computer program implementing the agent's decision theory would throw an exception, or fail to halt, when given the input

7. ^
8. ^

In tomorrow's post,  will be generalised to an arbitrary commutative monad . When we let  be the probability monad , then we get optimisers of type  which would include Taylor's original probabilistic quantiliser. But don't worry about that now!

9. ^

... and any set  to model the agent's options, any function  to model the agent's task, and any functional  to model the agent's optimisation.

10. ^

When  is equipped with the product ordering, the optimiser  is the called the pareto optimiser which maps each multi-objective task  to the pareto-frontier. This is the least selective optimiser which cares about each objective .

11. ^

A partial order  is a relation on  satisfying three rules:

(1) Reflectivity: .

(2) Transitivity: If  and  then .

(3) Anti-symmetry: If  and  then .

A preorder  is a relation satisfying only reflectivity and transitivity. A preorder is the non-evil version of a partial order, because it's agnostic about ""identity"" between equivalent objects.

Now, we say an agent's preferences are incomplete when neither  nor , and an agent's preferences are indifferent when both  and  but not . So a partial order represents preferences which might be incomplete but never indifferent, while a preorder represents preferences which might be both incomplete and indifferent. Agents with incomplete preferences have recently been proposed as a solution to the Shutdown Problem.

12. ^

Warning: There're another optimiser in  who chooses the maximum points of , i.e. options which weakly dominate all other options. Explicitly, . This optimiser is qualitatively different, being more selective when its preferences are incomplete.

13. ^

In Science without Numbers (1980), Hartry Field attempted to reformulate Newtonian physics without quantification over mathematical objects. Instead, the formulae of his theory would quantify only over physical objects and regions of spacetime. He called this normalisation of Newtonian physics.

Likewise, higher-order game theory promises a normalisation of game theory.

I don't know why, but this smells correct to me.

14. This property is called closedness in [Hedges 2015] because the set  is closed under the equivalence relation  for every .

A function  is called a quantifier in the HOGT literature.

New Comment
1 comment, sorted by Click to highlight new comments since:

I think this part uses an unfair comparison:

Supposes that  and  are small finite sets. A task  can be implemented as dictionary whose keys lie in  and whose values lie in , which uses  bits. The functional  can be implemented as a program which receives input of type  and returns output of type . Easy!

In the subjective account, by contrast, the task  requires infinite bits to specify, and the functional  must somehow accept a representation of an arbitrary function . Oh no! This is especially troubling for embedded agency, where the agent's decision theory must run on a physical substrate.

If X and W+ are small finite sets, then any behavior can be described with a utility function requiring only a finite number of bits to specify. You only need to use R as the domain when W+ is infinite, such as when outcomes are continuous, in which case the dictionaries require infinite bits to specify too.

I think this is representative of an unease I have with the framing of this sequence. It seems to be saying that the more general formulation allows for agents that behave in ways that utility maximizers cannot, but most of these behaviors exist for maximizers of certain utility functions. I'm still waiting for the punchline of what AI safety relevant aspect requires higher order game theory rather than just maximizing agents, particularly if you allow for informational constraints.