Some Existing Selection Theorems

AI ALIGNMENT FORUM

This post illustrates how various existing Selection Theorems and related results fit the general framework - what they tell us about agent type signatures, and how they (usually implicitly) tell us what agent types will be selected. I invite readers to leave comments about any other Selection Theorems they know of - for instance, Radical Probabilism and Logical Inductors are results/frameworks which can be viewed as Selection Theorems but which I haven’t included below.

This post assumes you have read the intro post on the Selection Theorem program. The intended audience is people who might work on the Selection Theorem program, so these blurbs are intended to be link-heavy hooks and idea generators rather than self-contained explanations.

The Gooder Regulator Theorem

The Gooder Regulator Theorem talks about the optimal design of a “regulator” (i.e. agent) in an environment like this:

[Figure: causal diagram of the regulator’s environment]

When viewed as a Selection Theorem, the outer optimization process selects for high values of $Z$ and low-information models $M$ (i.e. models which don’t take up much space). Assuming that $Z$ is a “sufficiently flexible” function of $Y$, the theorem says that the optimal “model” $M$ is isomorphic to the Bayesian posterior distribution $(s \mapsto P[S = s | X])$. In other words, the system’s internal structure includes an explicit Bayesian world model.

Representation: world model represented by probability distribution
Interfaces: “inputs” $X$ induce Bayesian updates of the world model. The distribution is over “system states” $S$.
Embedding: Agent subsystems are assumed to follow the causal structure in the diagram; the theorem then justifies labelling one of them the “model”.
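
To make "the model is the posterior" concrete, here is a minimal sketch over a discrete toy system. The prior, the likelihood table, and all names are hypothetical, chosen only for illustration; the sketch shows the theorem's conclusion, not its proof. Two inputs which induce the same posterior can share a single model value, which is the sense in which the posterior is a low-information summary of $X$.

```python
from fractions import Fraction

# Hypothetical toy setup (not from the theorem itself):
# a prior over system states S and a likelihood table P[X = x | S = s].
PRIOR = {"s0": Fraction(1, 2), "s1": Fraction(1, 2)}
LIKELIHOOD = {
    "s0": {"x0": Fraction(3, 8), "x1": Fraction(1, 4), "x2": Fraction(3, 8)},
    "s1": {"x0": Fraction(1, 8), "x1": Fraction(3, 4), "x2": Fraction(1, 8)},
}

def posterior(x):
    """Return the map s -> P[S = s | X = x], computed via Bayes' Rule."""
    joint = {s: PRIOR[s] * LIKELIHOOD[s][x] for s in PRIOR}
    evidence = sum(joint.values())  # P[X = x]
    return {s: p / evidence for s, p in joint.items()}

# Inputs x0 and x2 carry the same information about S, so a minimal
# model does not need to distinguish them.
for x in ("x0", "x1", "x2"):
    print(x, posterior(x))
```

Here the "model" $M$ would only need to take two distinct values, even though $X$ takes three.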

Coherence Theorems

This cluster of theorems is the most common foundation for agent models today. It includes things like Dutch Book Theorems, the Complete Class Theorem, Savage’s Theorem, the Fundamental Theorem of Asset Pricing, variations of these, and probably others as well. These theorems provide many paths to the same agent type signature: Bayesian expected utility maximization.

Representation: utility function and probability distribution.
Interfaces: both the utility function and distribution take in “bet outcomes”, assumed to be specified as part of the environment. The outputs of the agent are “actions” which maximize expected utility under the distribution; the inputs are “observations” which update the distribution via Bayes’ Rule.
Embedding: “agent” must interact with “environment” only via the specified “bets”. Bayesian expected utility maximization relates to low-level agent implementation via behavioral equivalence.

Besides the obvious type-signature assumption (the “bets”), these theorems also typically have some more subtle assumptions built in - like the need for a money-like resource or the absence of internal agent state or something to do with self-prediction. They apply most easily to financial markets; other applications usually require some careful thought about what to identify as “bets” so that the “bets” work the way they need to in order for the theorems to apply.

Typically, these theorems say that a strategy which does not satisfy the type signature is strictly dominated by some other strategy. Assuming a rich enough strategy space and a selection process which can find the dominating strategies, we therefore expect selection to produce a strategy which does satisfy the type signature (at least approximately). If the assumptions of the theorem can actually be fit to the selection process, that is.
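
As a rough illustration of "strictly dominated", here is a sketch of the simplest Dutch Book. It assumes a hypothetical agent that will buy a ticket paying 1 if an event occurs at a price equal to its stated credence in that event. If its credences in A and not-A sum to more than 1, buying both tickets loses money in every possible world, so the strategy of declining both bets dominates it.

```python
from fractions import Fraction

def payoffs_after_buying_both(credence_a, credence_not_a):
    """Agent's net payoff in each world after buying a ticket on A and a
    ticket on not-A, each priced at the agent's stated credence.

    Exactly one of the two tickets pays out 1, whichever world obtains.
    """
    cost = credence_a + credence_not_a
    payoff_if_a = 1 - cost      # ticket on A pays 1, ticket on not-A pays 0
    payoff_if_not_a = 1 - cost  # ticket on not-A pays 1, ticket on A pays 0
    return payoff_if_a, payoff_if_not_a

# Incoherent credences: P(A) + P(not A) = 6/5 > 1, so the agent loses in both worlds.
print(payoffs_after_buying_both(Fraction(3, 5), Fraction(3, 5)))
# Coherent credences: the same pair of bets breaks even in both worlds.
print(payoffs_after_buying_both(Fraction(3, 5), Fraction(2, 5)))
```

If the credences instead sum to less than 1, the same argument runs in reverse with the agent selling both tickets. This covers only one of the constraints the Dutch Book Theorems derive; it is a sketch, not a statement of any of the theorems.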

Kelly Criterion

The Kelly criterion uses a similar setup to the Coherence Theorems, with the added assumption that agents make sequential, independent bets and can bet up to their total wealth each time (a model originally intended for traders in financial markets or betting markets). Under these conditions, agents which maximize their expected log wealth at each timestep achieve the highest long-run growth rate with probability 1.

The type signature implied by the Kelly criterion is similar to the previous section, except the utility is specifically log wealth.

Representation: probability distribution.
Interfaces: the distribution takes in “bet outcomes” (which are wealth changes), assumed to be specified as part of the environment. The outputs of the agent are “actions” which maximize expected log wealth under the distribution; the inputs are “observations” which update the distribution via Bayes’ Rule.
Embedding: “agent” must interact with “environment” only via the specified “bets”. Bayesian expected log wealth maximization relates to low-level agent implementation via behavioral equivalence.

As a selection theorem, the Kelly criterion is especially interesting because it’s specifically about selection. It does not give any fundamental philosophical reason why one “should” want to maximize expected log wealth; it just says that agents which do maximize expected log wealth will be selected for. So, in environments where the Kelly assumptions apply, those are the agents we should expect to see.
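
For a concrete case, consider a sketch under a simplifying assumption not made in the text above: a single repeated binary bet with win probability `p` and net odds `b` (a win returns `b` per unit staked, a loss forfeits the stake). Staking a fraction `f` of wealth gives expected log growth `p*log(1 + b*f) + (1 - p)*log(1 - f)`, which is maximized at the familiar closed form `f = p - (1 - p)/b`. The function names are illustrative; the code checks the closed form against a brute-force grid search.

```python
import math

def expected_log_growth(f, p, b):
    """Expected change in log wealth per bet when staking fraction f of wealth.

    p: win probability; b: net odds (win returns b per unit staked).
    """
    return p * math.log(1 + b * f) + (1 - p) * math.log(1 - f)

def kelly_fraction(p, b):
    """Closed-form maximizer of expected_log_growth for this binary bet."""
    return p - (1 - p) / b

def grid_argmax(p, b, steps=100):
    """Brute-force check: the best stake on an evenly spaced grid over [0, 1)."""
    grid = [i / steps for i in range(steps)]
    return max(grid, key=lambda f: expected_log_growth(f, p, b))

p, b = 0.6, 1.0
print("closed form:", kelly_fraction(p, b))
print("grid search:", grid_argmax(p, b))
print("growth at half of wealth:", expected_log_growth(0.5, p, b))
```

With these numbers, staking half of wealth on each bet has negative expected log growth despite the favorable odds, so an agent following that strategy would tend to be selected against in the long run.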

Subagents

Fun fact: financial markets themselves make exactly the kind of “bets” required by the Coherence Theorems, and are the ur-example of a system not dominated by some other strategy. So, from the Coherence Theorems, we expect financial markets to be equivalent to Bayesian expected utility maximizers, right? Well, it turns out they’re not - a phenomenon economists call “nonexistence of a representative agent”. (Though, interestingly, a market of Kelly criterion agents is equivalent to a Bayesian expected utility maximizer.)

When we dive into the details, the main issue is that markets have internal state which can’t be bet on. If we update Coherence to account for that, then it looks like markets/committees of expected utility maximizers are the appropriate type signature for non-dominated strategies (rather than single utility maximizers). In other words, this type signature has subagents.

Again, the type signature is mostly similar to the Coherence Theorems, but tweaked a bit.

Representation: multiple utility functions and probability distributions.
Interfaces: both the utility functions and distributions take in “bet outcomes”, assumed to be specified as part of the environment. The outputs of the agent are “actions” which Pareto-maximize expected utilities under the distributions; the inputs are “observations” which update the distributions via Bayes’ Rule.
Embedding: “agent” must interact with “environment” only via the specified “bets”. Bayesian expected utility Pareto-maximization relates to low-level agent implementation via behavioral equivalence.

(Note: this type signature is only conjectured in the linked post; the post proves only the non-probabilistic version.)
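
A minimal sketch of the non-probabilistic version, assuming a hypothetical committee rule: a trade is accepted only if no subagent is made worse off and at least one is made strictly better off. The states and utility values below are invented for illustration. The committee refuses to trade in either direction between two of the states, so its preferences are incomplete, yet it never accepts a sequence of trades which leaves every subagent worse off.

```python
def accepts(utilities, current, offered):
    """Committee accepts a trade iff it is a Pareto improvement:
    no subagent loses, and at least one strictly gains."""
    diffs = [u[offered] - u[current] for u in utilities]
    return all(d >= 0 for d in diffs) and any(d > 0 for d in diffs)

# Hypothetical committee of two subagents with different tastes.
SUBAGENTS = [
    {"pepperoni": 1, "mushroom": 0, "anchovy": -1},
    {"pepperoni": 0, "mushroom": 1, "anchovy": -1},
]

print(accepts(SUBAGENTS, "anchovy", "mushroom"))    # both subagents gain
print(accepts(SUBAGENTS, "mushroom", "pepperoni"))  # second subagent loses
print(accepts(SUBAGENTS, "pepperoni", "mushroom"))  # first subagent loses
```

Which of the two favored states the committee ends up holding depends on which one it was offered first. That history is exactly the kind of internal state which cannot be bet on.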

Instrumental Convergence and Power-Seeking

Turner’s theorems on instrumental convergence say that optimal strategies for achieving most goals involve similar actions - i.e. “power-seeking” actions - given some plausible assumptions on the structure of the environment. These theorems are not Selection Theorems in themselves, but they offer a possible path to construct a money-like “utility measuring stick” for selected agents in systems with no explicit “money” - which would allow us to more broadly apply variants of the Coherence Theorems.
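
The flavor of the result can be seen in a toy environment, which is an illustrative assumption here and not the setting of the theorems themselves. From the start state, going "left" reaches one terminal state, while going "right" reaches a hub from which any of three terminal states can be chosen. If rewards over terminal states are drawn i.i.d. from a continuous distribution, every ranking of the four states is equally likely, so it suffices to enumerate rankings and count how often the option-rich branch is optimal.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical toy environment: terminal states reachable via each first action.
LEFT_TERMINALS = ["a"]
RIGHT_TERMINALS = ["b", "c", "d"]

def fraction_preferring_right():
    """Fraction of reward rankings under which going right is optimal."""
    states = LEFT_TERMINALS + RIGHT_TERMINALS
    rankings = list(permutations(range(len(states))))
    count = 0
    for ranks in rankings:
        reward = dict(zip(states, ranks))  # higher rank means higher reward
        best_left = max(reward[s] for s in LEFT_TERMINALS)
        best_right = max(reward[s] for s in RIGHT_TERMINALS)
        if best_right > best_left:
            count += 1
    return Fraction(count, len(rankings))

print(fraction_preferring_right())
```

The branch which keeps more options open is optimal for three quarters of the rankings, simply because the best terminal state is more likely to lie among three states than in one.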

Description Length Minimization = Utility Maximization

The equivalence between Description Length Minimization and Utility Maximization tells us two things:

The intuitive notion of optimization as “steering (some part of) the world into a smaller chunk of state space” is basically equivalent to utility maximization.
Utility maximization can generally be interchanged with description length minimization (i.e. information compression) under some fixed encoding.

This result is interesting mainly because it offers a way to apply information-theoretic tools directly to goals, but we can also view it as a (very weak) Selection Theorem in its own right.

Representation: expected utility or description length
Interfaces: arbitrary
Embedding: none/arbitrary - the theorem does not explicitly involve an agent, but can apply to any “goal-directed” agent if one is present

This result can also be viewed as a way to characterize the selection process (i.e. outer optimizer), rather than the selected agent.
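
One way to see the interchange numerically is sketched below. It assumes a particular encoding not spelled out above: a model $M(x) \propto 2^{u(x)}$ over a finite state space, so that the code length $-\log_2 M(x)$ equals a constant minus $u(x)$. Under that assumption, the expected description length of any distribution over states is a constant minus its expected utility, and ranking distributions by either quantity gives the same answer. The utility values and distributions are hypothetical.

```python
import math

def model_from_utility(utility):
    """Assumed encoding: M(x) proportional to 2**u(x). Returns (M, normalizer Z)."""
    z = sum(2.0 ** u for u in utility.values())
    return {x: 2.0 ** u / z for x, u in utility.items()}, z

def expected_utility(dist, utility):
    return sum(p * utility[x] for x, p in dist.items())

def expected_description_length(dist, model):
    """Expected code length in bits, E[-log2 M(X)], under the distribution dist."""
    return sum(p * -math.log2(model[x]) for x, p in dist.items() if p > 0)

UTILITY = {"x0": 0.0, "x1": 1.0, "x2": 3.0}
MODEL, Z = model_from_utility(UTILITY)

# Two hypothetical outcome distributions an optimizer could steer toward.
DIST_A = {"x0": 0.5, "x1": 0.5, "x2": 0.0}
DIST_B = {"x0": 0.0, "x1": 0.5, "x2": 0.5}

for name, dist in (("A", DIST_A), ("B", DIST_B)):
    eu = expected_utility(dist, UTILITY)
    edl = expected_description_length(dist, MODEL)
    # eu + edl is the same constant, log2(Z), for every distribution.
    print(name, eu, edl, eu + edl)
```

The distribution with higher expected utility is the one with shorter expected description length, and the two quantities always sum to $\log_2 Z$.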
