This first post presents a distillation of the concept, and subsequent posts explore its implications.

## Two Approaches to Optimisation

Beren introduces a taxonomy categorising intelligent systems according to the kind of optimisation they are performing. I think it's more helpful to think of these as two ends of a spectrum rather than as discrete categories; sophisticated real-world intelligent systems (e.g. humans) appear to be a hybrid of the two approaches.

## Direct Optimisers

Systems that perform inference by directly choosing actions^{[1]} to optimise some objective function

Responses are computed on the fly and individually for each input

Direct optimisers perform inference by answering the question: "what action maximises or minimises this objective function ([discounted] cumulative reward and loss respectively)?"

Examples: AIXI, MCTS, model-based reinforcement learning, other "planning" systems

Naively, direct optimisers can be understood as computing (an approximation of) argmax (or argmin) for a suitable objective function during inference.
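To make the "argmax at inference time" picture concrete, here is a minimal sketch of a direct optimiser. The toy problem (nudging a number towards 10) and the names `direct_optimise` and `toy_objective` are my own illustrative assumptions, not anything from Beren's post; real direct optimisers such as MCTS search far larger spaces and only approximate the argmax.

```python
from typing import Callable, Iterable


def direct_optimise(
    state: int,
    actions: Iterable[int],
    objective: Callable[[int, int], float],
) -> int:
    """Search the candidate actions for this input and return the argmax.

    All of the optimisation work happens here, at inference time,
    and is repeated from scratch for every new state.
    """
    return max(actions, key=lambda action: objective(state, action))


def toy_objective(state: int, action: int) -> float:
    """Hypothetical objective: get state + action as close to 10 as possible."""
    return -abs((state + action) - 10)


# The response is computed on the fly for each input.
best_action = direct_optimise(7, range(-5, 6), toy_objective)
```

Nothing is learned or stored between calls: a new state triggers a fresh search over the same action set.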

## Amortised Optimisers

Systems that learn to approximate a function^{[2]} during training and perform inference by evaluating the output of the learned function on their inputs.

The function approximator is learned from a dataset of input data and successful solutions

Amortised optimisation converts an inference problem to a supervised learning problem

It's called "amortised optimisation" because while learning the policy is expensive, the cost of inference is amortised over all evaluations of the learned policy

Amortised optimisers can be seen as performing inference by answering the question "what output (e.g. action, probability distribution over tokens) does this learned function (policy, predictive model) return for this input (agent state, prompt)?"

Examples: model-free reinforcement learning, LLMs, most supervised & self-supervised(?) learning systems

Naively, amortised optimisers can be understood as evaluating a (fixed) learned function; they're not directly computing argmax (or argmin) for any particular objective function during inference.
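As a rough illustration of "convert inference into supervised learning", the sketch below labels a small dataset with an expensive solver once, fits a function to it, and then answers queries by evaluating that function. Everything here is an assumption made for the example: the toy task, the hypothetical `solve_directly` labeller, and the choice of a one-feature least-squares fit standing in for a neural network.

```python
from typing import List, Tuple


def solve_directly(state: int) -> int:
    """Hypothetical expensive solver, used only to label training data."""
    return max(range(-5, 6), key=lambda a: -abs((state + a) - 10))


def fit_linear(xs: List[int], ys: List[int]) -> Tuple[float, float]:
    """Closed-form ordinary least squares for y ~ w * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    b = mean_y - w * mean_x
    return w, b


# Training: pay the optimisation cost once, over a dataset of solved inputs.
train_states = list(range(5, 16))
train_actions = [solve_directly(s) for s in train_states]
w, b = fit_linear(train_states, train_actions)


def amortised_policy(state: float) -> float:
    """Inference is a single evaluation of the learned function; no search."""
    return w * state + b
```

On this toy task the fit recovers `action = 10 - state`, so in-distribution queries match the solver. For a state far outside the training range, such as 30, the policy simply extrapolates its learned line and returns -20, whereas the solver would return -5 (the edge of its action set). The amortised system's behaviour is fixed by what it learned, not by the objective.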

## Differences

| Aspect | Direct Optimization | Amortized Optimization |
| --- | --- | --- |
| Problem Solving | Computes optimal responses "on the fly" | Evaluates the learned function approximator on the given input |
| Computational Approach | Searches through a solution space | Learns a function approximator |
| Runtime Cost | Higher, as it requires in-depth search for a suitable solution | Lower, as it only needs a forward pass through the function approximator |
| Scalability with Compute | Scales by expanding search depth | Scales by better approximating the posterior distribution |
| Convergence | In the limit of arbitrary compute, the system's policy converges to argmax or argmin of the appropriate objective function | In the limit of arbitrary compute, the system's policy converges to the best description of the training dataset |
| Performance | More favourable in "simple" domains | More favourable in "rich" domains |
| Data Efficiency | Little data needed for high performance (e.g. an MCTS agent can attain strongly superhuman performance in Chess/Go given only the rules and sufficient compute) | Requires (much) more data for high performance (e.g. an amortised agent necessarily needs to observe millions of chess games to learn skilled play) |
| Generalization | Dependent on search depth and compute | Dependent on the learned function approximator/training dataset |
| Alignment Focus | Emphasis on safe reward function design | Emphasis on reward function and dataset design |
| Out-of-Distribution Behavior | Can diverge arbitrarily from previous behavior | Constrained by the learned function approximator |
| Examples | AIXI, MCTS, model-based RL | Supervised learning, model-free RL, GPT models |
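The runtime-cost comparison can be made slightly more concrete with a back-of-envelope model. The symbols are my own assumptions for illustration: $C_{\text{train}}$ is the one-off cost of learning the function, $c_{\text{fwd}}$ the cost of one evaluation of it, $c_{\text{search}}$ the cost of one direct search, and $N$ the number of queries answered.

$$
\text{cost per query (direct)} = c_{\text{search}},
\qquad
\text{cost per query (amortised)} = \frac{C_{\text{train}}}{N} + c_{\text{fwd}}
$$

$$
\frac{C_{\text{train}}}{N} + c_{\text{fwd}} < c_{\text{search}}
\iff
N > \frac{C_{\text{train}}}{c_{\text{search}} - c_{\text{fwd}}}
\quad \text{(assuming } c_{\text{search}} > c_{\text{fwd}}\text{)}
$$

Under these assumptions the training cost is spread ("amortised") over every query, so the amortised system is cheaper per query once it is queried often enough. This says nothing about solution quality, which the simple model ignores.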

## Some Commentary

Direct optimisation is feasible in "simple" (narrow problem domains, deterministic, discrete, fully observable/perfect information, etc.) environments (e.g. tic-tac-toe, chess, go) but unwieldy in "rich" (complex/high dimensional problem domains, continuous, stochastic, large state/action spaces, partially observable/imperfect information, etc.) environments (e.g. the real world).

The limitations of direct optimisation in rich environments seem to be complexity-theoretic, so better algorithms won't fix them

In practice some systems use a hybrid of the two approaches with most cognition performed in an amortised manner but planning deployed when necessary (e.g. system 2 vs system 1 in humans)

Hybrid systems can be "bootstrapped" in both directions

A planner can be initialised with amortised policies, or an amortised value model could be used to prune subtrees of a planner's search that are unlikely to be fruitful

This approach is used in AlphaGo and similar systems

Likewise, direct optimisation can be used to improve the data we are training the function approximator on
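One direction of this bootstrapping, an amortised model pruning a planner's search, can be sketched as follows. This is a toy under stated assumptions: `learned_prior` is a hand-written stand-in for a trained model, and `plan`, `TARGET` and `ACTIONS` are hypothetical names for a small depth-limited search. It is not a description of how AlphaGo is implemented.

```python
from typing import Optional, Tuple

TARGET = 10
ACTIONS = (-3, -2, -1, 0, 1, 2, 3)


def learned_prior(state: int, action: int) -> float:
    """Stand-in for an amortised model's score of an action.

    Hand-written here; in a real hybrid system this would be learned.
    """
    return -abs((state + action) - TARGET)


def plan(state: int, depth: int, top_k: int) -> Tuple[float, Optional[int]]:
    """Depth-limited search that expands only the top_k actions under the prior.

    Returns (best terminal value found, first action on that path).
    """
    if depth == 0:
        return -abs(state - TARGET), None
    # Amortised step: rank actions cheaply and discard unpromising subtrees.
    ranked = sorted(ACTIONS, key=lambda a: learned_prior(state, a), reverse=True)
    best_value, best_action = float("-inf"), None
    # Direct step: search explicitly, but only over the surviving candidates.
    for action in ranked[:top_k]:
        value, _ = plan(state + action, depth - 1, top_k)
        if value > best_value:
            best_value, best_action = value, action
    return best_value, best_action


value, first_action = plan(4, depth=2, top_k=2)
```

With `top_k=2` the planner visits 4 leaves instead of the 49 a full-width search would, and on this toy problem it still finds the same plan. How much pruning is safe depends entirely on how good the prior is.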

^{[1]} Or strategies, plans, probabilities, categories, etc.; any "output" of the system.

^{[2]} Beren: "I would add that this function is usually the solution to the objective solved by some form of direct optimiser. I.e. your classifier learns the map from input -> label."

## Preamble

I heavily recommend @beren's "Deconfusing Direct vs Amortised Optimisation". It's a very important conceptual clarification that has changed how I think about many issues bearing on technical AI safety.

Currently, it's the most important blog post I've read this year.

This sequence (if I get around to completing it) is an attempt to draw more attention to Beren's conceptual frame and its implications for how to think about issues of alignment and agency.
