Gradient descent doesn't select for inner search

5Ramana Kumar

6Ivan Vendrov

New Comment

Mostly orthogonal:

- Evan's post argues that
*if*search is computationally optimal (in the sense of being the minimal circuit) for a task,*then*we can construct a task where the minimal circuit that solves it is deceptive. - This post argues against (a version of) Evan's premise: search is not in fact computationally optimal in the context of modern tasks and architectures, so we shouldn't expect gradient descent to select for it.

Other relevant differences are

- gradient descent doesn't actually select for low time complexity / minimal circuits; it holds time & space complexity fixed, while selecting for low L2 norm. But I think you could probably do a similar reduction for L2 norm as Evan does for minimal circuits. The crux is in the premise.
- I think Evan is using a broader definition of search than I am in this post, closer to John Wentworth's definition of search as "general problem solving algorithm".
- Evan is doing worst-case analysis (can we completely rule out the possibility of deception by penalizing time complexity?) whereas I'm focusing on the average or default case.

TL;DR:Gradient descent won't select for inner search processes because they're not compute & memory efficient.Slightly longer TL;DR: A key argument for mesa-optimization is that as we search over programs, we will select for "search processes with simple objectives", because they are simpler or more compact than alternative less dangerous programs. This argument is much weaker when your program search is restricted to programs that use a fixed amount of compute, and you're not optimizing strongly for low description length - e.g. gradient descent in modern deep learning systems. We don't really know what shape of programs gradient descent selects for in realistic environments, but they are much less likely to involve search than commonly believed.Note on terminology (added in response to comments): By "search" I mean here a process that evaluates a number of candidates before returning the best one; what Abram Demski calls "selection" inSelection vs Control. The more candidates considered, the more "search-like" a process is - with gradient descent and A* being central examples, and a thermostat being a central counter-example.## Recap: compression argument for inner optimizers

Here's the argument from Risks From Learned Optimization: [emphasis mine]

and even more forceful phrasing from John Wentworth:

## Compactness, Complexity, and Compute

At face value, it does seem like we're selecting programs for simplicity. The Deep Double Descent paper showed us that gradient descent training in the overparametrized regime (i.e. the regime of all modern deep models) favors simpler models. But is this notion of simplicity the same as "compactness" or "complexity"? Evan seems to think so, I'm less sure. Let's dive into the different notions of complexity here.

The most commonly used notion of program complexity is Kolmogorov complexity (or

description length), basically just "length of the program in some reference programming language". This definition seems natural... but, critically,it assumes away all computational constraints. K-complexity doesn't care if your program completes in a millisecond or runs until the heat death of the universe. This makes it a particularly perverse notion of complexity to use when interacting with the real, computationally constrained world.Why do we use Kolmogorov complexity at all, then? One reason it's so intuitive is that it's the primary notion of complexity that humans programmers use in their work. Programs that humans write are selected to be short, hence easy to write and analyze. Memory and compute use are often secondary concerns, because in most programming contexts human time is so much more valuable than computer time.

A simplistic model for how humans do program search is

"What's the shortest program I can implement that solves the problem exactly?". The answer is usually some form of explicit search with a simple objective, because with no computational constraints that's always the shortest description length program (as an extreme example, bogosort is probably the lowest-description-length sorting procedure).In contrast, the way modern deep learning does program search is

"I can tweak the parameters of a massively parallel computation involving at most 50 serial steps, 100 million multiply-adds per step, and 100 kilobytes of memory to store intermediate results; what parameters get me the best approximate solution given this compute budget?". Here it seems much less clear that the resulting program is going to resemble search; it might well be a diffuse bundle of heuristics, or something else that humans don't have good intuitions for.What's going on here? Once we stop assuming compute is infinite and instead fix a budget, you can no longer rely on the "compression trick" that lets you replace an actual policy with a search for the policy.

Search isn't free; it costs you compute and memory that could have been spent running a better policy, and gradient descent doesn't select for low Kolmogorov complexity nearly enough (if at all) to compensate.Some caveats to this conclusion:

Thanks to Richard Ngofor helping me clarify this point.)Added August 16:John Wentworth responds with two more arguments for why we'll select for inner search, the existence of general-purpose heuristics and the recursive structure of search.## (Speculative) Implications for interpretability and safety

If most of the danger of inner optimization comes from search and planning

^{[1]}, this line of reasoning has some unexpected implications. In AI safety we generally think of large black-box models as dangerous, and small interpretable models as safer. But the bias towards explicit search with simple objectives turns out to result from ahumanbias towardsmechanistically interpretable(i.e. low-Kolmogorov-complexity) programs, a bias that often comes at significant computational cost.If what we really care about when we talk about interpretability is

predicting generalization errors,mechanistic interpretability may not be what we want. There is hideous, unpredictable complexity lurking in mechanistically simple programs such as the 3n+1 algorithm and the Rule 110 cellular automaton. The black-box models that gradient descent converges on may turn out to be simpler and more predictable in the sense that really matters. An "agent AI" trained end-to-end with gradient descent may be safer in practice than a "tool AI" that generates C code given a problem description, because the latter shares human programmers' bias towards explicit search over simple, brittle objectives. Which is where much of the danger lies.^{^}Unclear how true this is - in the limit of capabilities, both search and control processes could be lethal. But intuitively, for a fixed level of capabilities, it seems like search processes are more brittle and likely to mis-generalize catastrophically than control processes.