This is something which has bothered me for a while, but, I'm writing it specifically in response to the recent post on mesa-optimizers.
I feel strongly that the notion of 'optimization process' or 'optimizer' which people use -- partly derived from Eliezer's notion in the sequences -- should be split into two clusters. I call these two clusters 'selection' vs 'control'. I don't have precise formal statements of the distinction I'm pointing at; I'll give several examples.
Before going into it, several reasons why this sort of thing may be important:
- It could help refine the discussion of mesa-optimization. The article restricted its discussion to the type of optimization I'll call 'selection', explicitly ruling out 'control'. This choice isn't obviously right. (More on this later.)
- Refining 'agency-like' concepts like this seems important for embedded agency -- what we eventually want is a story about how agents can be in the world. I think almost any discussion of the relationship between agency and optimization which isn't aware of the distinction I'm drawing here (at least as a hypothesis) will be confused.
- Generally, I feel like I see people making mistakes by not distinguishing between the two (whether or not they've derived their notion of optimizer from Eliezer). I judge an algorithm differently if it is intended as one or the other.
(See also Stuart Armstrong's summary of other problems with the notion of optimization power Eliezer proposed -- those are unrelated to my discussion here, and strike me more as technical issues which call for refined formulae, rather than conceptual problems which call for revised ontology.)
The Basic Idea
Eliezer quantified optimization power by asking how small a target an optimization process hits, out of a space of possibilities. The type of 'space of possibilities' is what I want to poke at here.
First, consider a typical optimization algorithm, such as simulated annealing. The algorithm constructs an element of the search space (such as a specific combination of weights for a neural network), gets feedback on how good that element is, and then tries again. Over many iterations of this process, it finds better and better elements. Eventually, it outputs a single choice.
This is the prototypical 'selection process' -- it can directly instantiate any element of the search space (although typically we consider cases where the process doesn't have time to instantiate all of them), it gets direct feedback on the quality of each element (although evaluation may be costly, so that the selection process must economize these evaluations), the quality of an element of search space does not depend on the previous choices, and only the final output matters.
The term 'selection process' refers to the fact that this type of optimization selects between a number of explicitly given possibilities. The most basic example of this phenomenon is a 'filter' which rejects some elements and accepts others -- like selection bias in statistics. This has a limited ability to optimize, however, because it allows only one iteration. Natural selection is an example of much more powerful optimization occurring through iteration of selection effects.
Now, consider a targeting system on a rocket -- let's say, a heat-seeking missile. The missile has sensors and actuators. It gets feedback from its sensors, and must somehow use this information to decide how to use its actuators. This is my prototypical control process. (The term 'control process' is supposed to invoke control theory.) Unlike a selection process, a controller can only instantiate one element of the space of possibilities. It gets to traverse exactly one path. The 'small target' which it hits is therefore 'small' with respect to a space of counterfactual possibilities, with all the technical problems of evaluating counterfactuals. We only get full feedback on one outcome (although we usually consider cases where the partial feedback we get along the way gives us a lot of information about how to navigate toward better outcomes). Every decision we make along the way matters, both in terms of influencing total utility, and in terms of influencing what possibilities we have access to in subsequent decisions.
So: in evaluating the optimization power of a selection process, we have a fairly objective situation on our hands: the space of possibilities is explicitly given; the utility function is explicitly given; we can compare the true output of the system to a randomly chosen element. In evaluating the optimization power of a control process, we have a very subjective situation on our hands: the controller only truly takes one path, so any judgement about a space of possibilities requires us to define counterfactuals; it is less clear how to define an un-optimized baseline; utility need not be explicitly represented in the controller, so may have to be inferred (or we think of it as parameter, so, we can measure optimization power with respect to different utility functions, but there's no 'correct' one to measure).
I do think both of these concepts are meaningful. I don't want to restrict 'optimization' to refer to only one or the other, as the mesa-optimization essay does. However, I think the two concepts are of a very different type.
Bottlecaps & Thermostats
The mesa-optimizer write-up made the decision to focus on what I call selection processes, excluding control processes:
We will say that a system is an optimizer if it is internally searching through a search space (consisting of possible outputs, policies, plans, strategies, or similar) looking for those elements that score high according to some objective function that is explicitly represented within the system. [...] For example, a bottle cap causes water to be held inside the bottle, but it is not optimizing for that outcome since it is not running any sort of optimization algorithm.(1) Rather, bottle caps have been optimized to keep water in place.
It makes sense to say that we aren't worried about bottlecaps when we think about the inner alignment problem. However, this also excludes much more powerful 'optimizers' -- something more like a plant.
When does a powerful control process become an 'agent'?
- Bottlecaps: No meaningful actuators or sensors. Essentially inanimate. Does a particular job, possibly very well, but in a very predictable manner.
- Thermostats: Implements a negative feedback loop via a sensor, an actuator, and a policy of "correcting" things when sense-data indicates they are "off". Actual thermostats explicitly represent the target temperature, but one can imagine things in this cluster which wouldn't -- in general, the connection between what is sensed and how things are 'corrected' can be quite complex (involving many different sensors and actuators), so that no one place in the system explicitly represents the 'target'.
- Plants: Plants are like very complex thermostats. They have no apparent 'target' explicitly represented, but can clearly be thought of as relatively agentic, achieving complicated goals in complicated environments.
- Guided Missiles: These are also mostly in the 'thermostat' category, but, guided missiles can use simple world-models (to track the location of the target). However, any 'planning' is likely based on explicit formulae rather than any search. (I'm not sure about actual guided missiles.) If so, a guided missile would still not be a selection process, and therefore lack a "goal" in the mesa-optimizer sense, despite having a world-model and explicitly reasoning about how to achieve an objective represented within that world-model.
- Chess Programs: A chess-playing program has to play each game well, and every move is significant to this goal. So, it is a control process. However, AI chess algorithms are based on explicit search. Many, many moves are considered, and each move is evaluated independently. This is a common pattern. The best way we know how to implement very powerful controllers is to use search inside (implementing a control process using a selection process). At that point, a controller seems clearly 'agent-like', and falls within the definition of optimizer used in the meso-optimization post. However, it seems to me that things become 'agent-like' somewhere before this stage.
(See also: adaptation-executers, not fitness maximizers.)
I don't want to frame it as if there's "one true distinction" which we should be making, which I'm claiming the mesa-optimization write-up got wrong. Rather, we should pay attention to the different distinctions we might make, studying the phenomena separately and considering the alignment/safety implications of each.
This is closely related to the discussion of upstream daemons vs downstream daemons. A downstream-daemon seems more likely to be an optimizer in the sense of the mesa-optimization write-up; it is explicitly planning, which may involve search. These are more likely to raise concerns through explicitly reasoned out treacherous turns. An upstream-daemon could use explicit planning, but it could also be only a bottlecap/thermostat/plant. It might powerfully optimize for something in the controller sense without internally using selection. This might produce severe misalignment, but not through explicitly planned treacherous turns. (Caveat: we don't understand mesa-optimizers; an understanding sufficient to make statements such as these with confidence would be a significant step forward.)
It seems possible that one could invent a measure of "control power" which would rate highly-optimized-but-inanimate objects like bottlecaps very low, while giving a high score to thermostat-like objects which set up complicated negative feedback loops (even if they didn't use any search).
Processes Within Processes
I already mentioned the idea that the best way we know how to implement powerful control processes is through powerful selection (search) inside of the controller.
To elaborate a bit on that: a controller with a search inside would typically have some kind of model of the environment, which it uses by searching for good actions/plans/policies for achieving its goals. So, measuring the optimization power as a controller, we look at how successful it is at achieving its goals in the real environment. Measuring the optimization power as a selector, we look at how good it is at choosing high-value options within its world-model. The search can only do as well as its model can tell it; however, in some sense, the agent is ultimately judged by the true consequences of its actions.
IE, in this case, the selection vs control distinction is a map/territory distinction. I think this is part of why I get so annoyed at things which mix up selection and control: it looks like a map/territory error to me.
However, this is not the only way selection and control commonly relate to each other.
Effective controllers are very often designed through a search process. This might be search taking place within a model, again (for example, training a neural network to control a robot, but getting its gradients from a physics simulation so that you can generate a large number of training samples relatively cheaply) or the real world (evolution by natural selection, "evaluating" genetic code by seeing what survives).
Further complicating things, a powerful search algorithm generally has some "smarts" to it, ie, it is good at choosing what option to evaluate next based on the current state of things. This "smarts" is controller-style smarts: every choice matters (because every evaluation costs processing power), there's no back-tracking, and you have to hit a narrow target in one shot. (Whatever the target of the underlying search problem, the target of the search-controller is: find that target, quickly.) And, of course, it is possible that such a search-controller will even use a model of the fitness landscape, and plan its next choice via its own search!
(I'm not making this up as a weird hypothetical; actual algorithms such as estimation-of-distribution algorithms will make models of the fitness landscape. For obvious reasons, searching for good points in such models is usually avoided; however, in cases where evaluation of points is expensive enough, it may be worth it to explicitly plan out test-points which will reveal the most information about the fitness landscape, so that the best point can be selected later.)
Blurring the Lines: What's the Critical Distinction?
I mentioned earlier that this dichotomy seems more like a conceptual cluster than a fully formal distinction. I mentioned a number of big differences which stick out at me. Let's consider some of these in more detail.
The classical sort of search algorithm I described as my central example of a selection process includes the ability to get a perfect evaluation of any option. The difficulty arises only from the very large number of options available. Control processes, on the other hand, appear to have very bad feedback, since you can't know the full outcome until it is too late to do anything about it. Can we use this as our definition?
I would agree that a search process in which the cost of evaluation goes to infinity becomes purely a control process: you can't perform any filtering of possibilities based on evaluation, so, you have to output one possibility and try to make it a good one (with no guarantees). Maybe you get some information about the objective function (like its source code), and you have to try to use that to choose an option. That's your sensors and actuators. They have to be very clever to achieve very good outcomes. The cheaper it is to evaluate the objective function on examples, the less "control" you need (the more you can just do brute-force search). In the opposite extreme, evaluating options is so cheap that you can check all of them, and output the maximum directly.
While this is somewhat appealing, it doesn't capture every case. Search algorithms today (such as stochastic gradient descent) often have imperfect feedback. Game-tree search deals with an objective function which is much too costly to evaluate directly (the quality of a move), but can be optimized for nonetheless by recursively searching for good moves in subgames down the game tree (mixed with approximate evaluations such as rollouts or heuristic board evaluations). I still think of both of these as solidly on the "selection process" side of things.
On the control process side, it is possible to have perfect feedback without doing any search. Thermostats realistically have noisy information about the temperature of a room, but, you can imagine a case where they get perfect information. It isn't any less a controller, or more a selection process, for that fact.
Choices Don't Change Later Choices
Another feature I mentioned was that in selection processes, all options are available to try at any time, and what you look at now does not change how good any option will be later. On the other hand, in a control process, previous choices can totally change how good particular later choices would be (as in reinforcement learning), or change what options are even available (as in game playing).
First, let me set two complications aside.
- Weird decision theory cases: it is theoretically possible to screw with a search by giving it an objective function which depends on its choices during search. This doesn't seem that interesting for our purposes here. (And that's coming from me...)
- Local search limits the "options" to small modifications of the option just considered. I don't think this is blurring the lines between search and control; rather, it is more like using a controller within a smart search to try to increase efficiency, as I discussed at the end of the processes-within-processes section. All the options are still "available" at all times; the search algorithm just happens to be one which limits itself to considering a smaller list.
I do think some cases blur the lines here, though. My primary example is the multi-armed bandit problem. This is a special case of the RL problem in which the history doesn't matter; every option is equally good every time, except for some random noise. Yet, to me, it is still a control problem. Why? Because every decision matters. The feedback you get about how good a particular choice was isn't just thought of as information; you "actually get" the good/bad outcome each time. That's the essential character of the multi-armed bandit problem: you have to trade off between experimentally trying options you're uncertain about vs sticking with the options which seem best so far, because every selection carries weight.
This leads me to the next proposed definition.
Offline vs Online
Selection processes are like offline algorithms, whereas control processes are like online algorithms.
With offline algorithms, you only really care about the end results. You are OK running gradient descent for millions of iterations before it starts doing anything cool, so long as it eventually does something cool.
With online algorithms, you care about each outcome individually. You would probably not want to be gradient-descent-training a neural network in live user-servicing code on a website, because live code has to be acceptably good from the start. Even if you can initialize the neural network to something acceptably good, you'd hesitate to run stochastic gradient descent on it live, because stochastic gradient descent can sometimes dramatically decrease performance for a while before improving performance again.
Furthermore, online algorithms have to deal with non-stationarity. This seems suitably like a control issue.
So, selection processes are "offline optimization", whereas control processes are "online optimization": optimizing things "as they progress" rather than statically. (Note that the notion of "online optimization" implied by this line of thinking is slightly different from the common definition of online optimization, though related.)
The offline vs online distinction also has a lot to do with the sorts of mistakes I think people are making when they confuse selection processes and control processes. Reinforcement learning, as a subfield of AI, was obviously motivated from a highly online perspective. However, it is very often used as an offline algorithm today, to produce effective agents, rather than as an effective agent. So, that there's been some mismatch between the motivations which shaped the paradigm and actual use. This perspective made it less surprising when black-box optimization beat reinforcement learning on some problems (see also).
This seems like the best definition so far. However, I personally still feel like it is still missing something important. Selection vs control feels to me like a type distinction, closer to map-vs-territory.
To give an explicit counterexample: evolution by natural selection is obviously a selection process according to the distinction as I make it, but it seems much more like an online algorithm than on offline one, if we try to judge it as such.
Internal Features vs Context
Returning to the definition in mesa-optimizers (emphasis mine):
Whether a system is an optimizer is a property of its internal structure—what algorithm it is physically implementing—and not a property of its input-output behavior. Importantly, the fact that a system’s behavior results in some objective being maximized does not make the system an optimizer.
The notion of a selection process says a lot about what is actually happening inside a selection process: there is a space of options, which can be enumerated; it is trying them; there is some kind of evaluation; etc.
The notion of control process, on the other hand, is more externally defined. It doesn't matter what's going on inside of the controller. All that matters is how effective it is at what it does.
A selection process -- such as a neural network learning algorithm -- can be regarded "from outside", asking questions about how the one output of the algorithm does in the true environment. In fact, this kind of thinking is what we do when we think about generalization error.
Similarly, we can analyze a control process "from inside", trying to find the pieces which correspond to beliefs, goals, plans, and so on (or postulate what they would look like if they existed -- as must be done in the case of controllers which truly lack such moving parts). This is the decision-theoretic view.
However, one might argue that viewing selection processes from the outside is viewing them as control -- viewing them as essentially having one shot at overall decision quality. Similarly, viewing control process from inside is essentially viewing it as selection -- the decision-theoretic view gives us a version of a control problem which we can solve by mathematical optimization.
In this view, selection vs control doesn't really cluster different types of object, but rather, different types of analysis. To a large extent, we can cluster objects by what kind of analysis we would more often want to do. However, certain cases (such as a game-playing AI) are best viewed through both lenses (as a controller, in the context of doing well in a real game against a human, and as a selection process, when thinking about the game-tree search).
Overall, I think I'm probably still somewhat confused about the whole selection vs control issue, particularly as it pertains to the question of how decision theory can apply to things in the world.
Selection vs Control is a distinction I always point to when discussing optimization. Yet this is not the two takes on optimization I generally use. My favored ones are internal optimization (which is basically search/selection), and external optimization (optimizing systems from Alex Flint’s The ground of optimization). So I do without control, or at least without Abram’s exact definition of control.
Why? Simply because the internal structure vs behavior distinction mentioned in this post seems more important than the actual definitions (which seem constrained by going back to Yudkowski’s optimization power). The big distinction is between doing internal search (like in optimization algorithms or mesa-optimizers) and acting as optimizing something. It is intuitive that you can do the second without the first, but before Alex Flint’s definition, I couldn’t put words on my intuition than the first implies the second.
So my current picture of optimization is Internal Optimization (Internal Search/Selection) \subset External Optimization (Optimizing systems). This means that I think of this post as one of the first instances of grappling at this distinction, without agreeing completely with the way it ends up making that distinction.
I like this division a lot. One nitpick: I don't think internal optimization is a subset of external optimization, unless we're redrawing the system boundary at some point. A search always takes place within the context of a system's (possibly implicit) world-model; that's the main thing which distinguishes it from control/external optimization. If that world-model does not match the territory, then the system may not successfully optimize anything in its environment, even though it's searching for optimizing plans internally.
My take on internal optimization as a subset of external optimization probably works assuming convergence, because the configuration space capturing the internal state of the program (and its variables) is pushed reliably towards the configurations with a local minimum in the corresponding variable. See here.
Whether that's actually what we want is another question, but I think the point you're mentioning can be captured by whether the target subspace of the configuration space puts constraints on things outside the system (for good cartesian boundaries and all the corresponding subtleties).
Got it, that's the case I was thinking of as "redrawing the system boundary". Makes sense.
That still leaves the problem that we can write an (internal) optimizer which isn't iterative. For instance, a convex function optimizer which differentiates its input function and then algebraically solves for zero gradient. (In the real world, this is similar to what markets do.) This was also my main complaint on Flint's notion of "optimization": not all optimizers are iterative, and sometimes they don't even have an "initial" point against which we could compare.
I'm a bit confused: why can't I just take the initial state of the program (or of the physical system representing the computer) as the initial point in configuration space for your example? The execution of your program is still a trajectory through the configuration space of your computer.
Personally, my biggest issue with optimizing systems is that I don't know what the "smaller" concerning the target space really means. If the target space has only one state less than the total configuration space, is this still an optimizing system? Should we compute a ratio of measure between target and total configuration space to have some sort of optimizing power?
The initial state of the program/physical computer may not overlap with the target space at all. The target space wouldn't be larger or smaller (in the sense of subsets); it would just be an entirely different set of states.
Flint's notion of optimization, as I understand it, requires that we can view the target space as a subset of the initial space.
(I am unfortunately currently bogged down with external academic pressures, and so cannot engage with this at the depth I’d like to, but here’s some initial thoughts.)
I endorse this post. The distinction explained here seems interesting and fruitful.
I agree with the idea to treat selection and control as two kinds of analysis, rather than as two kinds of object – I think this loosely maps onto the distinction we make between the mesa-objective and the behavioural objective. The former takes the selection view of the learned algorithm; the latter takes the control view.
At least speaking for myself (the other authors might have different thoughts on this), the decision to talk explicitly in terms of the selection view in the mesa-optimiser post is based on an intuition that selectors, in general, have more coherently defined counterfactual behaviour. That is, given a very different input, a selector will still select an output that scores well on its mesa-objective, because that’s how selectors work. Whereas a controller, to the degree it optimises for an objective, seems more likely to just completely stop working on a different input. I have fairly low confidence in this argument, however: it seems to me that one can plausibly have pretty coherent counterfactual behaviour in a very broad distribution even without doing selection. And since it is ultimately the behaviour that does the damage, it would be good to have a working distinction that is based purely on that. We (the mesa-optimisation authors) haven’t been able to come up with one.
Another reason to be interested in selectors is that in RL, the learned algorithm is supposed to fill a controller role. So, restricting attention to selectors allows to talk at least somewhat meaningfully about non-optimiser agents, which is otherwise difficult, as any learned agent is in a controller-shaped context.
In any case, I hope that more work happens on this problem, either dissolving the need to talk about optimisation, or at least making all these distinctions more precise. The vagueness of everything is currently my biggest worry about the legitimacy of mesa-optimiser concerns.
Yeah, I agree with most of what you're saying here.
I think a crux here is: to what extent are mesa-controllers with simple behavioral objectives going to be simple? The argument that mesa-optimizers can compress coherent strategies does not apply here.
Actually, I think there's an argument that certain kinds of mesa-controllers can be simple: the mesa-controllers which are more like my rocket example (explicit world model; explicit representation of objective within that world model; but, optimal policy does not use any search). There is also other reason to suspect that these could survive techniques which are designed to make sure mesa-optimizers don't arise: they aren't expending a bunch of processing power on an internal search, so, you can't eliminate them with some kind of processing-power device. (Not that we know of any such device that eliminates mesa-optimizers -- but if we did, it may not help with rocket-type mesa-controllers.)
Terminology point: I like the term 'selection' for the cluster I'm pointing at, but, I keep finding myself tempted to say 'search' in an AI context. Possibly, 'search vs control' would be better terminology.
I’m not sure what “simple behavioural objective” really means. But I’d expect that for tasks requiring very simple policies, controllers would do, whereas the more complicated the policy required to solve a task, the more one would need to do some kind of search. Is this what we observe? I’m not sure. AlphaStar and OpenAI Five seem to do well enough in relatively complex domains without any explicit search built into the architecture. Are they using their recurrence to search internally? Who knows. I doubt it, but it’s not implausible.
The rocket example is interesting. I guess the question for me there is, what sorts of tasks admit an optimal policy that can be represented in this way? Here it also seems to me like the more complex an environment, the more implausible it seems that a powerful policy can be successfully represented with straightforward functions. E.g., let’s say we want a rocket not just to get to the target, but to self-identify a good target in an area and pick a trajectory that evades countermeasures. I would be somewhat surprised if we can still represent the best policy as a set of non-searchy functions. So I have this intuition that for complex state spaces, it’s hard to find pure controllers that do the job well.
Yeah, I agree that this seems possible, but extremely unclear. If something uses a fairly complex algorithm like FFT, is it search? How "sophisticated" can we get without using search? How can we define "search" and "sophisticated" so that the answer is "not very sophisticated"?
Thanks for a great post! I have a small confusion/nit regarding natural selection. Despite its name, I don't think it's a good exemplar of a selection process. Going through the features of a selection process from the start of the post:
I'd love to know why natural selection seemed obvious as an example of a selection process, since it did not to me due to its poor score on the checklist above.
In a field like alignment or embedded agency, it's useful to keep a list of one or two dozen ideas which seem like they should fit neatly into a full theory, although it's not yet clear how. When working on a theoretical framework, you regularly revisit each of those ideas, and think about how it fits in. Every once in a while, a piece will click, and another large chunk of the puzzle will come together.
Selection vs control is one of those ideas. It seems like it should fit neatly into a full theory, but it's not yet clear what that will look like. I revisit the idea pretty regularly (maybe once every 3-4 months) to see how it fits with my current thinking. It has not yet had its time, but I expect it will (that's why it's on the list, after all).
Bearing in mind that the puzzle piece has not yet properly clicked, here are some current thoughts on how it might connect to other pieces:
wayward youthformal education, I studied numerical optimization, controls systems, the science of decision-making, and related things, and so some part of me was always irked by the focus on utility functions and issues with them; take this early comment of mine and the resulting thread as an example. So I was very pleased to see a post that touches on the difference between the approaches and the resulting intuitions bringing it more into the thinking of the AIAF.
That said, I also think I've become more confused about what sorts of inferences we can draw from internal structure to external behavior, when there are Church-Turing-like reasons to think that a robot built with mental strategy X can emulate a robot built with mental strategy Y, and both psychology and practical machine learning systems look like complicated pyramids built out of simple nonlinearities that can approximate general functions (but with different simplicity priors, and thus efficiencies). This sort of distinction doesn't seem particularly useful to me from the perspective of constraining our expectations, while it does seem useful for expanding them. [That is, the range of future possibilities seems broader than one would expect if they only thought in terms of selection, or only thought in terms of control.]
This felt to me like an important distinction to think about when thinking about optimization.