Fascinating - but why is this an objection? Is it just the inelegance of not being able to look at a single time slice and answer the question of whether optimization is happening?
No, the issue is that the usual definition of an optimization problem (e.g. max_x f(x)) has no built-in notion of time, and the intuitive notion of optimization (e.g. "the system makes Y big") has no built-in notion of time (or at least linear time). It's this really fundamental thing that isn't present in the "original problem", so to speak; it would be very surprising and interesting if time had to be involved when it's not present from the start.
If I specifically try to brainstorm things-which-look-like-optimization-but-don't-involve-objective-improvement-over-time, then it's not hard to come up with examples:
Another big thing to note in examples like e.g. iteratively computing a square root for the quadratic formula or iteratively computing eigenvalues to solve a matrix: the optimization problems we're solving are subproblems, not the original full problem. These crucially differ from most of the examples in the OP in that the system's objective function (in your sense) does not match the objective function (in the usual intuitive sense). They're iteratively optimizing a subproblem's objective, not the "full" problem's objective.
That's potentially an issue for thinking about e.g. AI as an optimizer: if it's using iterative optimization on subproblems, but using those results to perform some higher-level optimization in a non-iterative manner, then aligning the subproblem-optimizers may not be synonymous with aligning the full AI. Indeed, I think a lot of reasoning works very much like this: we decompose a high-dimensional problem into coupled low-dimensional subproblems (i.e. "gears"), then apply iterative optimizers to the subproblems. That's exactly how eigenvalue algorithms work, for instance: we decompose the full problem into a series of optimization subproblems in narrower and narrower subspaces, while the "high-level" part of the algorithm (i.e. outside the subproblems) doesn't look like iterative optimization.
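To make the subproblem point concrete, here is a minimal sketch of the square-root-inside-the-quadratic-formula example. The function names (newton_sqrt, solve_quadratic) and the tolerance are my own illustrative choices, and the sketch assumes real roots. Only the inner square-root routine has an objective (the error |y*y - a|) that improves over time; the outer step is a single closed-form evaluation.

```python
def newton_sqrt(a, tol=1e-12, max_iter=100):
    """Iterative subproblem: drive the error |y*y - a| toward zero (Newton's method)."""
    if a < 0:
        raise ValueError("this sketch only handles non-negative inputs")
    if a == 0:
        return 0.0, [0.0]
    y = a if a >= 1 else 1.0          # start at or above sqrt(a)
    errors = [abs(y * y - a)]         # the subproblem objective over time
    for _ in range(max_iter):
        y = 0.5 * (y + a / y)         # Newton update for y*y = a
        errors.append(abs(y * y - a))
        if errors[-1] <= tol * a:
            break
    return y, errors


def solve_quadratic(a, b, c):
    """High-level step: one closed-form evaluation, with no trajectory over candidate roots."""
    disc = b * b - 4 * a * c
    root, errors = newton_sqrt(disc)
    return ((-b - root) / (2 * a), (-b + root) / (2 * a)), errors


# x**2 - 6x + 5 has roots 1 and 5 (discriminant 16).
roots, errors = solve_quadratic(1.0, -6.0, 5.0)
print(roots)
print(errors)  # shrinks step by step; this is the only "improvement over time" in the system
```

The list of errors is a trajectory in the subproblem's state space. Nothing analogous exists for the full problem: the candidate roots never pass through a sequence of progressively better guesses.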
I think that there are some things that are sensitively dependent on other parts of the system, and we usually just call those bits random.
One key piece missing here: the parts we call "random" are not just sensitively dependent on other parts of the system, they're sensitively dependent on many other parts of the system. E.g. predicting the long-run trajectories of billiard balls bouncing off each other requires very precise knowledge of the initial conditions of every billiard ball in the system. If we have no knowledge of even just one ball, then we have to treat all the long-run trajectories as random.
That's why sensitive dependence on many variables matters: lack of knowledge of just one of them wipes out all of our signal. If there's a large number of such variables, then we'll always be missing knowledge of at least one, so we call the whole system random.
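A toy stand-in for this (not billiard dynamics, just the simplest function I know of that is sensitive to every one of its inputs) is the parity of a string of bits. The sketch below assumes the unknown bits are uniformly distributed; the helper names are hypothetical. Knowing all eight bits pins the outcome down exactly, while missing a single bit leaves a 50/50 distribution.

```python
from itertools import product


def parity(bits):
    """Output flips whenever any single input bit flips."""
    return sum(bits) % 2


def outcome_distribution(known, n_unknown):
    """Distribution of the parity given the known bits, with unknown bits assumed uniform."""
    counts = {0: 0, 1: 0}
    for hidden in product((0, 1), repeat=n_unknown):
        counts[parity(tuple(known) + hidden)] += 1
    total = 2 ** n_unknown
    return {outcome: count / total for outcome, count in counts.items()}


full = outcome_distribution((1, 0, 1, 1, 0, 1, 0, 0), 0)      # all 8 bits known
missing_one = outcome_distribution((1, 0, 1, 1, 0, 1, 0), 1)  # 7 of 8 bits known

print(full)         # {0: 1.0, 1: 0.0}
print(missing_one)  # {0: 0.5, 1: 0.5}
```

With seven of eight inputs known we have learned nothing about the output, which is the sense in which one missing variable wipes out the whole signal.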
One related thing I was thinking about last week: part of the idea of abstraction is that we can pick a Markov blanket around some variable X, and anything outside that Markov blanket can only "see" abstract summary information f(X). So, if we have a goal which only cares about things outside that Markov blanket, then that goal will only care about f(X) rather than all of X. This holds for any goal which only cares about things outside the blanket. That sounds like instrumental convergence: any goal which does not explicitly care about things near X itself, will care only about controlling f(X), not all of X.
This isn't quite the same notion of goal-locality that the OP is using (it's not about how close the goal-variables are to the agent), but it feels like there's some overlapping ideas there.
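A minimal sketch of the Markov-blanket point, under assumptions that are entirely mine: X is a tuple of bits, the summary f(X) is their sum, and the far-away variable Y equals f(X) plus a small noise term. By construction Y depends on X only through f(X), so this illustrates the claim rather than proving anything. The goals in the dictionary are arbitrary examples of utilities defined only on Y.

```python
from fractions import Fraction

# Hypothetical noise between X and the far-away variable Y.
NOISE = {-1: Fraction(1, 4), 0: Fraction(1, 2), 1: Fraction(1, 4)}


def summary(x):
    """f(X): the only information about X visible outside the blanket (assumed to be the sum)."""
    return sum(x)


def expected_utility(x, utility):
    """E[u(Y) | X = x], where Y = f(x) + noise."""
    s = summary(x)
    return sum(p * utility(s + noise) for noise, p in NOISE.items())


# Two unrelated goals that only care about Y.
goals = {
    "make_Y_big": lambda y: y,
    "keep_Y_near_two": lambda y: -(y - 2) ** 2,
}

x_a = (1, 1, 0, 0)  # f = 2
x_b = (0, 0, 1, 1)  # f = 2, but a different X
x_c = (1, 1, 1, 0)  # f = 3

for name, u in goals.items():
    print(name, expected_utility(x_a, u), expected_utility(x_b, u), expected_utility(x_c, u))
```

Every goal in the dictionary is indifferent between x_a and x_b, since they share the same f(X), and distinguishes both from x_c. Whatever the goal is, the only lever on X worth controlling is f(X).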
My biggest objection to this definition is that it inherently requires time. At a bare minimum, there needs to be an "initial state" and a "final state" within the same state space, so we can talk about the system going from outside the target set to inside the target set.
One class of cases which definitely seem like optimization but do not satisfy this property at all: one-shot non-iterative optimization. For instance, I could write a convex function optimizer which works by symbolically differentiating the objective function and then algebraically solving for a point at which the gradient is zero.
Is there an argument that I should not consider this to be an optimizer?
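For concreteness, here is a minimal sketch of such a one-shot optimizer, restricted to one-variable convex quadratics so that it fits in a few lines. The polynomial representation ({power: coefficient}) and the function names are my own illustrative choices. It differentiates symbolically, then solves gradient = 0 algebraically. There is no initial guess, no loop over candidate points, and no intermediate state that gets closer to the target.

```python
from fractions import Fraction


def differentiate(poly):
    """Symbolic derivative of a polynomial given as {power: coefficient}."""
    return {power - 1: coeff * power for power, coeff in poly.items() if power != 0}


def argmin_convex_quadratic(poly):
    """Minimize a*x**2 + b*x + c (a > 0) by solving f'(x) = 0 in closed form."""
    if set(poly) - {0, 1, 2} or poly.get(2, 0) <= 0:
        raise ValueError("expected a convex quadratic a*x**2 + b*x + c with a > 0")
    grad = differentiate(poly)  # {1: 2a, 0: b}, i.e. 2a*x + b
    return Fraction(-grad.get(0, 0)) / Fraction(grad[1])


# f(x) = 3x**2 - 12x + 7, so f'(x) = 6x - 12 and the minimizer is x = 2.
x_star = argmin_convex_quadratic({2: 3, 1: -12, 0: 7})
print(x_star)  # 2
```

The answer appears in a single algebraic step, so there is no "initial state outside the target set" to point at, yet it is hard to deny that the function is solving an optimization problem.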
The set of optimizing systems is smaller than the set of all AI services, but larger than the set of goal-directed agentic systems.
A tree is an optimizing system but not a goal-directed agent system.
I'm not sure this is true, at least not in the sense that we usually think about "goal-directed agent systems".
You make a case that there's no distinct subsystem of the tree which is "doing the optimizing", but this isn't obviously relevant to whether the tree is agenty. For instance, the tree presumably still needs to model its environment to some extent, and "make decisions" to optimize its growth within the environment - e.g. new branches/leaves growing toward sunlight and roots growing toward water, or the tree "predicting" when the seasons are turning and growing/dropping leaves accordingly.
One way to think about whether "the set of optimizing systems is smaller than the set of all AI services, but larger than the set of goal-directed agentic systems" is that it's equivalent to Scott's (open) question: does agent-like behavior imply agent-like architecture?
At first I particularly liked the idea of identifying systems with "an optimizer" as those which are robust to changes in the object of optimization, but brittle with respect to changes in the engine of optimization.
On reflection, it seems like a useful heuristic but not a reliable definition. A counterexample: suppose we do manage to build a robust AI which maximizes some utility function. One desirable property of such an AI is that it's robust to e.g. one of its servers going down or corrupted data on a hard drive; the AI itself should be robust to as many interventions as possible. Ideally it would even be robust to minor bugs in its own source code. Yet it still seems like the AI is the "engine", and it optimizes the rest of the world.
This is excellent! Very well done, I would love to see more work like this.
I have a whole bunch of things to say along separate directions so I'll break them into separate comments. This first one is just a couple minor notes:
... we don't merely want a precise theory that lets us build an agent; we want our theory to act like a box that takes in an arbitrary agent (such as one built using ML and other black boxes) and allows us to analyze its behavior.
FWIW, this is what I consider myself to be mainly working towards, and I do expect that the problem is directly solvable. I don't think that's a necessary case to make in order for HRAD-style research to be far and away the highest priority for AI safety (so it's not necessarily a crux), but I do think it's both sufficient and true.