While reading some papers, I found a way to take a myopic optimization problem L, and construct a non-myopic optimization problem L∗ whose optima match the fixed points of the original problem L. This seems potentially of interest for myopia research; it could provide a way to think about what optimization of L might give you, a tool for analyzing the trajectories of myopic optimization, and maybe also a better-behaved alternative to myopic optimization.
In order to explain the transformation, I first need to introduce a common form to myopic optimization problems. The next section goes into detail with applying the common form, but I suspect many people here are sufficiently comfortable with myopic optimization problems that they might be able to do with the shorter explanation.
In ordinary optimization, we might minimize a loss function L(m). That is, the optimized model mn would be given by the equation mn=argminmL(m).
This is non-myopic; every influence of m on the result of L gets optimized. To introduce a myopic equivalent, we need to distinguish between two influences of m; the parts that we want to optimize, and the parts that we want to be myopic about.
We can do that by introducing two separate parameters, L(x,y), with the first parameter being the parameter we optimize, and the second parameter being the parameter we are myopic about. Given a starting model mn, we can then myopically update this model, yielding mn+1=argminmL(m,mn).
This process can be iterated, but it has many poor properties, e.g. it is not guaranteed to converge, and it is unclear what its results will be. This post is about providing an alternative to that process.
Let's consider an example of myopic optimization.
Suppose you have some predictive model m, which gets trained according to some loss function L(m,x), requiring a distribution of data points x∼D. Essentially, it's trained to minimize L(m)=Ex∼D[L(m,x)].
If you deploy this model in the real world, then you would hope that it has some sort of real-world effect (if nothing else, then bringing you more ad money); otherwise, what's the point? But if it has a real-world effect, then in particular it might also have a real-world effect on the distribution D that it encounters.
This might mean that the distribution D it was trained on is no longer good for getting it to optimal performance; we should train it on D instead. But since D happened as a result of the model, it is really a function D(m) of the model. So when we retrain on the data D(m1) we got after deploying m0, that yields a new model m1 that might induce yet a new training dataset D(m1).
If we consider iterating this endlessly, what will happen? Will it be like optimizing the following expression?
If it is like optimizing this expression, then that seems somewhat concerning. Because this expression includes the dependency of the dataset on the model, optimizing this expression would, as a side-effect, optimize the dataset to become more predictable. How one would go about making a the real world more predictable is a bit unclear, but considering all of the AI safety concerns for even seemingly innocuous loss functions, this is not something we want to get involved with.
So is this what would get optimized? No; this is where the myopia comes in. When optimizing the model, we don't "look ahead" and think about how the dataset will change in response to the new model. In machine learning terms, we don't propagate the gradients through D(m).
Rather, it makes more sense to specify the optimization problem using a bivariate loss function, taking as parameters both the model m that is being optimized, and the model d that is currently deployed:
In each iteration of optimization, we then change the deployed model to be optimized for the first parameter. This can be expressed as the equation:
This optimization method has a number of bad properties. For instance, it is not guaranteed to converge. With ordinary non-myopic optimization, L always decreases as you optimize it. As long as L is bounded, this means that you cannot keep coming up with new models that decrease it, and so you must eventually reach the best model.^1
However, in myopic optimization, while L(dnew,dold) is always guaranteed to be greater than L(dold,dold), it might be that L(dnew,dnew) is less than or equal to L(dold,dold). This can happen in "rock-paper-scissors"-like situations, where the optimal solution always changes away from the deployed solution. Hence it might be nice to have an alternative approach.
So, the general form of myopic optimization can be considered to be:
And this is a terrible equation.^2 So we want to improve it.
To make it easier to reason about, let's make an assumption: It has converged to a fixed point. More precisely, we have a value m∞=argminmL(m,m∞); that is, there is no way to change the deployed model to perform better on the training data that comes from the deployed model.
An immediate implication of this is that L(m∞,m∞)=minmL(m,m∞), and so L(m∞,m∞)−minmL(m,m∞)=0. But notice that since L(x,x)≥minmL(m,x), the left-hand-side in the previous equation is always nonnegative. So this means that the equation holds only when the function L∗(x)=L(x,x)−minmL(m,x) is minimized.
This L∗ function is in some ways much better behaved. It tells us what the fixed point is, rather than just telling us a training procedure. However, in other ways it is less well-behaved:
These limitations together seem like a good reason to not just apply this transformation for no reason. That said, it seems like there is often a good reason, at least when it comes to assessing the safety of L: I would be concerned about a myopic optimization problem L whose transformed version L∗ is unsafe; while we would not be guaranteed to reach the unsafe region, it seems like the optimization of L increases the likelihood of entering the unsafe optimum of L∗ a lot.
I got the concept from this paper, where a basically-identical concept from game theory called "duality gap" is investigated for machine learning. In order to evaluate the usefulness about this transformation for myopic optimization in AI alignment and agency foundations, we need to know more about its properties. I have tried searching a bit around in the game theory literature, but I haven't immediately seen anything that is important for our purposes. Though I also haven't had time to search much, due to my day job.
Fundamentally, it seems to me that there are some potential use-cases for it:
When I consider applying the transformation to some obscure myopic optimization problems that I have thought about, then it seems to have some nice properties, but also some potentially concerning ones. I will likely make a followup post where I explain them. But the myopic optimization problems that I am considering are kind of convoluted, so they don't really fit here.
I thought it might be a good idea to leave this open to LessWrong; are there some myopic optimization problems that you have been thinking about? And if so, what effects would this de-myopization transformation have on them?
(Should we call it "duality gap transformation", considering its origins? Or something else?)
1. This is actually a lie. This is not sufficient to prove convergence, and this setup also doesn't resemble at all how optimization gets done in practice. In reality convergence is a messy issue even for non-myopic optimization. However, it's a useful simplification.
2. It is terrible partly because in the limiting form, it is a fixpoint equation, which is hard to reason about. There are variants that are even more terrible; machine learning people might usually phrase it as mn+1=mn−∇mnL(mn,stopgrad(mn)). This uses a mysterious function stopgrad with the property that stopgrad(x)=x, but ddxstopgrad(x)=0 - which of course cannot be a real function.
Thanks to Justis Mills for proofreading.
(Moderation note: added to the Alignment Forum from LessWrong.)