Predictors as Agents

David Scott Krueger (formerly: capybaralet) 7y30

Whether or not this happens depends on the learning algorithm. Let's assume an IID setting. Then an algorithm that evaluates many random parameter settings and chooses the one that gives the best performance would have this effect. But a gradient-based learning algorithm wouldn't necessarily, since it only aims to improve its predictions locally (so what you say in the ETA is more accurate, **in this case**, I think).
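A minimal sketch of this distinction, under assumptions that are not from the thread: the outcome is a single bit whose probability depends on the announced prediction through a made-up feedback function `outcome_prob`, and the model is a single logit. Random search scores each candidate in the world that candidate induces, so it can pick the better-scoring fixed point. The gradient step treats observed outcomes as fixed targets, so it only moves the prediction toward whatever it currently causes.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def outcome_prob(p):
    """Hypothetical feedback: P(outcome = 1) when the predictor announces p."""
    return 0.05 + 0.85 * sigmoid(20.0 * (p - 0.5))

def expected_log_loss(p):
    """Expected log loss of announcing p, in the world that announcing p creates."""
    g = outcome_prob(p)
    return -(g * math.log(p) + (1.0 - g) * math.log(1.0 - p))

def random_search(trials=1000, seed=0):
    """Try random predictions and keep the one with the best realized score."""
    rng = random.Random(seed)
    candidates = [rng.uniform(0.001, 0.999) for _ in range(trials)]
    return min(candidates, key=expected_log_loss)

def gradient_descent(p0, steps=2000, lr=0.5):
    """Supervised log-loss updates on a logit, treating outcomes as fixed targets."""
    theta = math.log(p0 / (1.0 - p0))
    for _ in range(steps):
        p = sigmoid(theta)
        # d(loss)/d(theta) with the outcome distribution held constant is p - g.
        theta -= lr * (p - outcome_prob(p))
    return sigmoid(theta)

if __name__ == "__main__":
    print("random search:", round(random_search(), 3))
    print("gradient descent from 0.7:", round(gradient_descent(0.7), 3))
```

With this particular feedback function there are stable fixed points near 0.05 and near 0.90, and the first has the lower expected log loss. Random search should land near the first, while gradient descent started at 0.7 should settle near the second, since that is the fixed point in its basin.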

Also, I just wanted to mention that Stuart Armstrong's paper "Good and safe uses of AI oracles" discusses self-fulfilling prophecies as well; Stuart provides a way of training a predictor that won't fall victim to such effects (just don't reveal its predictions when training). But then it also fails to account for the effect its predictions actually have, which can be a source of irreducible error... The example is a (future) stock-price predictor: making its predictions public makes them self-refuting to some extent, as they influence market actors' decisions.

interstice 7y10

Yeah, if you train the algorithm by random sampling, the effect I described will take place. The same thing will happen if you use an RL algorithm to update the parameters instead of an unsupervised learning algorithm (though it seems willfully perverse to do so -- you're throwing away a lot of the structure of the problem by doing this, so training will be much slower).

I also just found an old comment which makes the exact same argument I made here. (Though it now seems to me that argument is not necessarily correct!)

jessicata 7y20

The capacity for agency arises because, in a complex environment, there will be multiple possible fixed-points. It’s quite likely that these fixed-points will differ in how the predictor is scored, either due to inherent randomness, logical uncertainty, or computational intractability (predictors could be powerfully superhuman while still being logically uncertain and computationally limited). Then the predictor will output the fixed-point on which it scores the best.

Reflective oracles won't automatically do this. They won't minimize log loss or any other cost function. For a given situation, there can be multiple reflective oracles; for example, in a universe $M := O(M, 1/2)$ (i.e. the universe asks the reflective oracle whether it equals 1 with probability greater or less than 50%), there are three reflective oracles: $P(M) \in \{0, 1/2, 1\}$. There isn't any defined procedure for selecting which of these reflective oracles is the real one. A reflective oracle that says $P(M) \in \{0, 1\}$ will get a lower average log loss than one that says $P(M) = 1/2$; however, these are all considered to be reflective oracles.
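To make the comparison concrete, here is the expected log loss of each of the three oracles, writing $P(M)$ for the probability that $M = 1$ and assuming natural logarithms. For $P(M) = 1/2$ to be consistent, the oracle has to answer the query $O(M, 1/2)$ with a fair coin flip, which it is allowed to do because the probability equals the threshold:

$$
\begin{aligned}
P(M) = 1 &: \quad M = 1 \text{ always}, & \mathbb{E}[\text{loss}] &= -\log 1 = 0 \\
P(M) = 0 &: \quad M = 0 \text{ always}, & \mathbb{E}[\text{loss}] &= -\log 1 = 0 \\
P(M) = 1/2 &: \quad M \sim \mathrm{Bernoulli}(1/2), & \mathbb{E}[\text{loss}] &= -\tfrac{1}{2}\log\tfrac{1}{2} - \tfrac{1}{2}\log\tfrac{1}{2} = \log 2 \approx 0.693
\end{aligned}
$$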

Is there a reason you think a reflective oracle (or equivalent) can't just be selected "arbitrarily", and will likely be selected to maximize some score? (In this example there's an issue in that the 1/2 reflective oracle is an unstable equilibrium, so natural ways of finding reflective oracles using gradient descent will be unlikely to find it; however, it is possible to set up situations where gradient descent leads to reflective oracles with suboptimal Bayes score.)

My sense is that the simplest methods for finding a reflective oracle will do something similar to finding a correlated equilibrium using gradient descent on each player's strategy individually. This certainly does a kind of optimization, though since it's similar to a multiplayer game it won't correspond to global optimization like finding the reflective oracle with the lowest expected log loss. The kind of optimization it does more resembles "given my current reflective oracle, and the expected future states resulting from this, how should I adjust this oracle to better match this distribution of future states?"

(For more on natural methods for finding (correlated) reflective oracles, I recommend looking at lectures 17-18 of this course and this post on correlated reflective oracles.)

interstice 7y10

Is there a reason you think a reflective oracle (or equivalent) can't just be selected "arbitrarily", and will likely be selected to maximize some score?

The gradient descent is not being done over the reflective oracles, it's being done over some general computational model like a neural net. Any highly-performing solution will necessarily look like a fixed-point-finding computation of some kind, due to the self-referential nature of the predictions. Then, since this fixed-point-finder is *internal* to the model, it will be optimized for log loss just like everything else in the model.

That is, the global optimization of the model is distinct from whatever internal optimization the fixed-point-finder uses to choose the reflective oracle. The global optimization will favor internal optimizers that produce fixed-points with good score. So while fixed-point-finders in general won't optimize for anything in particular, the one this model uses will.

jessicata 7y30

I think the fixed point finder won't optimize the fixed point for minimizing expected log loss. I'm going to give a concrete algorithm and show that it doesn't exhibit this behavior. If you disagree, can you present an alternative algorithm?

Here's the algorithm. Start with some oracle (not a reflective oracle). Sample ~1000000 universes based on this oracle, getting 1000000 data points for what the reflective oracle outputs. Move the oracle 1% of the way from its current position towards the oracle that would answer queries correctly given the distribution over universes implied by the data points. Repeat this procedure a lot of times (~10,000). This procedure is similar to gradient descent.

Here's an example universe:

$M := \text{if } O(M, 0.3) = 1 \text{ then } \mathrm{flip}(0.9) \text{ else } 0$

Note the presence of two reflective oracles that are stable equilibria: one where $P(O(M, 0.3) = 1) = 0$, and one where $P(O(M, 0.3) = 1) = 1$. Notice that the first has lower expected log loss than the second.

Let's parameterize oracles by numbers in $[0, 1]$ representing $P(O(M, 0.3) = 1)$ (since this is the only relevant query). Start with oracle $0.5$. If we sample 1000000 universes, about 45% of them have outcome 1 (the oracle answers 1 half the time, and then the flip comes up 1 with probability 0.9). So, based on these data points, $P(M() = 1) = 0.45$, which exceeds the 0.3 threshold, so the oracle based on these data points will say $P(O(M, 0.3) = 1) = 1$, i.e. it is parameterized by 1. So we move our current oracle (0.5) 1% of the way towards the oracle 1, yielding oracle 0.505. We repeat this a bunch of times, eventually getting an oracle parameterized by a number very close to 1.

So, this procedure yields an oracle with suboptimal expected log loss. It is not the case that the fixed point finder minimizes expected log loss. The neural net case is different, but not that much; it would give the same answer in this particular case, since the model can just be parameterized by a single real number.
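The procedure above can be sketched for this one-parameter example as follows. This is a rough illustration rather than a definitive implementation: the sample and step counts are reduced from the ~1000000 and ~10,000 in the description so that it runs quickly, the function names are made up, and an empirical frequency of exactly 0.3 is arbitrarily resolved towards 0.

```python
import math
import random

def sample_universe(p, rng):
    """One run of M := if O(M, 0.3) = 1 then flip(0.9) else 0,
    where the oracle answers 1 with probability p."""
    if rng.random() < p:
        return 1 if rng.random() < 0.9 else 0
    return 0

def run_procedure(p0, steps=500, samples=1000, rate=0.01, seed=0):
    """Repeatedly move the oracle 1% of the way towards the oracle that
    answers correctly for the sampled distribution of universes."""
    rng = random.Random(seed)
    p = p0
    for _ in range(steps):
        freq = sum(sample_universe(p, rng) for _ in range(samples)) / samples
        # The query asks whether P(M = 1) exceeds 0.3 (assumption: a tie counts as 0).
        target = 1.0 if freq > 0.3 else 0.0
        p += rate * (target - p)
    return p

def expected_log_loss(p):
    """Entropy of M (in nats) when the oracle is parameterized by p."""
    q = 0.9 * p  # P(M = 1)
    return -sum(x * math.log(x) for x in (q, 1.0 - q) if x > 0)

if __name__ == "__main__":
    for start in (0.5, 0.2):
        print(start, "->", round(run_procedure(start), 4))
    print("loss at 0:", expected_log_loss(0.0), "loss at 1:", expected_log_loss(1.0))
```

Started at 0.5 the parameter should end close to 1, and started at 0.2 it should end close to 0, so which equilibrium is reached depends on the starting point rather than on the score. The equilibrium at 1 has expected log loss of about 0.325 nats, against 0 for the equilibrium at 0.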

interstice 7y20

Reflective Oracles are a bit of a weird case because their 'loss' is more like a 0/1 loss than a log loss, so all of the minima are exactly the same (if we take a sample of 100000 universes to score them, the difference is merely incredibly small instead of 0). I was being a bit glib referencing them in the article; I had in mind something more like a model parameterizing a distribution over outputs, whose only influence on the world is via a random sample from this distribution. I think that such models should in general have fixed points for similar reasons, but am not sure. Regardless, these models will, I believe, favour fixed points whose distributions are easy to compute (but not fixed points with low entropy; that is, they will punish logical uncertainty but not intrinsic uncertainty). I'm planning to run some experiments with VAEs and post the results later.

AI ALIGNMENT FORUM