The Parable of Predict-O-Matic

[-]fiddler5y100Review for 2019 Review

I think this post is incredibly useful as a concrete example of the challenges of seemingly benign powerful AI, and makes a compelling case for serious AI safety research being a prerequisite to any safe further AI development. I strongly dislike part 9, as painting the Predict-O-Matic as consciously influencing others' personalities at the expense of short-term prediction error seems contradictory to the point of the rest of the story. I suspect I would dislike part 9 significantly less if it were framed in terms of a strategy to maximize predictive accuracy.

More specifically, I really enjoy the focus on the complexity of “optimization” at a gears level: I think that it’s a useful departure from high abstraction levels, as the question of what predictive accuracy means, and the strategy an AI would use to pursue it, is highly influenced by the approach taken. I think a more rigorous analysis of whether different AI approaches are susceptible to “undercutting” as a safety feature would be an extremely valuable piece. My suspicion is that even the engineer’s perspective here is significantly under-specified, lacking the details necessary to determine whether this vulnerability exists.

I also think that part 9 detracts from the piece in two main ways. First, by painting the Predict-O-Matic as conscious, it implies a significantly more advanced AI than is necessary to exhibit this effect. Second, because the AI admits to sacrificing predictive accuracy in favor of some abstract value-add, it seems like pretty much any naive strategy would outcompete the current one, according to the engineer, meaning that the type of threat is also distorted: the main worry should be AI OPTIMIZING for predictive accuracy, not pursuing its own goals. That’s bad sci-fi or very advanced GAI, not a prediction-optimizer.

I would support the deletion or aggressive editing of part 9 in this and future similar pieces: I’m not sure what it adds. ETA: I think whether or not this post should be updated depends on whether you think the harms of part 9 outweigh the benefit of the previous parts: it’s plausible to me that the benefits of a clearly framed story that’s relevant to AI safety are enormous, but it’s also plausible that the costs of creating a false sense of security are larger.

[-]johnswentworth5y50

I would support the deletion or aggressive editing of part 9 in this and future similar pieces

I don't think I'm the target audience for this story so I'm not leaving a full review, but +1 to this. Part 9 seems to be trying to display another possible failure mode (specifically inner misalignment), but it severely undercuts the core message from the rest of the post: that a predictive accuracy optimizer is dangerous even if that's all it optimizes for.

I do think an analogous story which focused specifically on inner optimization would be great, but mixing it in here dilutes the main message.

[-]abramdemski5y40

I don't see why it should necessarily undercut the core message of the post, since inner optimizers are still in some sense about the consequences of a pure predictive accuracy optimizer (but in the selection sense, not the control sense). But I agree that it wasn't sufficiently well done. It didn't feel like a natural next complication, the way everything else did.

[-]johnswentworth5y20

I wouldn't say that inner optimizers are about the consequences of pure predictive accuracy optimization; the two are orthogonal. An inner optimizer can pop up in optimizers which optimize for things besides predictive accuracy, and predictive accuracy optimization can be done in ways which don't give rise to inner optimizers. Contrast that to the other failure modes discussed in the post, which are inherently about predictive accuracy - e.g. the assassination markets problem.

[-]abramdemski5y20

OK, yeah, that's fair.

[-]abramdemski5y30

I share a feeling that part 9 is somehow bad, and I think your points are fair.

[-]Vanessa Kosoy6y90

This was extremely entertaining and also had good points. For now, just one question:

...The intern was arguing that minimizing prediction error would have all kinds of unintended bad effects. Which was crazy enough. The engineer was worse: they were arguing that Predict-O-Matic might maximize prediction error! Some kind of duality principle. Minimizing in one direction means maximizing in the other direction. Whatever that means.

Is this a reference to duality in optimization? If so, I don't understand the formal connection?

[-]abramdemski6y110

No, it was a speculative conjecture which I thought of while writing.

The idea is that incentivizing agents to lower the error of your predictions (as in a prediction market) looks exactly like incentivizing them to "create" information (find ways of making the world more chaotic), and this is no coincidence. So perhaps there's a more general principle behind it, where trying to incentivize minimization of f(x,y) only through channel x (eg, only by improving predictions) results in an incentive to maximize f through y, under some additional assumptions. Maybe there is a connection to optimization duality in there.
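One way to make this conjecture concrete is a toy log-scoring market. The sketch below is an illustration under assumptions that are not part of the comment: a trader is paid the improvement in log score that their report makes over the market's prior, and the hypothetical `best_manipulation` helper lets the trader also force which outcome occurs. Under those assumptions, the most profitable move is to cause whatever the market considered least likely and then predict it confidently, so the payment for "lowering error" doubles as a payment for surprise.

```python
import math

def trader_profit(market_prob, report_prob, outcome):
    """Log-score payout for moving the market from market_prob to report_prob.

    Both arguments are probabilities that a binary event happens;
    outcome is True if the event happened.
    """
    p = market_prob if outcome else 1.0 - market_prob
    q = report_prob if outcome else 1.0 - report_prob
    return math.log(q) - math.log(p)

def best_manipulation(market_prob, confident=0.99):
    """Hypothetical trader who can force the outcome and then report it confidently.

    Returns (outcome, profit) for the more profitable of the two outcomes.
    """
    options = []
    for outcome in (True, False):
        # Report high confidence in whichever outcome is being forced.
        report = confident if outcome else 1.0 - confident
        options.append((outcome, trader_profit(market_prob, report, outcome)))
    return max(options, key=lambda pair: pair[1])

if __name__ == "__main__":
    for market_prob in (0.5, 0.8, 0.95):
        outcome, profit = best_manipulation(market_prob)
        print(f"market says {market_prob:.2f}: force outcome={outcome}, profit={profit:.3f}")
```

In this toy model the profit grows as the market becomes more confident in the outcome that does not happen, which is the "maximize f through y" direction of the conjecture. Whether anything like this holds for a real training procedure is exactly the open question.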

In terms of the fictional canon, I think of it as the engineer trying to convince the boss by simplifying things and making wild but impressive sounding conjectures. :)

[-]Isnasene6y60

Don't mind me; just trying to summarize some of the stuff I just processed.

If you're choosing a strategy of predicting the future based on how accurate it turns out to be, the strategy whose output influences the future in ways that make its prediction more likely will outperform a strategy that doesn't (all else being equal). Thus, one might think that the strategy you choose will be the strategy that most effectively balances its prediction between a) how accurate that prediction is (unconditioned on the prediction being given) and b) how much the prediction itself improves the accuracy of the prediction (conditioning on the prediction). Because of this, the intern predicts that the world will be made more predictable than it would be normally.

In short, you'll tend to choose the prediction strategies that give self-fulfilling predictions when possible over those that don't.

However, choosing the strategy that predicts the future most accurately is also equivalent to throwing away every strategy that doesn't predict the future the best. In the same way that self-fulfilling predictions are good for prediction strategies because they enhance accuracy of the strategy in question, self-fulfilling predictions that seem generally surprising to outside observers are even better because they lower the accuracy of competing strategies. The established prediction strategy thus systematically causes the kinds of events in the world that no other method could predict to further establish itself. Because of this, the engineer predicts that the world will become less predictable than it would be normally.

In short, you'll tend to choose the prediction strategy that gives self-fulfilling predictions which fulfill in maximally surprising ways relative to the other prediction strategies you are considering.

Oh god...
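The difference between the intern's and the engineer's conclusions can be sketched with a toy calculation. Everything here is an assumption made for illustration: the world has a default outcome, only the deployed strategy's prediction can influence it, the parameter `influence` is the probability that a non-default prediction forces the outcome to match, and the rival is a hypothetical strategy that always predicts the default.

```python
def expected_accuracies(deployed_predicts_default, influence):
    """Expected accuracy of the deployed strategy and of a rival that always predicts the default.

    influence: probability that the deployed prediction forces the outcome to match it.
    Otherwise the world produces its default outcome.
    """
    if deployed_predicts_default:
        # Prediction and default coincide, so both strategies are always right.
        return 1.0, 1.0
    # Surprising prediction: the deployed strategy is right only when it forces
    # the outcome, and the rival is right only when the forcing fails.
    return influence, 1.0 - influence

def margin(deployed_predicts_default, influence):
    """How far the deployed strategy's expected accuracy exceeds the rival's."""
    deployed, rival = expected_accuracies(deployed_predicts_default, influence)
    return deployed - rival

if __name__ == "__main__":
    for influence in (0.6, 0.9, 1.0):
        conform = expected_accuracies(True, influence)
        surprise = expected_accuracies(False, influence)
        print(f"influence={influence}: conforming={conform}, surprising={surprise}, "
              f"surprise margin={margin(False, influence):.2f}")
```

In this sketch, selecting on absolute accuracy never favors the surprising prediction (it only ties when `influence` is 1), while selecting on the margin over rivals favors it whenever `influence` exceeds 0.5. Which of the two pictures applies depends on what the selection process actually compares.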

[-]abramdemski6y100

I'm actually trying to be somewhat agnostic about the right conclusion here. I could have easily added another chapter discussing why the maximizing-surprise idea is not quite right. The moral is that the questions are quite complicated, and thinking vaguely about 'optimization processes' is quite far from adequate to understand this. Furthermore, it'll depend quite a bit on the actual details of a training procedure!

[-]Vanessa Kosoy5y50Nomination for 2019 Review

This is a rather clever parable which explains serious AI alignment problems in an entertaining form that doesn't detract from the substance.

[-]Johannes Treutlein3y30

"If someone had a strategy that took two years, they would have to over-bid in the first year, taking a loss. But then they have to under-bid on the second year if they're going to make a profit, and--"

"And they get undercut, because someone figures them out."

I think one could imagine scenarios where the first trader can use their influence in the first year to make sure they are not undercut in the second year, analogous to the prediction market example. For instance, the trader could install some kind of encryption in the software that this company uses, which can only be decrypted by the private key of the first trader. Then in the second year, all the other traders would face additional costs of replacing the software that is useless to them, while the first trader can continue using it, so the first trader can make more money in the second year (and get their loss from the first year back).
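The arithmetic behind this scenario can be sketched as follows. The model and numbers are hypothetical and chosen only for illustration: rivals are assumed to undercut any second-year margin larger than their own extra cost of competing (here, the cost of replacing the locked-in software), so that cost caps what the first trader can keep.

```python
def two_year_net(first_year_loss, desired_margin, rival_switching_cost):
    """Net result of a lose-now, profit-later strategy over two years.

    Assumption: rivals undercut any second-year margin above their own
    extra cost of competing, so that cost caps the achievable margin.
    """
    achievable_margin = min(desired_margin, rival_switching_cost)
    return achievable_margin - first_year_loss

if __name__ == "__main__":
    # No lock-in: rivals undercut everything, and the first-year loss is never recovered.
    print(two_year_net(first_year_loss=10, desired_margin=15, rival_switching_cost=0))
    # Lock-in: rivals must pay 20 to compete, so the full desired margin survives.
    print(two_year_net(first_year_loss=10, desired_margin=15, rival_switching_cost=20))
```

With no switching cost the strategy nets a loss equal to the first-year loss, which matches the "they get undercut" reply in the story; with a large enough switching cost the two-year strategy becomes profitable.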

[-]Daniel Kokotajlo5y30Nomination for 2019 Review

This piece of fiction is good sci-fi. It is fun to read and makes you think. In this case, it makes you think about some really important issues in AI safety and AI alignment.

[-]mako yass6y30

I've noticed that the word "stipulation" is a pretty good word for the category of claims that become true when we decide they are true. It's probably better to try to broaden its connotations to encompass self-fulfilling prophecies than to coin some other word or name this category "prophecy" or something.

It's clear that the category does deserve a name.

[-]abramdemski6y20

I guess 'self-fulfilling prophecy' is a bit long and awkward. Sometimes 'basilisk' is thrown around, but specifically for negative cases (self-fulfilling-and-bad). But are you trying to name something slightly different (perhaps broader or narrower) than what 'self-fulfilling prophecy' points at?

I find I don't like 'stipulation'; that has the connotation of command, for me (like, if I tell you to do something).

[-]orthonormal5y10Review for 2019 Review

This reminds me of That Alien Message, but as a parable about mesa-alignment rather than outer alignment. It reads well, and helps make the concepts more salient. Recommended.
