Musings on the Speed Prior


I think there’s a lot going on with your equivocating between the speed prior over circuits and a speed prior over programs.

I think a lot of the ideas in this direction are either confused by the difference between circuit priors and program priors, or at least treat them as equivalent. Unfortunately, a lot of this is vague until you start specifying the domain of the model. I think specifying this more clearly will help with communicating about these ideas. To start with this myself: when I talk about circuit induction, I’m talking about things that look like large randomly initialized Bayes nets (or deep neural networks).

Program Induction Priors are bad: I would claim that any program induction prior (simplicity prior, speed prior, others) is almost always a bad fit for developing useful intuitions about the behavior of large random Bayes net machines.

Confusion between circuit induction speed prior and simplicity prior: I think your point about double descent is wrong — in particular, the speed is largely unchanged in double descent experiments, since the *width* is the parameter being varied, and all deep neural networks of the same depth have approximately the same speed (unless you mean something weird by speed).

Circuit Simplicity: You give circuit-size and circuit-depth as examples of a “speed prior”, which seems pretty nonstandard, especially when describing it as “not the simplicity prior”.

More than Speed and Simplicity: I think there are other metrics that provide interesting priors over circuits, like likelihood under some initialization distribution. In particular, I think “likelihood under the initialization distribution” is the prior that matters most, until we develop techniques that let us “hack the prior”.

Connection to Infinite-Size Neural Networks: I think research about neural networks approaching/at the infinite limit looks a lot like physics about black holes, and similarly can tell us interesting things about dynamics we should expect. In particular, for systems optimized by gradient descent, we end up with infinitesimal/nonexistent feature learning in the limit, which is interesting because the sub-modules/sub-circuits we start with are all we’ll ever have! This means that even if there are “simple” or “fast” circuits, if they’re not likely under the initialization distribution, then we expect they’ll have a vanishingly small effect on the output. (One way of thinking about this is in terms of the NTK: even if we have extremely powerfully predictive modules, their predictive power will be overwhelmed by the much more common and simple features.)

Hacking the Prior: Right now we don’t have a good understanding of the behavior of partially hand-coded neural networks, but I think they could serve as a new/distinct class of models (with regard to what functions are likely under the initialization distribution). Concretely, this could look like us “hand-programming” circuits or parts of neural networks, then randomly initializing the rest, and seeing whether during training the model learns to use those programmed functions.
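As a rough illustration of the "hand-program part, randomly initialize the rest" idea, here is a minimal pure-Python sketch. Everything in it is an assumption made for illustration: the one-hidden-layer architecture, the particular hand-coded unit, the toy task, and especially the simplification of training only the linear readout while all hidden units stay fixed. It is not a method from the comment above, just one way the experiment could be set up.

```python
import math
import random


def make_hidden_layer(num_inputs, num_random_units, rng):
    """Return hidden units as (weights, bias, is_hand_coded) tuples."""
    # Hand-coded unit (hypothetical choice): depends only on input 0.
    hand_coded = ([4.0] + [0.0] * (num_inputs - 1), 0.0, True)
    random_units = [
        ([rng.gauss(0.0, 1.0) for _ in range(num_inputs)], rng.gauss(0.0, 1.0), False)
        for _ in range(num_random_units)
    ]
    return [hand_coded] + random_units


def hidden_activations(units, x):
    """tanh activation of every hidden unit on input vector x."""
    return [math.tanh(sum(w * xi for w, xi in zip(ws, x)) + b) for ws, b, _ in units]


def train_readout(units, data, steps=200, lr=0.05):
    """Full-batch gradient descent on the linear readout only; hidden units are frozen."""
    feats = [hidden_activations(units, x) for x, _ in data]
    targets = [y for _, y in data]
    readout = [0.0] * len(units)
    losses = []
    for _ in range(steps):
        preds = [sum(r * f for r, f in zip(readout, fs)) for fs in feats]
        errs = [p - y for p, y in zip(preds, targets)]
        losses.append(sum(e * e for e in errs) / (2 * len(data)))
        grads = [
            sum(e * fs[j] for e, fs in zip(errs, feats)) / len(data)
            for j in range(len(units))
        ]
        readout = [r - lr * g for r, g in zip(readout, grads)]
    return readout, losses


rng = random.Random(0)
units = make_hidden_layer(num_inputs=3, num_random_units=7, rng=rng)
# Toy task (an assumption): the target equals the hand-coded unit's output.
inputs = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(64)]
data = [(x, math.tanh(4.0 * x[0])) for x in inputs]
readout, losses = train_readout(units, data)
print("readout weight on hand-coded unit:", readout[0])
print("largest |weight| among random units:", max(abs(r) for r in readout[1:]))
```

Comparing the readout weight on the hand-coded unit with the weights on the random units is one crude way to ask whether training "uses" the programmed function; a real experiment would presumably also train the hidden layers and use a less contrived task.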

> To start with, note that if you push your speed bias far enough (e.g. a strong enough circuit depth complexity or Turing machine time complexity penalty), you just get a lookup table that memorizes everything.

This is true in the TM model^{[1]}. This is not true in the circuit-depth complexity model. Remember that an arbitrary lookup table is O(log n) circuit depth. If my function I'm trying to memorize is f(x) = (x & 1), the fastest circuit is O(1), whereas a lookup table is O(log n).

(This gets even worse in models where lookup is O(n^(1/3))^{[2]} or O(n^(1/2))^{[3]})
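The depth comparison between a lookup table and f(x) = (x & 1) can be made concrete with a small sketch. The mux-tree construction and the choice to count one gate of depth for reading the low input bit are assumptions of this example; the input x is assumed to be a valid index into the table.

```python
def lookup_via_mux_tree(table, x):
    """Select table[x] using layers of 2-to-1 multiplexers.

    Returns (value, number_of_mux_layers). Each layer consumes one address
    bit, so the depth grows like log2(len(table)).
    """
    layer = list(table)
    depth = 0
    bit = 0
    while len(layer) > 1:
        sel = (x >> bit) & 1
        # Each mux chooses between an adjacent pair using the current address bit.
        layer = [
            layer[i + sel] if i + sel < len(layer) else layer[i]
            for i in range(0, len(layer), 2)
        ]
        depth += 1
        bit += 1
    return layer[0], depth


def lsb_direct(x):
    """Compute x & 1 by reading input wire 0; depth is constant (counted as 1 here)."""
    return x & 1, 1


for num_entries in (16, 1024, 65536):
    table = [i & 1 for i in range(num_entries)]  # memorized copy of f(x) = x & 1
    value, depth = lookup_via_mux_tree(table, 5)
    direct_value, direct_depth = lsb_direct(5)
    assert value == direct_value
    print(num_entries, "entries: lookup depth", depth, "vs direct depth", direct_depth)
```

Both circuits compute the same function, but only the lookup table's depth grows with the number of memorized entries.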

I humbly suggest a variant family of priors: realizable-speed^{[4]} priors. That is:

- Pick a particular physics model
  - This could be even things like "Conway's Game of Life".
- Don't generally worry too much about constant factors^{[5]}.
- The algorithm that gives an answer in the least time is the most probable.

A simplish example of this prior might be the following: assume a simple 3d circuit model where gates take O(1) time and space and wires take O(length) time and space, and neither gates nor wires can overlap.

This prior discourages giant lookup tables even in the limiting case, while retaining many of the advantages of speed priors.

(It does have the interesting issue that it says *nothing* about what happens outside the lightcone of the computation...)
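To see why a model with O(length) wire delay penalizes large lookup tables, here is a minimal sketch. It assumes (my simplification, not part of the proposal above) that each stored entry occupies one unit cell, that the cells are packed into the smallest cube that holds them, that the reader sits at one corner, and that wires run along the axes.

```python
def cube_side(num_cells):
    """Smallest integer side length s such that s**3 >= num_cells."""
    side = 0
    while side ** 3 < num_cells:
        side += 1
    return side


def worst_case_wire_length(num_cells):
    """Axis-aligned wire length from a corner reader to the farthest cell."""
    return 3 * (cube_side(num_cells) - 1)


def mux_tree_depth(num_cells):
    """Depth of a balanced 2-to-1 mux tree over num_cells entries (wires treated as free)."""
    return (num_cells - 1).bit_length()


for n in (8, 1_000, 1_000_000):
    print(n, "cells: wire length", worst_case_wire_length(n), "vs mux depth", mux_tree_depth(n))
```

Under these assumptions the worst-case wire length grows like n^(1/3), while the depth of a mux tree with free wiring grows only like log n, which is the gap between this prior and a pure circuit-depth prior.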

(I realize here that I'm being very sloppy with asymptotic notation. I'm bouncing between bits / gates / elements / etc.)

^{[1]} It *also* isn't true in the TM model variant where you include the depth of the FSM decode circuit instead of treating a single TM step as constant-time.

^{[2]} Volume accessible within time t scales as O(t^3).

^{[3]} Assuming power/heat scales linearly with #gates^{[6]}, max gates within a radius r scales with O(r^2), and so you get O(n^(1/2)) instead. Ditto, eventually you run into e.g. Bekenstein bound issues, although this isn't a particularly realistic concern.

^{[4]} This is a terrible name. I am fully open to better ones.

^{[5]} Which is a terrible idea given that galactic algorithms are a thing.

^{[6]} Assuming it scales superlinearly you scale even worse...

> This is not true in the circuit-depth complexity model. Remember that an arbitrary lookup table is O(log n) circuit depth. If my function I'm trying to memorize is f(x) = (x & 1), the fastest circuit is O(1), whereas a lookup table is O(log n).

Certainly, I'm assuming that the intended function is not in O(log n), though I think that's a very mild assumption for any realistic task.

I think the prior you're suggesting is basically a circuit size prior. How do you think it differs from that?

> Certainly, I'm assuming that the intended function is not in O(log n), though I think that's a very mild assumption for any realistic task.

In O(t) time, the brain (or any realistic agent) can do O(t) processing... but receives O(t) sensory data.

> I think the prior you're suggesting is basically a circuit size prior. How do you think it differs from that?

Realizable-speed priors are certainly correlated with circuit size priors to some extent, but there are some important differences:

- The naive circuit size prior assumes gates take O(1) space and wiring takes zero space, and favors circuits that take less space.
  - There *are* more complex circuit size priors that e.g. assign O(1) space to a gate and O(length) space to wiring.
- The variant of the realizable-speed prior has no simple analog in the circuit size prior, but *roughly* corresponds to the circuit-depth prior.
- The variant of the realizable-speed prior has no simple analog in the circuit size prior.
- The variant of the realizable-speed prior *roughly* corresponds to the complex circuit-size prior described above, with differences described below.
- Circuit size priors ignore the effects of routing congestion.
  - A circuit-size prior will prefer one complex circuit of N-1 gates over two simpler circuits of N/2 gates.
  - A realizable-speed prior will tend to prefer the two simpler circuits, as, essentially, they are easier to route (read: lower overall latency due to shorter wiring).
  - My (untested) intuition here is that a realizable-speed prior will be better at structuring and decomposing problems than a circuit-size prior, as a result.
- Circuit size priors prefer deeper circuits than realizable-speed priors.
  - A circuit-size prior will prefer a circuit of max depth 10D and N-1 gates over a circuit of max depth D and N gates.
  - A realizable-speed prior will (typically) prefer the slightly larger, far shallower circuit.
  - Note that the human brain is surprisingly shallow, when you consider speed of neuron activation versus human speed of response. But also very wide...
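The depth-versus-size disagreement in the list above can be sketched with two toy cost functions. The specific numbers (N, D, wire lengths) and the scoring rules are hypothetical, chosen only to show the two priors ranking the same pair of circuits differently: the size cost counts gates, and the speed cost charges O(1) per gate on the critical path plus wire length.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CircuitSummary:
    name: str
    gates: int                 # total gate count
    depth: int                 # gates along the critical path
    critical_wire_length: int  # total wire length along the critical path


def circuit_size_cost(c):
    """Naive circuit-size prior: fewer gates is better, wiring is free."""
    return c.gates


def realizable_speed_cost(c):
    """Toy latency: unit delay per gate plus delay proportional to wire length."""
    return c.depth + c.critical_wire_length


N, D = 1000, 10  # hypothetical sizes
deep = CircuitSummary("deep", gates=N - 1, depth=10 * D, critical_wire_length=40)
shallow = CircuitSummary("shallow", gates=N, depth=D, critical_wire_length=40)
candidates = [deep, shallow]

preferred_by_size = min(candidates, key=circuit_size_cost)
preferred_by_speed = min(candidates, key=realizable_speed_cost)
print("size prior prefers:", preferred_by_size.name)
print("realizable-speed prior prefers:", preferred_by_speed.name)
```

With these numbers the size cost picks the deep circuit (one fewer gate) and the latency cost picks the shallow one, matching the qualitative claim in the list.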

Thanks to Paul Christiano, Mark Xu, Abram Demski, Kate Woolverton, and Beth Barnes for some discussions which informed this post.

In the ELK report, Paul, Mark, and Ajeya express optimism about penalizing computation time as a potentially viable way to select the direct translator over the human imitator:

I am more skeptical—primarily because I am more skeptical of the speed prior's ability to do reasonable things in general. That being said, the speed prior definitely has a lot of nice things going for it, and I do think it's worth taking a careful look at both the good and the bad that the speed prior has to offer. Conceptually, what we want to pay attention to when evaluating a prior from an AI safety perspective is threefold: it needs to favor good models over bad models (e.g. the direct translator over the human imitator), it needs to be competitive to implement, and it needs to favor models with good generalization over models with bad generalization (e.g. the resulting models need to themselves be performance competitive).

Before I do that, however, an important preliminary: there are multiple different forms/types of speed priors, so when I say “the speed prior,” I really mean a class of priors including:

The basic structure of this post will be a dialog of sorts between a pro-speed-prior and an anti-speed-prior perspective. I'll start with some of the arguments in favor of the speed prior and how the anti-speed-prior perspective might respond, then give some arguments against the speed prior and how the pro-speed-prior perspective might respond.

## Why you should love the speed prior

cost-effective improvement (in terms of how much loss/computation it saves per increase in description complexity) compared to all other possible mappings. In my opinion, this is where this argument really fails. To start with, note that if you push your speed bias far enough (e.g. a strong enough circuit depth complexity or Turing machine time complexity penalty), you just get a lookup table that memorizes everything. Thus, to get this to work, you have to use a prior with a pretty meaningful simplicity component, but then you're back at the original problem that the direct translator could be substantially more complex than the human imitator, potentially so much so that it outweighs whatever loss/time advantage the direct translator might have. One thing I have learned in spending a lot of time thinking about ELK is that there are a lot of extra terms that you can add to a simplicity prior that help the direct translator more than the human imitator; the problem is that none of them that I have found so far work on their own without the simplicity prior term there as well, which means that as long as you're working in the worst-case world where the direct translator can be arbitrarily more complex than the human imitator, they don't constitute full solutions, and I think that speed-prior-based solutions should be put in this same category.

## Why you should hate the speed prior

## Conclusion?

I don't really have a single, unified conclusion to take away from all of this. Like I said at the beginning, I think I tend towards skepticism of the speed prior's ability to solve AI safety problems, at least singlehandedly, but I can't dismiss it completely and I think there are clearly strong and compelling reasons to like it. I do feel like moving in the direction of speed bias is likely to increase safety all things considered, though I also feel like there's a reasonable chance that doing so might also reduce competitiveness—in which case it's very unclear to me if that's where we want to place our alignment tax.