FWIW it appears that out of the 4 differences you cited here, only one of them (the relaxation of the restriction that the scrubbed output must be the same) still holds as of this January paper from Geiger's group https://arxiv.org/abs/2301.04709. So the methods are even more similar than you thought.
I think it would be a distraction to try to figure out if LMs are "phenomenally conscious" for a few different reasons.
I do think consciousness is real and important (I think some form of Russellian monism is probably right). I just don't think it matters for alignment.
This probably doesn't work, but have you thought about just using weight decay as a (partial) solution to this? In any sort of architecture with residual connections, you should expect circuits to manifest as weights with nontrivial magnitude. If some set of weights isn't contributing to the loss, then the gradients won't prevent them from being pushed toward zero by weight decay. Sort of a "use it or lose it" type thing. This seems a lot simpler and potentially more robust than other approaches.
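To make the mechanism concrete, here's a minimal toy sketch (entirely my own construction, with made-up numbers, using AdamW-style decoupled weight decay rather than any particular training setup): a parameter that gets no gradient from the loss decays toward zero, while a parameter the loss depends on is held in place by its gradient.

```python
# Toy illustration of "use it or lose it" under decoupled weight decay (AdamW).
# `used` contributes to the loss; `unused` is disconnected from it.
import torch

torch.manual_seed(0)

used = torch.nn.Parameter(torch.tensor(3.0))    # the loss depends on this
unused = torch.nn.Parameter(torch.tensor(3.0))  # this gets zero gradient
opt = torch.optim.AdamW([used, unused], lr=0.02, weight_decay=0.2)

x = torch.randn(256)
y = 2.0 * x  # the "true" relationship only involves `used`

for _ in range(2000):
    opt.zero_grad()
    pred = used * x + 0.0 * unused  # `unused` never affects the prediction
    loss = torch.nn.functional.mse_loss(pred, y)
    loss.backward()
    opt.step()

print(float(used))    # stays near 2.0, held up by its gradient
print(float(unused))  # decays to roughly 0: nothing opposes the weight decay
```

In a real residual network the "unused" part would be a whole circuit rather than one scalar, but the mechanism is the same: decoupled weight decay shrinks anything the gradients aren't actively holding up.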
(Didn't consult Quintin on this; I speak for myself)
I flatly deny that our arguments depend on AGI being anything like an LLM. I think the arguments go through in a very wide range of scenarios, basically as long as we're using some kind of white-box optimization to align the AI, rather than e.g. carrot-and-stick incentives or prompt engineering. Even if we only relied on prompt engineering, I think we'd be in a better spot than with humans (because we can run many controlled experiments).
I’m pretty confused by this claim. Why should we expect the human reward system to overwrite all secret desires? Also, how do we know it’s not doing that? Your desires are just causal effects of a bunch of stuff, including your reward circuitry.
You can just include online learning in your experimentation loop: see what happens when you let the AI learn online for a bit in different environments. I don't think online learning changes the equation very much. It's known to be less stable than offline RL, but that instability hurts capabilities as well as alignment, so we'd need a specific argument that it will hurt alignment significantly more than capabilities, in ways we wouldn't be able to notice during training and evaluation.
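To gesture at what that loop could look like, here's a self-contained toy sketch (entirely my own construction: a bandit "policy", two fake environments, and a stand-in "alignment probe", none of which come from the thread). It does a bit of REINFORCE-style online learning in each environment and then re-runs a fixed evaluation to see whether anything drifted.

```python
# Toy experimentation loop: online-learn in each environment, then re-check a
# fixed "alignment probe" to see how much the policy drifted.
import torch

torch.manual_seed(0)
N_ACTIONS = 4

def online_update(logits, reward_fn, steps=500, lr=0.1):
    # Simple REINFORCE-style online learning against reward_fn.
    opt = torch.optim.SGD([logits], lr=lr)
    for _ in range(steps):
        dist = torch.distributions.Categorical(logits=logits)
        a = dist.sample()
        loss = -dist.log_prob(a) * reward_fn(a)
        opt.zero_grad()
        loss.backward()
        opt.step()

def eval_probe(logits):
    # Stand-in "alignment probe": probability mass on action 0,
    # which we pretend is the intended behaviour.
    return torch.softmax(logits, dim=0)[0].item()

# Two toy environments with different reward structure.
envs = {
    "benign": lambda a: 1.0 if a == 0 else 0.0,
    "distracting": lambda a: 2.0 if a == 3 else (1.0 if a == 0 else 0.0),
}

for name, reward_fn in envs.items():
    # Fresh copy of the "aligned" starting policy for each environment.
    logits = torch.nn.Parameter(torch.zeros(N_ACTIONS))
    before = eval_probe(logits)
    online_update(logits, reward_fn)
    after = eval_probe(logits)
    print(f"{name}: probe before={before:.2f}, after={after:.2f}")
```

In the "benign" environment the probe goes up; in the "distracting" one it drops noticeably, which is exactly the kind of drift you'd want the evaluation loop to surface before deployment.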
It just means we are directly updating the AI’s neural circuitry with white-box optimizers. This will be true across a very wide range of scenarios, including (IIUC) your brain-like AGI scenario.
I don’t see why any of the differences you listed are relevant for safety.
I basically deny this, especially if you're stipulating that it's a "clean" distinction. Obviously folk psychology has a fuzzy distinction between beliefs and desires in it, but it's also well known, both in common sense and among neuroscientists, that beliefs and desires get mixed up all the time and there's no particularly sharp divide.