Background

When we do Deep Reinforcement Learning to make a PacMan-playing AI (for example), there are two algorithms at play: (1) the “inner” algorithm (a.k.a. “policy”, a.k.a. “trained model”) is a PacMan-playing algorithm, which looks at the pixels and outputs a sequence of moves, and (2) the “outer” algorithm is a learning algorithm, probably involving gradient descent, which edits the “inner” PacMan-playing algorithm in a way that tends to improve it over time.
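(To make the two layers concrete, here is a minimal sketch of that setup in PyTorch-flavored Python. Everything here is invented for illustration: the network shape, the vanilla REINFORCE update, and the assumed "env" object whose reset() returns an 84×84 frame and whose step(action) returns (obs, reward, done). A real PacMan agent would be fancier, but the two-layer structure is the same.)

```python
import torch
import torch.nn as nn

# Inner algorithm: a policy network mapping pixels to action probabilities.
policy = nn.Sequential(
    nn.Flatten(),
    nn.Linear(84 * 84, 256), nn.ReLU(),
    nn.Linear(256, 9),        # say, 9 joystick actions
    nn.Softmax(dim=-1),
)

# Outer algorithm: gradient descent, which edits the inner algorithm's weights.
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def run_episode(env):
    """Play one game with the current inner algorithm; return log-probs and score."""
    obs, log_probs, score, done = env.reset(), [], 0.0, False
    while not done:
        probs = policy(torch.as_tensor(obs, dtype=torch.float32).unsqueeze(0))
        dist = torch.distributions.Categorical(probs)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        obs, reward, done = env.step(action.item())
        score += reward
    return torch.stack(log_probs), score

def outer_loop(env, num_episodes=1000):
    """The outer algorithm: repeatedly run the inner algorithm, then nudge its
    weights in a direction that tends to improve its score (vanilla REINFORCE)."""
    for _ in range(num_episodes):
        log_probs, score = run_episode(env)
        loss = -(log_probs.sum() * score)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```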

Likewise, when the human brain evolved, there were two algorithms at play: (1) the “inner” algorithm is the brain algorithm, which analyzes sensory inputs, outputs motor commands, etc., and (2) the “outer” algorithm is Evolution By Natural Selection, which edits the brain algorithm (via the genome) in a way that tends to increase the organism’s inclusive genetic fitness.

There’s an obvious parallel here: Deep RL involves a two-layer structure, and evolution involves a two-layer structure.

So there is a strong temptation to run with this analogy, and push it all the way to AGI. According to this school of thought, maybe the way we will eventually build AGI is by doing gradient descent (or some other optimization algorithm) and then the inner algorithm (a.k.a. trained model) will be an AGI algorithm—just as evolution designed the generally-intelligent human brain algorithm.

I think this analogy / development model is pretty often invoked by people thinking about AGI safety. Maybe the most explicit discussion is the one in Risks From Learned Optimization last year.

But I want to argue that the development of AGI is unlikely to happen that way.

Defining “The Evolution Analogy for AGI Development”: Three ingredients

I want to be specific about what I’m arguing against here. So I define the Evolution Analogy For AGI Development as having all 3 of the following pieces:

  • “Outer + Inner”: The analogy says that we humans will write and run an “outer algorithm” (e.g. gradient descent), which runs an automated search process to find an “inner algorithm” (a.k.a. “trained model”).
    • …Analogous to how evolution is an automated search that discovered the human brain algorithm.
    • (If you’ve read Risks From Learned Optimization you can mentally substitute the words “base & mesa” for “outer & inner” respectively; I’m using different words because I'm thinking of “base & mesa” as something very specific, and I want to talk more broadly.)
  • “Outer As Lead Designer”: The analogy says that the outer algorithm is doing the bulk of the “real design work”. So I am not talking about something like a hyperparameter search or neural architecture search, where the outer algorithm is merely adjusting a handful of legible adjustable parameters in the human-written inner algorithm code. Instead, I’m talking about a situation where the outer algorithm is really doing the hard work of figuring out fundamentally what the inner algorithm is and how it works, and meanwhile humans stare at the result and scratch their head and say "What on earth is this thing doing?" For example, the inner algorithm could internally have an RL submodule doing tree search, yet the humans have no idea that there's any RL going on, or any tree search going on, or indeed have any idea how this thing is learning anything at all in the first place.
    • …Analogous to how evolution designed the human brain algorithm 100% from scratch.
  • “Inner As AGI”: The analogy says that “The AGI” is identified as the inner algorithm, not the inner and outer algorithm working together. In other words, if I ask the AGI a question, I don’t need the outer algorithm to be running in the course of answering that question.
    • ...Analogous to how, if you ask a human a question, they don’t reply “Ooh, that’s a hard question, hang on, let me procreate a few generations and then maybe my descendants will be able to help you!”

I want to argue that AGI will not be developed in a way where all three of these ingredients are present.

…On the other hand, I am happy to argue that AGI will be developed in a way that involves only two of these three ingredients!

  • What if the “Outer As Lead Designer” criterion does not apply? Then (as mentioned above) we’re talking about things like automated hyperparameter search or neural architecture search, which edit a handful of adjustable parameters (number of layers, learning rate, etc.) within a human-designed algorithm. Well, I consider it totally plausible that those kinds of search processes will be part of AGI development. Learning algorithms (and planning algorithms, etc.) inevitably have adjustable parameters that navigate tradeoffs in the design space. And sometimes the best way to navigate those tradeoffs is to just try running the thing! Try lots of different settings and find what works best empirically—i.e., wrap it in an outer-loop optimization algorithm. Again, I don't count this as a victory for the evolution analogy, because the inner algorithm is still primarily designed by a human, and is legible to humans.
  • What if the “Inner As AGI” criterion does not apply? Then the outer algorithm is an essential part of the AGI’s operating algorithm. I definitely see that as plausible—indeed likely—and this is how I think of within-lifetime human learning. Much more on this in a bit—see “Intelligence via online learning” below. If this is in fact the path to AGI, then we wind up with a different biological analogy…

A biological analogy I like much better: The “genome = code” analogy

Human intelligence ↔ Artificial intelligence:

  • Human genome ↔ GitHub repository with all the PyTorch code for training and running the PacMan-playing agent
  • Within-lifetime learning ↔ Training the PacMan-playing agent
  • How an adult human thinks and acts ↔ Trained PacMan-playing agent

Note that evolution is not in this picture: its role has been usurped by the engineers who wrote the PyTorch code. This is intelligent design, not evolution!

A motivating question: Two visions for how brain-like AGI would come to be

I’m trying to make a general argument in this post, but here is a concrete example to keep in mind.

As discussed here, I see the brain as having a “neocortex subsystem” that runs a particular learning algorithm—one which takes in sensory inputs and reward inputs, constructs a predictive world-model, and takes foresighted actions that tend to lead to high rewards. Then there is a different subsystem that (among many other things) calculates those rewards I just mentioned.

The learning algorithm is what the “neocortex subsystem” does. The “learned content” includes things like “tires are black”, “I love the thrill of discovery”, problem-solving strategies, what I had for breakfast, how to skateboard, etc. Rewards include things like “pain is bad”, “sweet taste is good”, “being confused is bad”, “being popular is good”, etc. “Other stuff” includes getting goosebumps when you’re cold, regulating your heart-rate, etc.
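(As a cartoon of that structure, here is roughly the shape in code. This is just a block diagram, not a claim about how any of it is implemented, and every name is invented for illustration.)

```python
class NeocortexSubsystem:
    """The learning-algorithm component: takes sensory and reward inputs, builds
    a predictive world-model, and picks foresighted actions expected to lead to
    high reward."""
    def __init__(self):
        self.learned_content = {}   # concepts, memories, habits of thought, skills...

    def step(self, sensory_input, reward):
        self.update_world_model(sensory_input, reward)  # learn from this moment
        return self.choose_action()                     # act using what it has learned

    def update_world_model(self, sensory_input, reward): ...
    def choose_action(self): ...

def reward_calculator(body_state, sensory_input):
    """A separate subsystem: computes the reward signal (pain bad, sweet taste good, ...)."""
    ...

def other_stuff(body_state):
    """Heart-rate regulation, goosebumps when cold, etc.: not part of the learning algorithm."""
    ...
```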

Maybe we'll make an AGI that has some resemblance to this system. Actually, I should be more specific: Maybe we'll make an AGI with this general structure, in which the learning algorithm component has some principles in common with the human brain's learning algorithm component. (The “reward calculator” and “other stuff” will obviously not be human-brain-like in any detail—unless we're doing whole-brain-emulation which is a different topic—since AGIs don't need to regulate their heart rate, or to have instinctive reactions to big hairy spiders, etc.)

Now, assuming that we make an AGI that has some resemblance to this system, consider two scenarios for how that happens:

  • The scenario I don’t consider likely is where an automated search discovers both the neocortex subsystem learning algorithm and the reward calculator, tangled together into a big black box, with the programmer having no idea of how that algorithm is structured or what it's doing.
  • The scenario I do consider likely is where humans design something like the neocortex subsystem learning algorithm by itself, using "the usual engineering approach" (see below)—trying to figure out how the learning algorithm is supposed to work, writing code, testing and iterating, etc. And then, in this scenario, the humans probably by-and-large ignore the rest of the brain, including the reward calculation, and they just insert their own reward function (and/or other "steering systems"), starting with whatever is easy and obvious, and proceeding by trial-and-error or whatever, as they try to get this cool new learning algorithm to do the things they want it to do. (Much more on this scenario in a forthcoming post.)

In the remainder of the post I’ll go over three reasons suggesting that the first scenario would be much less likely than the second scenario. First I’ll offer a couple outside-view arguments. Second I’ll work through the various possibilities for how the training and episode lengths would work. Third I’ll argue that the tangled-together black box in the first scenario would run with a horrific (I expect many orders of magnitude) performance penalty compared to the second scenario, due to neither the programmers nor the compiler toolchain having visibility into the black box. (I’m talking here about a run-time performance penalty. So this is on top of the computational costs of the original automated search that designed the black box.)

(I also have a very-inside-view argument—that the second scenario is already happening and well on its way to completion—but I won’t get into that, it’s more speculative and outside the scope of this post.)

Anyway, comparing these two scenarios, I have no idea which of them would make it easier or harder to develop Safe And Beneficial AGI. (There are very difficult inner alignment problems in both cases—more on which in a forthcoming post.) But they are different scenarios, and I want us to be putting more effort into planning for whichever one is likelier to happen! I could be wrong here. Let’s figure it out!

1. A couple outside-view arguments

Now that we’re done with the background section, we’re on to the first of my three arguments against the evolution analogy: invoking a couple outside views.

Outside view #1: How biomimetics has always worked

Here’s a typical example. Evolution has made wing-flapping animals. Human engineers wanted to make a wing-flapping flying machine. What those engineers did not do was imitate evolution by, say, running many generations of automated search over body plans and behaviors in a real or simulated environment and rewarding the ones that flew better. What they did do was to take, let’s call it, “the usual engineering approach”. That involves some combination of (1) trying to understand how wing-flapping animals fly, (2) trying to understand aerodynamics and the principles of flight more generally, (3) taking advantage of any available tools and techniques, (4) trial-and-error, (5) hypothesis-driven testing and iteration, etc. etc.

By the same token, Evolution has made human-level intelligence. Human engineers want to make human-level-intelligent machines. Just like the paragraph above, I expect them to take “the usual engineering approach”. That involves some combination of (1) trying to understand how human intelligence works, (2) trying to understand the nature of intelligence and intelligent algorithms more generally, (3) taking advantage of any available tools and techniques, (4) trial-and-error, (5) hypothesis-driven testing and iteration, etc. etc.

More examples along the same lines: (A) When people first started to build robots, they were inspired by human and animal locomotion, and they hooked up actuators and hinges etc. to make moving machines. There was no evolution-like outer-loop automated search process involved. (B) The Wright Brothers were inspired by, and stealing ideas from, soaring birds. There was no evolution-like outer-loop automated search process involved. (C) “Artificial photosynthesis” is an active field of research trying to develop systems that turn sunlight directly into chemical fuels. None of the ongoing research threads, to my knowledge, involve an evolution-like outer-loop automated search process (except for very narrow questions, like what molecule to put in a particular spot within the human-designed overall architecture). You get the idea.

Outside view #2: How learning algorithms have always been developed

As described above, I expect AGI to be a learning algorithm—for example, it should be able to read a book and then have a better understanding of the subject matter. Every learning algorithm you’ve ever heard of—ConvNets, PPO, TD learning, etc. etc.—was directly invented, understood, and programmed by humans. None of them were discovered by an automated search over a space of algorithms. Thus we get a presumption that AGI will also be directly invented, understood, and programmed by humans.

(Update: Admittedly, you can say "the GPT-3 trained model (inner algorithm) is a learning algorithm", in the sense that it has 96 layers, and it sorta "learns" things in earlier layers and "applies that knowledge" in later layers. And that was developed by an automated search. I don't count that, because I don't think this type of "learning algorithm" is the type that will be sufficient for AGI by itself; see discussion of GPT-3 in a later section, and also elaboration in my comment here.)

In general, which algorithms are a good fit for automated design (= design by learning algorithm), and which algorithms are a good fit for human design?

When we wanted to label images using a computer, we invented a learning algorithm (ConvNet + SGD) that looks at a bunch of images and gradually learns how to label images. By the same token, when we want to do human-level cognitive tasks with a computer, I claim that we'll invent a learning algorithm that reads books and watches movies and interacts and whatever else, and gradually learns how to do human-level cognitive tasks.
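(In miniature, that "ConvNet + SGD" recipe looks something like the following PyTorch sketch, with shapes and settings chosen arbitrarily and data loading omitted:)

```python
import torch
import torch.nn as nn

# The human-designed learning algorithm: a small ConvNet trained by SGD.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 10),   # e.g. 32x32 input images, 10 classes
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

def training_step(images, labels):
    """One gradient step. The algorithm itself is simple and legible; the
    object-level complexity of the world ("what trucks look like") ends up
    encoded in the weights."""
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```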

Why not go one level up and invent a learning algorithm that will invent a learning algorithm that will gradually learn how to do human-level cognitive tasks? (Or a learning algorithm that will invent a learning algorithm that will invent a learning algorithm that will…)

Indeed, on what principled grounds can I say "Learning algorithms are a good way to develop image classification algorithms", but also say "Learning algorithms are a bad way to develop learning algorithms?"

My answer is that, generically, automated design (= design by learning algorithm) is the best way to build algorithms that are (1) not computationally intensive to run (so we can easily run them millions of times), and (2) horrifically complicated (so that human design is intractable).

So image classification algorithms are a perfect fit for automated design (= design by learning algorithm). They’re easy to run—we can run a ConvNet image classifier model thousands of times a second, no problem. And they’re horrifically complicated, because they need logic that captures the horrific object-level complexity of the world, like the shape and coloration of trucks.

Whereas learning algorithms themselves are a terrible fit for automated design. They are famously computationally expensive—people often run learning algorithms for weeks straight on a heavy-duty supercomputer cluster. And they are not horrifically complicated. They fundamentally work by simple, general principles—things like gradient descent, and “if you’ve seen something, it’s likely that you’ll see it again”, and “things are often composed of other things”, and "things tend to be localized in time and space", etc. 

So in all respects, learning algorithms seem to be a natural fit for human design and a bad fit for automated design, while image classifiers are the reverse.

Possible objections to the learning-algorithm-outside-view argument

Objection: Learning a learning algorithm is not unheard of—it’s a thing! Humans do it when they take a course on study strategies. Machines do it in meta-learning ML papers.

Response: For the human example, yes, humans can learn meta-cognitive strategies which in turn impact future learning. But learning algorithms always involve an interaction between the algorithm itself and what-has-been-learned-so-far. Even gradient descent takes a different step depending on the current state of the model-in-training. See the “Inner As AGI” criterion near the top for why this is different from the thing I’m arguing against.

For meta-learning in ML, see the “Outer as lead designer” criterion near the top. I’m not a meta-learning expert, but my understanding is that meta-learning papers are not engaged in the radical project of designing a learning algorithm from scratch—where we just have no idea what the learning algorithm’s operating principles are. Rather, the meta-learning work I’ve seen is in the same category as hyperparameter search and neural architecture search, in that we take a human-designed learning algorithm, in which there are some adjustable parameters, and the meta-learning techniques are about using learning algorithms to adjust those parameters. Maybe there are exceptions, but if so, those efforts have not led to state-of-the-art results, tellingly. (At least, not that I know of.) For example, if you read the AlphaStar paper, you see a rather complicated learning algorithm—it involved supervised learning, pointer networks, TD(λ), V-trace, UPGO, and various other components—but every aspect of that learning algorithm was written by humans, except maybe for the values of some adjustable parameters.

Objection: If you can make AGI by combining a legible learning algorithm with a legible reward function, why haven't AI researchers done so yet? Why did Evolution take billions of years to make a technological civilization?

Response: I think we don't have AGI today for the same reason we didn't have GPT-3 in 2015: In 2015, nobody had invented Transformers yet, let alone scaled them up. Some learning algorithms are better than others; I think that Transformers were an advance over previous learning algorithms, and by the same token I expect that yet-to-be-invented learning algorithms will be an advance over Transformers.

Incidentally, I think GPT-3 is great evidence that human-legible learning algorithms are up to the task of directly learning and using a common-sense world-model. I’m not saying that GPT-3 is necessarily directly on the path to AGI; instead I’m saying, How can you look at GPT-3 (a simple learning algorithm with a ridiculously simple objective) and then say, “Nope! AGI is way beyond what human-legible learning algorithms can do! We need a totally different path!”?

As for evolution, an AGI-capable learning algorithm can reach AGI but certainly doesn't have to; it depends on the reward function, the hyperparameters (including model size, i.e. size of the neocortex / pallium), and the environment. One aspect of the environment is a culture full of ideas, which was a massive chicken-and-egg problem for early humans—there was no incentive to share ideas if no one was listening, and no incentive to absorb ideas if no one was saying them. AGI programmers do not face that problem.

Objection: Reasoning is special. Where does the capacity to reason come from, if not a separate outer-loop learning algorithm?

Response: I don’t think reasoning is special. See System 2 as working-memory augmented System 1 reasoning. I think an RL algorithm can learn to do a chain of reasoning in the same way as it learns to do a sequence of actions.

2. Split into cases based on how the algorithm comes to understand the world

To proceed further, I need to be a bit more specific.

There’s a certain capability, where an algorithm takes unstructured input data (e.g. sensory inputs) and uses it to build and expand a common-sense model of the world, rich with concepts that build on other concepts in a huge and ever-expanding web of knowledge. This capability is part of what we expect and demand from an AGI. We want to be able to ask it a very difficult question, on a topic it hasn’t considered before—maybe a topic nobody has ever thought about before!—and have the AGI develop an understanding of the domain, and the relevant considerations, and create a web of new concepts for thinking about that domain, and so on.

Let’s assume, following the evolution analogy, that there's an outer algorithm that performs an automated search for an inner algorithm. The two cases are: (A) The inner algorithm (once trained) can do this knowledge-building thing by itself, without any real-time intervention from the outer algorithm; or (B) it can’t, but the inner and outer algorithm working together do have this capability (as in online learning, within-lifetime human learning, etc.—and here there isn’t necessarily an outer-vs-inner distinction in the first place). I’ll subdivide (A) into four subcases, and end up with 5 cases total.

Just as a teaser:

  • If we exactly reproduce the process of evolution of the human brain, with evolution as the outer layer and the human brain as the inner layer, then we’re in Case 2 below.
  • If you believe AGI will be developed along the lines of the “genome = code” analogy I endorsed above, then we’re in Case 1 below.
  • If there’s any scenario where the evolution analogy would work well, I think it would probably be Case 5 below. I’ll argue that Case 5 is unlikely to happen, but I suppose it’s not impossible.

OK, now let’s go through the cases.

Case 1: “Intelligence Via Online Learning”—The inner algorithm cannot build an ever-expanding web of knowledge & understanding by itself, but it can do so in conjunction with the outer algorithm

As mentioned near the top, I’m defining “evolution analogy” to exclude this case, because humans can acquire new understanding without needing to wait many centuries to create new generations of humans that can be further selected by evolution.

But within-lifetime human learning is in this category. We have an outer algorithm (our innate learning algorithm) which does an automated search for an inner algorithm (set of knowledge, ideas, habits of thought, etc.). But the inner algorithm by itself is not sufficient for intelligence; the outer algorithm is actively editing it, every second. After all, in order to solve a problem—or even carry on a conversation!—you're constantly updating your database of knowledge and ideas in order to keep track of what's going on. Your inner algorithm by itself would be like an amnesiac! (Admittedly, even splitting things up into outer / inner is kinda unhelpful here.)

Anyway, I think this kind of system is a very plausible model for what AGI will look like.

Let’s call this case “intelligence via online learning”. Online learning is when a learning algorithm comes across data sequentially, and learns from each new datapoint, forever, both in training and deployment.

Now, there’s a boring version of "online learning is relevant for AGIs" (see e.g. here), which I’m not talking about. It goes like this: “Of course AGI will probably use online learning. I mean, we have all these nice unsupervised learning techniques—one is predictive learning (a.k.a. “self-supervised learning”), another is TD learning, another is amplification (and related things like chunking, memoization, etc.), and so on. You can keep using these techniques in deployment, and then your AGI will keep getting more capable. So why not do that? You might as well!”

That’s not wrong, but that’s also not what I’m talking about. I’m talking about the case where I ask my AGI a question, it chugs along from time t=0 to t=10 and then gives an answer, and where the online-learning that it did during time 0<t<5 is absolutely critical for the further processing that happens during time 5<t<10.
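(In code terms, the thing I'm talking about is shaped something like this sketch; "model", "prediction_loss", and "generate_answer" are hypothetical stand-ins, invented for illustration:)

```python
def answer_question(question, model, optimizer, background_material):
    """Intelligence-via-online-learning: the weight updates made while 'reading'
    are baked in before the 'reasoning' phase runs, and that phase depends on them."""
    # Phase 1 (t = 0..5): absorb new information by updating the weights.
    for chunk in background_material:
        loss = model.prediction_loss(chunk)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Phase 2 (t = 5..10): reason about the question using the just-updated
    # weights. Without the Phase-1 updates, this step would not work.
    return model.generate_answer(question)
```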

This is how human learning works, but definitely not how, say, GPT-3 works. It’s easy to forget just how different they are! Consider these two scenarios:

  1. During training, the AGI comes across two contradictory expectations (e.g. "demand curves usually slope down" & "many studies find that minimum wage does not cause unemployment"). The AGI updates its internal models to a more nuanced and sophisticated understanding that can reconcile those two things. Going forward, it can build on that new knowledge.
  2. During deployment, the exact same thing happens, with the exact same result.

In the Intelligence-Via-Online-Learning paradigm (for example, human learning), there's no distinction; both of these are the same algorithm doing the same thing. Specifically, there is no algorithmic distinction between "figuring things out in the course of learning something" vs "figuring things out to solve a new problem".

Whereas in the evolution-analogy paradigm, these two cases would be handled by two totally different algorithmic processes—"outer algorithm editing the inner algorithm" during training and "inner algorithm running on its own" during deployment. We have to solve the same problem twice! (And not just any problem … this is kinda the core problem of AGI!) Solving the problem twice seems harder and less likely than solving it once, for reasons I'll flesh out more in a later section.

Cases 2-5: After training, the inner algorithm by itself (i.e. without the outer algorithm's involvement) can build an ever-expanding web of knowledge & understanding

Cases 2-3: The inner algorithm, by itself, builds an ever-expanding web of knowledge & understanding from scratch

As discussed above, I put the evolution-of-a-human-brain example squarely in this category: I think that all of a human’s “web of knowledge and understanding” is learned within a lifetime, although there are innate biases to look for some types of patterns rather than others (analogous to how a ConvNet will more easily learn localized, spatially-invariant patterns, but it still has to learn them). If it’s not “all” of a human’s knowledge that's learned within a lifetime, then it’s at least “almost all”—the entire genome is <1GB (only a fraction of which can possibly encode “knowledge”), while there are >100 trillion synapses in the neocortex.

(Updated to add: Oops, sorry, there’s a highly-uncertain adjustment to get from “number of synapses in an adult brain” to “stored information in an adult brain”; as a stupid example, if you put 100 trillion synapses in a perfectly regular grid, then you’re not storing any information at all. Anyway, I think even with extremely conservative assumptions, the “almost all” claim goes through, but I’ll omit further discussion, since this is getting off-topic. Thanks Aysja for the correction.)

Case 2: Outer algorithm starts the inner algorithm from scratch, lets it run all the way to AGI-level performance, then edits the algorithm and restarts it from scratch

Assuming we use the simple, most-evolution-like approach, each episode (= run of the inner algorithm) has to be long enough to build a common-sense world-model from scratch.

How long are those episodes in wall-clock time? I admit, there is no law of physics that says that a machine can’t learn a human-level common-sense world-model, from scratch, within 1 millisecond. But given that it takes many years for a human brain to do so—despite the brain being (maybe) the equivalent of a supercomputer—and given that the early versions of an AGI algorithm would presumably be just barely working at all, I think it’s a reasonably safe bet that it would take at least weeks or months of wall-clock time per episode, and I would not be at all surprised if it took more than a year.

If that’s right, then developing this AGI algorithm will not look like evolution or gradient descent. It would look like a run-and-debug loop, or a manual hyperparameter search. It seems highly implausible that the programmers would just sit around for months and years and decades on end, waiting patiently for the outer algorithm to edit the inner algorithm, one excruciatingly-slow step at a time. I think the programmers would inspect the results of each episode, generate hypotheses for how to improve the algorithm, run small tests, etc.

In fact, with such a slow inner algorithm, there’s really no other choice. On human technological development timescales, the outer algorithm is not going to get many bits of information—probably not enough to design, from scratch, a new learning algorithm that the programmers would never have thought of. (By contrast, for example, the AlphaStar outer algorithm leveraged many gigabytes of information—an enormous number of agent steps—to design the inner algorithm.) Instead, if there is an outer algorithm at all, it would merely be tuning hyperparameters within a highly constrained space of human-designed learning algorithms, which is the best you can do with only dozens of bits of information.
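(Here's the rough arithmetic behind "dozens of bits"; the specific numbers are mine, chosen purely for illustration:)

```python
episode_length_months = 3    # optimistic, given the weeks-to-a-year-plus estimate above
project_years = 10

sequential_episodes = project_years * 12 // episode_length_months   # = 40
# Crediting on the order of one bit of design information per episode evaluated,
# that's ~dozens of bits: enough to tune a few knobs in a human-designed
# algorithm, nowhere near enough to design an inscrutable learning algorithm
# from scratch.
print(sequential_episodes)
```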

Case 3: While the inner algorithm can build up knowledge from scratch, during development we try to preserve the “knowledge” data structure where possible, carrying it over from one version of the inner algorithm to the next

Back to the other possibility. Maybe we won’t restart the inner algorithm from scratch every time we edit it, since it’s so expensive to do so. Instead, maybe once in a while we’ll restart the algorithm from scratch (“re-initialize to random weights” or something analogous), but most of the time, we’ll take whatever data structure holds the AI’s world-knowledge, and preserve it between one version of the inner algorithm and its successor. Doing that is perfectly fine and plausible, but again, the result doesn’t look like evolution; it looks like a hyperparameter search within a highly-constrained class of human-designed algorithms. Why? Because the world-knowledge data structure—a huge part of how an AGI works!—needs to be designed by humans and inserted into the AGI architecture in a modular way, for this approach to be possible at all.

Cases 4-5: The inner algorithm cannot start from scratch—it needs to start with a base of preexisting knowledge & understanding. But it can then expand that knowledge arbitrarily far by itself.

Case 4: The inner algorithm’s starting knowledge base is directly built by humans.

Well, just as in Case 3 above, this case does not look like evolution, it looks like a hyperparameter search within a highly-constrained class of human-designed algorithms, because humans are (by assumption) intelligently designing the types of data structures that will house the AGI’s knowledge, and that immediately and severely constrains how the AGI works.

Case 5: The inner algorithm’s starting knowledge base is built by the outer algorithm.

Just to make sure we’re on the same page here, the scenario we’re talking about right now is that the outer algorithm builds an inner algorithm which has both a bunch of knowledge and understanding about the world and a way to open-endedly expand that knowledge. So when you turn on the inner algorithm, it already has a good common-sense understanding of the world, and then you give the inner algorithm a new textbook, or a new problem to solve, and let the algorithm run for an hour, and at the end it will come out knowing a lot more than it started. For example, GPT-3 has at least part of that—the outer algorithm built an inner algorithm which has a bunch of knowledge and understanding about the world. The inner algorithm can do some amount of figuring things out, although I would say that it cannot open-endedly expand its knowledge without the involvement of the outer algorithm (i.e., fine-tuning on new information), if for no other reason than the finite context window.

As mentioned above, I do not think that there’s any precedent in nature for a Case-5 algorithm—I think that humans and other animals start life with various instincts and capabilities (some very impressive!), but literally zero “knowledge” in the usual sense of that term (i.e. an interlinking web of concepts that relate to each other and build on each other and enable predictions and planning). But of course a Case-5-type inner algorithm is not fundamentally impossible. As an existence proof, consider the algorithm that goes: “Start with this snapshot of an adult brain, and run it forward in time”.

And again, since we’re searching for an evolution analogy (and not just a low-dimensional hyperparameter search), the assumption is that the inner algorithm builds new knowledge using principles that the programmer does not understand.

There are a couple reasons that I’m skeptical that this will happen.

First, there’s a training problem. Let’s say we give our inner algorithm the task of “read this biology textbook and answer the quiz questions”. There are two ways that the inner algorithm could succeed:

  1. After training, the inner algorithm could start up in a state where it already understands the contents of the textbook.
  2. The inner algorithm could successfully learn the contents of the textbook within the episode.

By assumption, here in Case 5, we want both these things to happen. But they seem to be competing: The more that the inner algorithm knows at startup, the less incentive it has to learn. Well, it’s easy enough to incentivize understanding without incentivizing learning: just make the inner algorithm answer the quiz questions without having access to the textbook. (That’s the GPT-3 approach.) But how do you incentivize learning without incentivizing understanding? Whatever learning task you give the inner algorithm, the task is always made easier by starting with a better understanding of the world, right?

Evolution solved that problem by being in Case 2, not Case 5. As above, the genome encodes little if any of a human’s world-knowledge. So insofar as the human brain has an incentive to wind up understanding the world, it has to learn. You could say that there’s regularization (a.k.a. “Information funnel”) in the human brain algorithm—the genome can’t initialize the brain with terabytes of information. We could, by the same token, use regularization to force the inner algorithm here to learn stuff instead of already knowing it. But again, we’re talking about Case 5, so we need the inner algorithm to turn on already knowing terabytes of information about the world. So what do you do? I have a hard time seeing how it would work, although there could be strategies I’m not thinking of.

Second, there’s a “solving the problem twice” issue. As mentioned above, in Case 5 we need both the outer and the inner algorithm to be able to do open-ended construction of an ever-better understanding of the world—i.e., we need to solve the core problem of AGI twice with two totally different algorithms! (The first is a human-programmed learning algorithm, perhaps SGD, while the second is an incomprehensible-to-humans learning algorithm. The first stores information in weights, while the second stores information in activations, assuming a GPT-like architecture.)

I think the likeliest thing is that programmers would succeed at getting an outer algorithm capable of ever-better understanding of the world, but because of the training issue above, have trouble getting the inner algorithm to do the same—or realize that they don’t need to. Instead they would quickly pivot to the strategy of keeping the outer algorithm involved and in the loop while using the system, and not just while training. This is Case 1 above (“Intelligence Via Online Learning”). So for example, I don’t think GPT-N will lead to an AGI, but if I’m wrong, then I expect to be wrong because it has a path to AGI following Case 1, not Case 5.

Anyway, none of these are definitive arguments that Case 5 won’t happen. And if it does, then the evolution analogy would plausibly be OK after all. So this is probably the weakest link of this section of the blog post, and where I expect the most objections, which by the way I’m very interested to hear.

3. Computational efficiency: the inner algorithm can run efficiently only to the extent that humans (and the compiler toolchain) generally understand what it’s doing

But first: A digression into algorithms and their low-level implementations

Let's consider two identical computers running two different trained neural net models of the same architecture—for example, one runs a GPT model trained to predict English words, and the other runs a GPT model trained to predict image pixels. Or maybe one runs a Deep Q Network trained to play Pong and the other runs a Deep Q Network trained to play Space Invaders.

Now, look at the low-level operations that these two computers’ processors are executing. (As a concrete example: here is a random example list of a certain chip’s low-level processor instructions; which of those instructions is the computer executing right now?) You’ll see that the two computers are doing more-or-less exactly the same thing all the time. Both computers are using exactly 1347 of their 2048 GPU cores. Oh hey, now both computers are copying a set of 32 bits from SRAM to DRAM. And now both computers are multiplying the bits in register 7 by the bits in register 49, and storing the result in register 6. The bits in those registers are different on the two computers, but the operation is the same. OK, not literally every operation is exactly the same—for example, maybe the neural net has ReLU activation functions, so there’s a “set bits to zero” processor instruction that only occurs about half the time, and often one computer will execute that set-to-zero instruction when the other doesn’t. But it's awfully close to identical!

By contrast, if you look up close at one computer calculating a Fast Fourier Transform (FFT), and compare it to a second computer doing a Quicksort, their low-level processing will look totally different. One computer might be doing a 2's complement while the other is fetching data from memory. One computer might be parallelizing operations across 4 CPU cores while the other is running in a single thread. Heck, one computer might be running an algorithm on its GPU while the other is using its CPU!

So the upshot of the above is: When running inference with two differently-trained neural net models of the same architecture, the low-level processing steps are essentially the same, whereas when running FFT vs quicksort, the low-level processing steps are totally different.

Why is that? And why does it matter?

The difference is not about the algorithms themselves. I don't think there's any sense in which two different GPT trained models are fundamentally "less different" from each other than the quicksort algorithm is from the FFT algorithm.  It's about how we humans built the algorithms. The FFT and quicksort started life as two different repositories of source code, which the compiler then parsed and transformed into two different execution strategies. Whereas the two different GPT trained models started life as one repository of source code for “a generic GPT trained model”, which the compiler then parsed into a generic execution strategy—a strategy that works equally well for every possible GPT trained model, no matter what the weights are.
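(Here's the point in miniature, with numpy and a toy MLP standing in for the real frameworks and the GPT architecture: the two trained models are one function applied to two different blobs of weights, whereas FFT and quicksort are two different functions, i.e. two different source programs with two different execution strategies.)

```python
import numpy as np

# One generic execution strategy: the identical code path, the identical matrix
# multiplies, no matter which set of weights you load.
def mlp_forward(weights, x):
    for W, b in weights:
        x = np.maximum(W @ x + b, 0.0)
    return x

# Two different source programs: different code, different low-level execution.
def run_fft(x):
    return np.fft.fft(x)

def run_quicksort(x):
    return sorted(x)
```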

To see more clearly that this is not about the algorithms themselves, let's do a swap!

Part 1 of the swap: Is it possible to have one computer calculating an FFT while another does quicksort, yet the processors are doing essentially the same low-level processing steps in the same order? The answer is yes—but when we do this, the algorithms will run much slower than before, probably by many orders of magnitude! Here’s an easy strategy: we write both the FFT and the quicksort algorithms in the form of two different inputs to the same Universal Turing Machine, and have both our computers simulate the operation of that Turing machine, step by step along the simulated memory tape. Now each computer is running exactly the same assembly code, and executing essentially the same processor instructions, yet at the end of the day, one is doing an FFT and the other is doing a quicksort.
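(Here's a toy version of that strategy, with two trivial programs standing in for the FFT and the quicksort. The point is that the host computer executes the identical dispatch loop either way; only the data differs, and everything pays a heavy interpretation overhead.)

```python
# A tiny stack-machine interpreter: the host's low-level steps are the same
# no matter which program it is fed, because the program is just data.
def run(program):
    stack = []
    for op, arg in program:
        if op == "push":
            stack.append(arg)
        elif op == "add":
            stack.append(stack.pop() + stack.pop())
        elif op == "mul":
            stack.append(stack.pop() * stack.pop())
    return stack[-1]

prog_a = [("push", 2), ("push", 3), ("add", None)]   # stands in for "FFT"
prog_b = [("push", 2), ("push", 3), ("mul", None)]   # stands in for "quicksort"

print(run(prog_a), run(prog_b))   # prints 5 6: same host loop, two different algorithms
```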

Part 2 of the swap: Conversely, is it possible to have each of two computers run a different trained model of the same neural net architecture, yet the two computers are doing wildly different low-level processing? …And in the process they wind up running their algorithms many orders of magnitude faster than the default implementation? This is the exact reverse of the above. And again the answer is yes! Let's imagine a "superintelligent compiler" that can examine any algorithm, no matter how weirdly obfuscated or approximated, and deeply understand what it’s doing, and then rewrite it in a sensible, efficient way, with appropriate system calls, parallelization, data structures, etc. A “superintelligent compiler” could look at the 3 trillion weights of a giant RNN, and recognize that this particular trained model is in fact approximating a random access memory algorithm in an incredibly convoluted way … and then the superintelligent compiler rewrites that algorithm to just run on a CPU and use that chip's actual RAM directly, and then it runs a billion times faster, and more accurately too!

So in summary: the reason that differently-trained neural nets use essentially the same low-level processing steps is not necessarily because the same low-level processing steps are the best and most sensible way to implement those algorithms, but rather it’s because we don’t have a "superintelligent compiler" that can look at the trillion weights of a giant trained RNN and then radically refactor the algorithm to use more appropriate processor instructions and parallelization strategies, and to move parts of the algorithm from GPU to CPU where appropriate, etc. etc. And I don’t expect this to change in the future—at least not before we have AGI.

Back to the main argument

The moral of the previous subsection is that if you search over a Turing-complete space of algorithms—for example large RNNs—you can find any possible algorithm, but you will not find most of those algorithms implemented in a compute-efficient way.

For example, vanilla RNNs mostly involve multiplying matrices.
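(Concretely, here's the standard vanilla-RNN update in numpy; essentially all the work is in the two matrix multiplications:)

```python
import numpy as np

def rnn_step(x, h, W_xh, W_hh, b):
    """One step of a vanilla RNN: h_new = tanh(W_xh @ x + W_hh @ h + b).
    Whatever the trained RNN is 'really' doing (storing memories, sorting,
    anything) has to be squeezed through this one kind of operation."""
    return np.tanh(W_xh @ x + W_hh @ h + b)
```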

If your inner algorithm needs a RAM, and the programmers didn’t know that, well maybe the outer algorithm will jerry rig an implementation of RAM that mostly involves multiplying matrices. But that implementation will be a whole lot less computationally efficient than just using the actual RAM built into your chip.

And if your inner algorithm needs to sort a list, and the programmers didn’t know that, well maybe the outer algorithm will jerry rig an implementation of a list-sorting algorithm that mostly involves multiplying matrices. But that implementation will be a whole lot less computationally efficient than the usual approach, where a list-sorting algorithm is written in normal code, and then humans and compilers can work together to create a sensible low-level implementation strategy that takes advantage of the fact that your chip has blazing-fast low-level capabilities to compare binary numbers and copy bit-strings and so on.

Still other times, the inner algorithm really does just need to multiply matrices! Or it needs to do something that can be efficiently implemented in a way that mostly involves multiplying matrices. And then that’s great! That part of the algorithm will run very efficiently! For example, did you know that the update rule for a certain type of Hopfield network happens to be equivalent to the attentional mechanism of a Transformer layer? So if your outer algorithm is looking for an algorithm that involves updating a Hopfield network, and you’re using a Transformer architecture for the inner algorithm, then good news for you, the inner algorithm is going to wind up with a very computationally-efficient implementation!
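(Here's my rough sketch of that correspondence, following the "modern continuous Hopfield network" formulation; conventions differ between papers, so treat the details as illustrative rather than authoritative:)

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def hopfield_update(state, patterns, beta):
    """One retrieval step of a modern continuous Hopfield network:
    new_state = patterns^T @ softmax(beta * patterns @ state)."""
    return patterns.T @ softmax(beta * patterns @ state)

def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    d = keys.shape[-1]
    return values.T @ softmax(keys @ query / np.sqrt(d))

# With keys = values = patterns and beta = 1/sqrt(d), these two are the same
# computation, so this piece of the inner algorithm maps neatly onto hardware
# that is already optimized for Transformer attention.
```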

OK. So let’s say there are two projects trying to make AGI:

  • One project is motivated by the evolution analogy. They buy tons of compute to do a giant automated search for inner learning algorithms (which then run by themselves).
  • The other project is also searching for an inner learning algorithm, but using human design, i.e. trying to figure out what data structures and operations and learning rules are most suitable for AGI.

…Then my claim in this section is that the second team would have the advantage that, if they succeed in finding that inner algorithm, their version will run faster than the first team’s, possibly by orders of magnitude. This is a run-time speed advantage, i.e. it comes on top of the additional advantage of not needing tons of compute to find the inner algorithm in the first place.

(You can still argue that the first team will win despite this handicap because humans are just not smart enough to design a learning algorithm that will learn itself all the way to AGI, so the second team is doomed. That’s not what I think, as discussed above, but that’s a different topic. Anyway, hopefully we can agree that this is at least one consideration in favor of the second team.) 

I think this argument will carry more weight for you if you think that an AGI-capable learning algorithm needs several modular subsystems that do different types of calculations. That's me—I’m firmly in that camp! For example, I mentioned AlphaStar above—it has LSTMs, self-attention, scatter connections, pointer networks, supervised learning, TD(λ), V-trace, UPGO, interface code connecting to the Starcraft executable, and so on. What are the odds that a single one-size-fits-all low-level processing strategy can do all those different types of calculations efficiently? I think that some of the necessary components would turn out to be a terrible fit, and would wind up bottlenecking the whole system.

I think of the brain like that too—oversimplifying a bit, there's probabilistic program inference & self-supervised learning (involving neocortex & thalamus), reinforcement learning (basal ganglia), replay learning (hippocampus), supervised learning (amygdala), hardcoded input classifiers (tectum), memoization (cerebellum), and so on—and each is implemented by arranging different types of neurons into different types of low-level circuits. I think each of these modules is there for sensible and important design reasons, and therefore I expect that most or all of these modules will be part of a future AGI. Programmers have proven themselves quite capable of building learning algorithms with all those components; and if they do, the result would wind up with an efficient low-level execution strategy. Maybe an automated search could discover a monolithic black box containing all those different types of calculations, but if it did, again, it would be very unlikely to be able to run them efficiently, within the constraints of its predetermined, one-size-fits-all, low-level processing strategy.

Thanks to Richard Ngo & Daniel Kokotajlo for critical comments on a draft.

Comments

As described above, I expect AGI to be a learning algorithm—for example, it should be able to read a book and then have a better understanding of the subject matter. Every learning algorithm you’ve ever heard of—ConvNets, PPO, TD learning, etc. etc.—was directly invented, understood, and programmed by humans. None of them were discovered by an automated search over a space of algorithms. Thus we get a presumption that AGI will also be directly invented, understood, and programmed by humans.

For a post criticizing the use of evolution for end to end ML, this post seems to be pretty strawmanish and generally devoid of any grappling with the Bitter Lesson, end-to-end principle, Clune's arguments for generativity and AI-GAs program to soup up self-play for goal generation/curriculum learning, or any actual research on evolving better optimizers, DRL, or SGD itself... Where's Schmidhuber, Metz, or AutoML-Zero? Are we really going to dismiss PBT evolving populations of agents in the AlphaLeague just 'tweaking a few human-legible hyperparameters'? Why isn't Co-Reyes et al 2021 an example of evolutionary search inventing TD-learning which you claim is absurd and the sort of thing that has never happened?

Thanks for all those great references!

My current thinking is: (1) Outer-loop meta-learning is slow, (2) Therefore we shouldn't expect to get all that many bits of information out of it, (3) Therefore it's a great way to search for parameter settings in a parameterized family of algorithms, but not a great way to do "the bulk of the real design work", in the sense that programmers can look at the final artifact and say "Man, I have no idea what this algorithm is doing and why it's learning anything at all, let alone why it's learning things very effectively".

Like if I look at a trained ConvNet, it's telling me: Hey Steve, take your input pixels, multiply them by this specific giant matrix of numbers, then add this vector, blah blah, and OK now you have a vector, and if the first entry of the vector is much bigger than the other entries, then you've got a picture of a tench. I say "Yeah, that is a picture of a tench, but WTF just happened?" (Unless I'm Chris Olah.) That's what I think of when I think of the outer loop doing "the bulk of the real design work".

By contrast, when I look at Co-Reyes, I see a search for parameter settings (well, a tree of operations) within a parametrized family of primarily-human-designed algorithms—just what I expected. If I wanted to run the authors' best and final RL algorithm, I would start by writing probably many thousands of lines of human-written code, all of which come from human knowledge of how RL algorithms should generally work ("...the policy is obtained from the Q-value function using an ε-greedy strategy. The agent saves this stream of transitions...to a replay buffer and continually updates the policy by minimizing a loss function...over these transitions with gradient descent..."). Then, to that big pile of code, I would add one important missing ingredient—the loss function L—containing at most 104 bits of information (if I calculated right). This ingredient is indeed designed by an automated search, but it doesn't have a lot of inscrutable complexity—the authors have no trouble writing down L and explaining intuitively why it's a sensible choice. Anyway, this is a very different kind of thing than the tench-discovery algorithm above.

Did the Co-Reyes search "invent" TD learning? Well, they searched over a narrow parameterized family of algorithms that included TD learning in it, and one of their searches settled on TD learning as a good option. Consider how few algorithms that is out of the space of all possible algorithms. Isn't it shocking that TD learning was even an option? No, it's not shocking, it's deliberate. The authors already knew that TD learning was good, and when they set up their search space, they made sure that TD learning would be part of it. ("Our search language...should be expressive enough to represent existing algorithms..."). I don't find anything about that surprising!

I feel like maybe I was projecting a mood of "Outer-loop searches aren't impressive or important". I don't think that! As far as I know, we might be just a few more outer-loop searches away from AGI! (I'm doubtful, but that's a different story. Anyway it's certainly possible.) And I did in fact write that I expect this kind of thing to be probably part of the path to AGI. It's all great stuff, and I didn't write this blog post because I wanted to belittle it. I wrote the blog post to respond to the idea I've heard that, for example, we could plausibly wind up with an AGI algorithm that's fundamentally based on reinforcement learning with tree search, but we humans are totally oblivious to the fact that the algorithm is based on reinforcement learning with tree search, because it's an opaque black box generating its own endogenous reward signals and doing RL off that, and we just have no idea about any of this. It takes an awful lot of bits to build that inscrutable a black box, and I don't think outer-loop meta-learning can feasibly provide that many bits of design complexity, so far as I know. (Again I'm not an expert and I'm open to learning.)

any grappling with the Bitter Lesson

I'm not exactly sure what you think I'm saying that's contrary to Bitter Lesson. My reading of "Bitter lesson" is that it's a bad idea to write code that describes the object-level complexity of the world, like "tires are black" or "the queen is a valuable chess piece", but rather we should write learning algorithms that learn the object-level complexity of the world from data. I don't read "Bitter Lesson" as saying that humans should stop trying to write learning algorithms. Every positive example in Bitter Lesson is a human-written learning algorithm.

Take something like "Attention Is All You Need" (2017). I think of it as a success story, exactly the kind of research that moves forward the field of AI. But it's an example of humans inventing a better learning algorithm. Do you think that "Attention Is All You Need" not part of the path to AGI, but rather a step forward in the wrong direction? Is "Attention Is All You Need" the modern version of "yet another paper with a better handcrafted chess-position-evaluation algorithm"? If that's what you think, well, you can make that argument, but I don't think that argument is "The Bitter Lesson", at least not in any straightforward reading of "Bitter Lesson", AFAICT...

It would also be a pretty unusual view, right? Most people think that the invention of transformers is what AI progress looks like, right? (Not that there's anything wrong with unusual views, I'm just probing to make sure I correctly understand the ML consensus.)

I personally found this post valuable and thought-provoking. Sure, there's plenty that it doesn't cover, but it's already pretty long, so that seems perfectly reasonable.

I particularly dislike your criticism of it as strawmanish. Perhaps that would be fair if the analogy between RL and evolution were a standard principle in ML. Instead, it's a vague idea that is often left implicit, or else formulated in idiosyncratic ways. So posts like this one have to do double duty in both outlining and explaining the mainstream viewpoint (often a major task in its own right!) and then criticising it. This is most important precisely in the cases where the defenders of an implicit paradigm don't have solid articulations of it, making it particularly difficult to understand what they're actually defending. I think this is such a case.

If you disagree, I'd be curious what you consider a non-strawmanish summary of the RL-evolution analogy. Perhaps Clune's AI-GA paper? But from what I can tell opinions of it are rather mixed, and the AI-GA terminology hasn't caught on.

Just wanted to say that this comment made me add a lot of things on my reading list, so thanks for that (but I'm clearly not well-read enough to go into the discussion).

Outside view #1: How biomimetics has always worked

It seems like ML is different from other domains in that it already relies on incredibly massive automated search, with massive changes in the quality of our inner algorithms despite very little change in our outer algorithms. None of the other domains have this property. So it wouldn't be too surprising if the only domain in which all the early successes have this property is also the only domain in which the later successes have this property.

Outside view #2: How learning algorithms have always been developed

I don't think this one is right. If your definition of learning algorithm is the kind of thing that is "able to read a book and then have a better understanding of the subject matter" then it seems like you would be classifying the model learned by GPT-3 as a learning algorithm, since it can read a 1000 word article and then have a better understanding of the subject matter that it can use to e.g. answer questions or write related text.

It seems like your definition of "learning algorithm" is "an algorithm that humans understand," and then it's kind of unsurprising that those are the ones designed by humans. Or maybe it's something about the context size over which the algorithm operates (in which case it's worth engaging with the obvious trend extrapolation of learned transformers operating competently over longer and longer contexts) or the quality of the learning it performs?

Overall I think I agree that progress in meta-learning over the last few years has been weak enough, and evidence that models like GPT-3 perform competent learning on the inside, that it's been a modest update towards longer timelines for this kind of fully end-to-end approach. But I think it's pretty modest, and as far as I can tell the update is more like "would take more like 10^33 operations to produce using foreseeable algorithms rather than 10^28 operations" than "it's not going to happen."

3. Computational efficiency: the inner algorithm can run efficiently only to the extent that humans (and the compiler toolchain) generally understand what it’s doing

I don't think the slowdown is necessarily very large, though I'm not sure exactly what you are claiming. In particular, you can pick a neural network architecture that maps well onto the most efficient hardware that you can build, and then learn how to use the operations that can be efficiently carried out in that architecture. You can still lose something but I don't think it's a lot.

You could ask the question formally in specific computational models, e.g. what's the best fixed homogeneous circuit layout we can find for doing both FFT and quicksort, and how large is the overhead relative to doing one or the other? (Obviously for any two algorithms that you want to simulate the overhead will be at most 2x, so after finding something clean that can do both of them you'd want to look at a third algorithm. I expect that you're going to be able to do basically any algorithm that anyone cares about with <<10x overhead.)

Thanks!

ML is different from other domains in that it already relies on incredibly massive automated search, with massive changes in the quality of our inner algorithms despite very little change in our outer algorithms.

Yeah, sure, maybe. Outside views only go so far :-)

I concede that even if an evolution-like approach was objectively the best way to build wing-flapping robots, probably those roboticists would not think to actually do that, whereas it probably would occur to ML researchers.

(For what it's worth—and I don't think you were disagreeing with this—I would like to emphasize that there have been important changes in outer algorithms too, like the invention of Transformers, BatchNorm, ResNets, and so on, over the past decade, and I expect there to be more such developments in the future. This is in parallel with the ongoing work of scaling-up-the-algorithms-we've-already-got, of course.)

you would be classifying the model learned by GPT-3 as a learning algorithm, since it can read a 1000 word article and then have a better understanding of the subject matter that it can use to e.g. answer questions or write related text.

I agree that there's a sense in which, as GPT-3 goes through its 96 layers, you could say it's sorta "learning things" in earlier layers and "applying that knowledge" in later layers. I actually had a little discussion of that in an earlier version that I cut out because the article was already very long, and I figured I was already talking in detail about my thoughts on GPT-3 in the subsequent section, with the upshot that I don't see the GPT-3 trained model as belonging to the category of "the right type of learning algorithm to constitute an AGI by itself" (i.e. without some kind of fine-tuning-as-you-go system) (see "Case 5"). I put a little caveat back in. :-D

(See also: my discussion comment on this page about GPT-3)

I don't think the slowdown is necessarily very large, though I'm not sure exactly what you are claiming. In particular, you can pick a neural network architecture that maps well onto the most efficient hardware that you can build, and then learn how to use the operations that can be efficiently carried out in that architecture. ... You could ask the question formally in specific computational models

Suppose that the idea of tree search had never occurred to any human, and someone programs a learning algorithm with nothing remotely like tree search in it, and then the black box has to "invent tree search". Or the black box has to "invent TD learning", or "invent off-policy replay learning", and so on. I have a hard time imagining this working well.

Like, for tree search, you need to go through this procedure where you keep querying the model, keep track of where you're at, play through some portion of an imaginary game, then go back and update the model at the end. Can a plain LSTM be trained in such a way that it will start internally doing something equivalent to tree search? If so, how inefficient will it be? That's where I'm assuming "orders of magnitude". It seems to me that a plain LSTM isn't doing the right type of operations to run a tree search algorithm, except in the extreme case that looks something like "a plain LSTM emulating a Turing machine that's doing tree search".
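
To make the "right type of operations" point concrete, here's a minimal toy sketch of explicit tree search: the kind of bookkeeping (query for a move, track an imaginary game state, play it out, back up the result) that a plain LSTM would have to somehow emulate internally. Everything here is made up for illustration; it's not any particular published algorithm:

```python
import random
from collections import defaultdict

# Toy game: players alternately add 1 or 2 to a running total; whoever
# brings the total to exactly 10 wins.
TARGET = 10
ACTIONS = (1, 2)

def legal_actions(total):
    return [a for a in ACTIONS if total + a <= TARGET]

def is_terminal(total):
    return total >= TARGET

def random_playout(total, to_move):
    """Play the rest of the game randomly; return the index of the winner."""
    while True:
        total += random.choice(legal_actions(total))
        if is_terminal(total):
            return to_move          # the player who just moved wins
        to_move = 1 - to_move

def tree_search(total, player, n_simulations=3000):
    """The bookkeeping described above: repeatedly pick a move to try,
    track an imaginary game state, play it out, and back up the result."""
    value, visits = defaultdict(float), defaultdict(int)
    for _ in range(n_simulations):
        action = random.choice(legal_actions(total))        # pick a move to try
        nxt = total + action                                 # imaginary game state
        winner = player if is_terminal(nxt) else random_playout(nxt, 1 - player)
        value[action] += 1.0 if winner == player else -1.0   # back up the result
        visits[action] += 1
    return max(legal_actions(total), key=lambda a: value[a] / max(visits[a], 1))

print("best first move from a total of 6:", tree_search(6, player=0))  # almost always 1
```

The point is that this loop explicitly stores and revisits intermediate states and statistics, which is a very different shape of computation from a single recurrent forward pass.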

Likewise with replay learning—you need to store an unstructured database with a bunch of play-throughs, and then go back and replay them and learn from them when appropriate. Can a plain LSTM do that? Sure, it's Turing-complete, it can do anything. But a plain LSTM is not the right kind of computation to be storing a big unstructured database of play-throughs and then replaying them when appropriate and learning from the replays.
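
Similarly, here's a toy sketch of what explicit replay machinery looks like: an unstructured buffer of past transitions, plus a loop that periodically samples from it and learns from the replays. The "environment" and the tabular learner here are stand-ins invented for the example, not any real library's API:

```python
import random
from collections import deque

class ReplayBuffer:
    """Unstructured store of past transitions, to be replayed later."""
    def __init__(self, capacity=10_000):
        self.transitions = deque(maxlen=capacity)

    def add(self, transition):
        self.transitions.append(transition)   # (state, action, reward, next_state)

    def sample(self, batch_size):
        return random.sample(list(self.transitions), min(batch_size, len(self.transitions)))

def replay_update(batch, q_table, lr=0.1, gamma=0.9, actions=(0, 1)):
    """Stand-in "learn from the replayed experience" step (tabular Q-learning)."""
    for state, action, reward, next_state in batch:
        best_next = max(q_table.get((next_state, a), 0.0) for a in actions)
        old = q_table.get((state, action), 0.0)
        q_table[(state, action)] = old + lr * (reward + gamma * best_next - old)

# Toy interaction loop: act, store the experience, and periodically replay it.
buffer, q_table, state = ReplayBuffer(), {}, 0
for step in range(1000):
    action = random.choice((0, 1))                        # act (here: randomly)
    reward = 1.0 if (state + action) % 3 == 0 else 0.0    # made-up reward
    next_state = (state + action + 1) % 5                 # made-up dynamics
    buffer.add((state, action, reward, next_state))       # store the play-through
    state = next_state
    if step % 10 == 0:
        replay_update(buffer.sample(32), q_table)         # learn from replays
print(f"learned {len(q_table)} Q-values from replayed experience")
```

Again, the database of play-throughs and the replay schedule are explicit data structures and explicit control flow, not something that naturally falls out of a recurrent state update.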

I agree that this could be investigated in more detail, for example by asking how badly a plain LSTM architecture would struggle to implement something equivalent to tree search, or off-policy replay learning, or TD learning, or whatever.

Then someone might object: Well, this is an irrelevant example, we're not going to be using a plain LSTM as our learning algorithm. We haven't been using plain LSTMs for years! We will use new and improved architectures. At least that's what I would say! And that leads me to the idea that we'll get AGI via people making better learning algorithms, just like people have been making better learning algorithms for years.

The problem would be solved by doing an automated search over assembly code, but I don't think that's feasible.

There’s a “solving the problem twice” issue. As mentioned above, in Case 5 we need both the outer and the inner algorithm to be able to do open-ended construction of an ever-better understanding of the world—i.e., we need to solve the core problem of AGI twice with two totally different algorithms! (The first is a human-programmed learning algorithm, perhaps SGD, while the second is an incomprehensible-to-humans learning algorithm. The first stores information in weights, while the second stores information in activations, assuming a GPT-like architecture.)

Cross-posting a (slightly updated) comment I left on a draft of this document:

I suspect that this is indexed too closely to what current neural networks look like. I see no good reason why the inner algorithm won't eventually be able to change the weights as well, as in human brains. (In fact, this might be a crux for me - I agree that the inner algorithm having no ability to edit the weights seems far-fetched).

So then you might say that we've introduced a disanalogy to evolution, because humans can't edit our genome.

But the key reason I think that RL is roughly analogous to evolution is because it shapes the high-level internal structure of a neural network in roughly the same way that evolution shapes the high-level internal structure of the human brain, not because there's a totally strict distinction between levels.

E.g. the thing RL currently does, which I don't expect the inner algorithm to be able to do, is make the first three layers of the network vision layers, and then a big region over on the other side the language submodule, and so on. And eventually I expect RL to shape the way the inner algorithm does weight updates, via meta-learning.

You seem to expect that humans will be responsible for this sort of high-level design. I can see the case for that, and maybe humans will put in some modular structure, but the trend has been pushing the other way. And even if humans encode a few big modules (analogous to, say, the distinction between the neocortex and the subcortex), I expect there to be much more complexity in how those actually work which is determined by the outer algorithm (analogous to the hundreds of regions which appear across most human brains).

Thanks for cross-posting this! Sorry I didn't get around to responding originally. :-)

E.g. the thing RL currently does, which I don't expect the inner algorithm to be able to do, is make the first three layers of the network vision layers, and then a big region over on the other side the language submodule, and so on. And eventually I expect RL to shape the way the inner algorithm does weight updates, via meta-learning.

For what it's worth, I figure that the neocortex has some number (dozens to hundreds, maybe 180 like your link says, I dunno) of subregions that do a task vaguely like "predict data X from context Y", with different X & Y & hyperparameters in different subregions. So some design work is obviously required to make those connections. (Randall O'Reilly's vision-learning model gives some taste of what that might look like in more detail.) I figure this is vaguely analogous to figuring out what convolution kernel sizes and strides you need in a ConvNet, and that specifying all this is maybe hundreds or low thousands but not millions of bits of information. (I don't really know right now, I'm just guessing.) Where will those bits of information come from? I figure, some combination of:

  • automated neural architecture search
  • and/or people looking at the neuroanatomy literature and trying to copy ideas
  • and/or when the working principles of the algorithm are better understood, maybe people can just guess what architectures are reasonable, just like somebody invented U-Nets by presumably just sitting and thinking about what's a reasonable architecture for image segmentation, followed by some trial-and-error tweaking.
  • and/or some kind of dynamic architecture that searches for learnable relationships and makes those connections on the fly … I imagine a computer would be able to do that to a much greater extent than a brain (where signals travel slowly, new long-range high-bandwidth connections are expensive, etc.)
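
(As a purely illustrative sketch of why I'd guess "hundreds or low thousands of bits": a region-by-region spec of the kind I'm imagining might look vaguely like the following. Every name and number here is made up; the point is only that such a spec is tiny and human-legible compared to the trained weights it gives rise to.)

```python
import json

# Hypothetical, made-up spec: each entry says roughly "this region predicts
# data X from context Y, with these hyperparameters". None of these names or
# numbers are real; they're placeholders for the example.
REGION_SPEC = [
    {"region": "V1-like", "predicts": "visual_patches", "context": ["retina"], "lr": 1.85},
    {"region": "IT-like", "predicts": "object_ids", "context": ["V1-like"], "lr": 1.24},
    {"region": "A1-like", "predicts": "audio_frames", "context": ["cochlea"], "lr": 0.90},
    {"region": "PFC-like", "predicts": "next_subgoal", "context": ["IT-like", "A1-like"], "lr": 0.50},
]

# The serialized size overstates the true information content, but it gives
# the order of magnitude: kilobits for the architecture spec, versus billions
# of bits in the trained weights.
spec_bits = len(json.dumps(REGION_SPEC).encode("utf-8")) * 8
print(f"this toy 4-region spec serializes to about {spec_bits} bits")
```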

If I understand your comment correctly, we might actually agree on the plausibility of the brute force "automated neural architecture search" / meta-learning case. …Except for the terminology! I'm not calling it "evolution analogy" because the final learning algorithm is mainly (in terms of information content) human-designed and by-and-large human-legible. Like, maybe humans won't have a great story for why the learning rate is 1.85 in region 72 but only 1.24 in region 13...But they'll have the main story of the mechanics of the algorithm and why it learns things. (You can correct me if I'm wrong.)

I feel like I didn't really understand what you were trying to get at here, probably because you seem to have a detailed internal ontology that I don't really get yet. So here's some random disagreements, with the hope that more discussion leads me to figure out what this ontology actually is.

A biological analogy I like much better: The “genome = code” analogy

This analogy also seems fine to me, as someone who likes the evolution analogy

In the remainder of the post I’ll go over three reasons suggesting that the first scenario would be much less likely than the second scenario.

The first scenario strikes me as not representative of what at least I believe about AGI development, despite the fact that I agree with analogies to evolution. If by "learning algorithm" you mean things like PPO or supervised learning, then I don't expect those to be black box. If by "learning algorithm" you mean things like "GPT-3's few-shot learning capabilities", then I do expect those to be black box.

In your second scenario, where does stuff like "GPT-3's few-shot learning capabilities" come in? Are you expecting that those don't exist, or are they learned algorithms, or are they part of the learned content? My guess is you'd say "learned content", in which case I'd say that the analogy to evolution is "natural selection <-> learning algorithm, human brain <-> learned content". (Yes, the human brain can be further split into another "learning algorithm" and "what humans do"; I do think that will be a disanalogy with evolution, but it doesn't seem that important.)

As described above, I expect AGI to be a learning algorithm—for example, it should be able to read a book and then have a better understanding of the subject matter. Every learning algorithm you’ve ever heard of—ConvNets, PPO, TD learning, etc. etc.—was directly invented, understood, and programmed by humans.

GPT-3 few-shot learning? Or does that not count as a learning algorithm? What do you think is a learning algorithm? If GPT-3 few-shot learning doesn't count, then how do you expect that our current learning algorithms will get to the sample efficiency that humans seem to have?

By the same token, when we want to do human-level cognitive tasks with a computer, I claim that we'll invent a learning algorithm that reads books and watches movies and interacts and whatever else, and gradually learns how to do human-level cognitive tasks.

Seems right, except for the "invent" part. Even for humans it doesn't seem right to say that the brain's equivalent of backprop is the algorithm that "reads books and watches movies" etc, it seems like backprop created a black-box-ish capability of "learning from language" that we can then invoke to learn faster.

Incidentally, I think GPT-3 is great evidence that human-legible learning algorithms are up to the task of directly learning and using a common-sense world-model. I’m not saying that GPT-3 is necessarily directly on the path to AGI; instead I’m saying, How can you look at GPT-3 (a simple learning algorithm with a ridiculously simple objective) and then say, “Nope! AGI is way beyond what human-legible learning algorithms can do! We need a totally different path!”?

I'm totally with you on this point and I'm now confused about why I seem to disagree with you so much.

Maybe it's just that when people say "learning algorithms" you think of "PPO, experience replay, neural net architectures", etc., and I think "all those things, but also the ability to read books, learn by watching and imitating others, seek out relevant information, etc.", and your category doesn't include GPT-3's fine-tuning ability whereas mine does?

(Though then I wonder how you can justify "I claim that we'll invent a learning algorithm that reads books and watches movies and interacts and whatever else")

I’m talking about the case where I ask my AGI a question, it chugs along from time t=0 to t=10 and then gives an answer, and where the online-learning that it did during time 0<t<5 is absolutely critical for the further processing that happens during time 5<t<10.

This is how human learning works, but definitely not how, say, GPT-3 works.

Huh? GPT-3 few-shot learning is exactly "GPT-3 looks at a few examples in order, and then spits out an answer, where the processing it did to 'understand' the few examples was crucial for the processing that then spit out an answer".

You might object that GPT-3 is a Transformer and so is actually looking at all of the examples all at the same time, so this isn't an instance of what you mean. I think that's mostly a red herring -- I'd predict you'd see very similar behavior from a GPT-3 that was trained in a recurrent way, where it really is like viewing things in sequence.

For example, I mentioned AlphaStar above—it has LSTMs, self-attention, scatter connections, pointer networks, supervised learning, TD(λ), V-trace, UPGO, interface code connecting to the Starcraft executable, and so on.

This doesn't feel central, but I'd note that OpenAI Five on the other hand was PPO + shaped reward + architecture design + hyperparameter tuning and that's about it. (I find it weird that I'm arguing for more simplicity relative to you, but that is what I feel there.)

Thanks!

A lot of your comments are trying to relate this to GPT-3, I think. Maybe things will be clearer if I just directly describe how I think about GPT-3.

The evolution analogy (as I'm defining it) says that “The AGI” is identified as the inner algorithm, not the inner and outer algorithm working together. In other words, if I ask the AGI a question, I don’t need the outer algorithm to be running in the course of answering that question. Of course the GPT-3 trained model is already capable of answering "easy" questions, but I'm thinking here about "very hard" questions that need the serious construction of lots of new knowledge and ideas that build on each other. I don't think the GPT-3 trained model can do that by itself.

Now for GPT-3, the outer algorithm edits weights, and the inner algorithm edits activations. I am very impressed by the capacity of the GPT-3 weights, edited by SGD, to store an open-ended world model of greater and greater complexity as you train it more and more. I am not so optimistic that the GPT-3 activations can do that, without somehow transferring information from activations to weights. And not just for the stupid reason that it has a finite training window. (For example, other transformer models have recurrence.)

Why don't I think that the GPT-3 trained model is just as capable of building out an open-ended world-model of ever greater complexity using activations not weights?

For one thing, it strikes me as a bit weird to think that there will be this centaur-like world model constructed out of X% weights and (100-X)% activations. And what if GPT comes to realize that one of its previous beliefs is actually wrong? Can the activations somehow act as if they're overwriting the weights? Just seems weird. How much information content can you put in the activations anyway? I don't know off the top of my head, but much less than the amount you can put in the weights.
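
Here's a rough back-of-envelope version of that comparison, using publicly reported GPT-3 figures (about 175 billion parameters, 96 layers, a 12288-dimensional residual stream, a 2048-token context) and crudely treating each weight and each activation as one "slot" of storage:

```python
# Crude capacity comparison for a single GPT-3 forward pass. This ignores
# precision, redundancy, and the fact that activations are recomputed per
# pass, so it's only an order-of-magnitude illustration.
n_weights = 175e9                              # reported GPT-3 parameter count
d_model, n_layers, n_ctx = 12288, 96, 2048     # reported GPT-3 model shape
n_activations = d_model * n_layers * n_ctx     # residual-stream values per pass

print(f"weight slots:     {n_weights:.1e}")
print(f"activation slots: {n_activations:.1e}")
print(f"ratio:            roughly {n_weights / n_activations:.0f}x more weight slots")
```

So even on this generous counting, the activations have roughly 70x fewer slots to work with than the weights.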

When I think of the AGI-hard part of "learning", I think of building a solid bedrock of knowledge and ideas, such that you can build new ideas on top of the old ideas, in an arbitrarily high tower. That's the part that I don't think the GPT-3 inner algorithm (the trained model) can do by itself. (The outer algorithm obviously does it.) Again, I think you would need to somehow transfer information from the activations to the weights, maybe by doing something vaguely like amplification, if you were to make a real-deal AGI from something like GPT-3.

My human brain analogy for GPT-3: One thing we humans do is build a giant interconnected predictive world-model by editing synapses over the course of our lifetimes. Another thing we do is flexibly combine the knowledge and ideas we already have, on the fly, to make sense of a new input, including using working memory and so on. Don't get me wrong, this is a really hard and impressive calculation, and it can do lots of things—I think it amounts to searching over this vast combinatorial space of compositional probabilistic generative models (see analysis-by-synthesis discussion here, or also here). But it does not involve editing synapses. It's different. You've never seen nor imagined a "banana hat" in your life, but if you saw one, you would immediately understand what it is, how to manipulate it, roughly how much it weighs, etc., simply by snapping together a bunch of your existing banana-related generative models with a bunch of your existing hat-related generative models into some composite which is self-consistent and maximally consistent with your visual inputs and experience. You can do all that and much more without editing synapses.

Anyway, my human brain analogy for GPT-3 is: I think the GPT-3 outer algorithm is more-or-less akin to editing synapses, and the GPT-3 inner algorithm is more-or-less akin to the brain's inference-time calculation (...but if humans had a more impressive working memory than we actually do).

The inference-time calculation is impressive but only goes so far. You can't learn linear algebra without editing synapses. There are just too many new concepts built on top of each other, and too many new connections to be learned.

If you were to turn GPT-3 into an AGI, the closest version consistent with my current expectations would be that someone took the GPT-3 trained model but somehow inserted some kind of online-learning mechanism to update the weights as it goes (again, maybe amplification or whatever). I'm willing to believe that something like that could happen, and it would not qualify as "evolution analogy" by my definition.

Even for humans it doesn't seem right to say that the brain's equivalent of backprop is the algorithm that "reads books and watches movies" etc, it seems like backprop created a black-box-ish capability of "learning from language" that we can then invoke to learn faster.

Learning algorithms always involve an interaction between the algorithm itself and what-has-been-learned-so-far, right? Even gradient descent takes a different step depending on the current state of the model-in-training. Again, see the “Inner As AGI” criterion near the top for why this is different from the thing I’m arguing against. The "learning from language" black box here doesn't go off and run on its own; it learns new things by editing synapses according to the synapse-editing algorithm hardwired into the genome.

Thanks, this was helpful in understanding where you're coming from.

When I think of the AGI-hard part of "learning", I think of building a solid bedrock of knowledge and ideas, such that you can build new ideas on top of the old ideas, in an arbitrarily high tower.

I don't feel like humans meet this bar. Maybe mathematicians, and even then, I probably still wouldn't agree. Especially not humans without external memory (e.g. paper). But presumably such humans still count as generally intelligent.

Anyway, my human brain analogy for GPT-3 is: I think the GPT-3 outer algorithm is more-or-less akin to editing synapses, and the GPT-3 inner algorithm is more-or-less akin to the brain's inference-time calculation (...but if humans had a more impressive working memory than we actually do).

Seems reasonable.

The inference-time calculation is impressive but only goes so far. You can't learn linear algebra without editing synapses. There are just too many new concepts built on top of each other, and too many new connections to be learned.

I think this makes sense in the context of humans but not in the context of AI (if you say weights = synapses). It seems totally plausible to give AI systems an external memory that they can read to / write from, and then you learn linear algebra without editing weights but with editing memory. Alternatively, you could have a recurrent neural net with a really big hidden state, and then that hidden state could be the equivalent of what you're calling "synapses".
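
For concreteness, here's a toy sketch of the external-memory idea: a frozen "model" paired with a read/write store, so that learning a new fact means editing memory rather than weights. The exact-match lookup is just a stand-in for whatever learned retrieval a real system would presumably use:

```python
class FrozenModelWithMemory:
    """Toy sketch: a fixed 'model' plus an editable external memory. Learning a
    new fact means writing to memory; the model itself (here just a hard-coded
    lookup-and-answer routine) never changes."""

    def __init__(self):
        self.memory = {}                      # external read/write store

    def write(self, key, value):
        self.memory[key] = value              # "learning" without weight edits

    def answer(self, question):
        # Exact-match retrieval, standing in for a learned retrieval mechanism.
        return self.memory.get(question, "I don't know.")

agent = FrozenModelWithMemory()
agent.write("what is the transpose of a matrix?", "the matrix with rows and columns swapped")
print(agent.answer("what is the transpose of a matrix?"))
print(agent.answer("what is an eigenvalue?"))
```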

The "learning from language" black box here doesn't go off and run on its own; it learns new things using by editing synapses according to the synapse-editing algorithm hardwired into the genome.

This feels analogous to "the AGI doesn't go and run on its own, it operates by changing values in RAM according to the assembly language interpreter hardwired into the CPU chip". Like, it's true, but it seems like it's operating at the wrong level of abstraction.

Once you've reached the point of creating schools and courses, and using spaced repetition and practice exercises, you probably don't want to be thinking in terms of "this is all stuff that's been done by the synapse-editing algorithm hardwired into the genome", you've shifted to a qualitatively new kind of learning.

----

It seems like a central crux here is:

Is it possible to build a reasonably efficient AGI that doesn't autonomously edit its weights after training?

(By AGI here I mean something about as capable as humans on a variety of tasks.)

Caveats on my "yes" position:

  1. I wouldn't be that surprised if in practice it turns out that continually editing the weights even at deployment time is the most efficient thing to do, but I would be surprised if the difference is many orders of magnitude.
  2. I do expect that we will continue to update AGI systems via editing weights in training loops, even after deployment. But this will be more like an iterative train-deploy-train-deploy cycle where each deploy step lasts e.g. days or more, rather than editing weights all the time (as with humans).

Thanks again, this is really helpful.

I don't feel like humans meet this bar.

Hmm, imagine you get a job doing bicycle repair. After a while, you've learned a vocabulary of probably thousands of entities and affordances and interrelationships (the chain, one link on the chain, the way the chain moves, the feel of clicking the chain into place on the gear, what it looks like if a chain is loose, what it feels like to the rider when a chain is loose, if I touch the chain then my finger will be greasy, etc. etc.). All that information is stored in a highly-structured way in your brain (I think some souped-up version of a PGM, but let's not get into that), such that it can grow to hold a massive amount of information while remaining easily searchable and usable. The problem with working memory is not capacity per se, it's that it's not stored in this structured, easily-usable-and-searchable way. So the more information you put there, the more you start getting bogged down and missing things. Ditto with pen and paper, or a recurrent state, etc.

I find it helpful to think about our brain's understanding as lots of subroutines running in parallel. (Kaj calls these things "subagents", I more typically call them "generative models", Kurzweil calls them "patterns", Minsky calls this idea "society of mind", etc.) They all mostly just sit around doing nothing. But sometimes they recognize a scenario for which they have something to say, and then they jump in and say it. So in chess, there's a subroutine that says "If the board position has such-and-characteristics, it's worthwhile to consider moving the pawn." The subroutine sits quietly for months until the board has that position, and then it jumps in and injects its idea. And of course, once you consider moving the pawn, that brings to mind a different board position, and then new subroutines will recognize them, jump in, and have their say, etc.

Or if you take an imperfect rule, like "Python code runs the same on Windows and Mac", the reason we can get by using this rule is because we have a whole ecosystem of subroutines on the lookout for exceptions to the rule. There's the main subroutine that says "Yes, Python code runs the same on Windows and Mac." But there's another subroutine that says "If you're sharing code between Windows and Mac, and there's a file path variable, then it's important to follow such-and-such best practices". And yet another subroutine is sitting around looking for the presence of system library calls in cross-platform code, etc. etc.

That's what it looks like to have knowledge that is properly structured and searchable and usable. I think that's part of what the trained transformer layers are doing in GPT-3—checking whether any subroutines need to jump in and start doing their thing (or need to stop, or need to proceed to their next step (when they're time-sequenced)), based on the context of other subroutines that are currently active.
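
A toy sketch of this "lots of subroutines waiting to jump in" picture, purely to illustrate the dispatch structure (the triggers and messages are made up, and this is not a claim about how the brain or GPT-3 actually implements it):

```python
# Each "subroutine" is a (trigger, contribution) pair: it sits idle until its
# trigger matches the current context, then it jumps in with its contribution.
SUBROUTINES = [
    (lambda ctx: "utility function" in ctx,
     "Every possible policy is consistent with some utility function!"),
    (lambda ctx: "python" in ctx and "windows" in ctx and "mac" in ctx,
     "Watch out for file-path handling when sharing code across platforms."),
    (lambda ctx: "chain" in ctx and "loose" in ctx,
     "A loose chain changes how shifting feels to the rider."),
]

def consult(context):
    ctx = context.lower()
    return [msg for trigger, msg in SUBROUTINES if trigger(ctx)]

print(consult("My Python script behaves differently on Windows and Mac."))
print(consult("Assume the agent maximizes some utility function."))
```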

I think that GPT-3 as used today is more-or-less restricted to the subroutines that were used by people in the course of typing text within the GPT-3 training corpus. But if you, Rohin, think about your own personal knowledge of AI alignment, RL, etc. that you've built up over the years, you've created countless thousands of new little subroutines, interconnected with each other, which only exist in your brain. When you hear someone talking about utility functions, you have a subroutine that says "Every possible policy is consistent with some utility function!", and it's waiting to jump in if the person says something that contradicts that. And of course that subroutine is supported by hundreds of other little interconnected subroutines with various caveats and counterarguments and so on.

Anyway, what's the bar for an AI to be an AGI? I dunno, but one question is: "Is it competent enough to help with AI alignment research?" My strong hunch is that the AI wouldn't be all that helpful unless it's able to add new things to its own structured knowledge base, like new subroutines that say "We already tried that idea and it doesn't work", or "This idea almost works but is missing such-and-such ingredient", or "Such-and-such combination of ingredients would have this interesting property".

Hmm, well, actually, I guess it's very possible that GPT-3 is already a somewhat-helpful tool for generating / brainstorming ideas in AI alignment research. Maybe I would use it myself if I had access! I should have said "Is it competent enough to do AI alignment research". :-D

I agree that your "crux" is a crux, although I would say "effective" instead of "efficient". I think the inability to add new things to its own structured knowledge base is a limitation on what the AI can do, not just what it can do given a certain compute budget.

This feels analogous to "the AGI doesn't go and run on its own, it operates by changing values in RAM according to the assembly language interpreter hardwired into the CPU chip". Like, it's true, but it seems like it's operating at the wrong level of abstraction.

Hmm, the point of this post is to argue that we won't make AGI via a specific development path involving the following three ingredients, blah blah blah. Then there's a second step: "If so, then what? What does that imply about the resulting AGI?" I didn't talk about that; it's a different issue. In particular I am not making the argument that "the algorithm's cognition will basically be human-legible", and I don't believe that.

All of that sounds reasonable to me. I still don't see why you think editing weights is required, as opposed to something like editing external memory.

(Also, maybe we just won't have AGI that learns by reading books, and instead it will be more useful to have a lot of task-specific AI systems with a huge amount of "built-in" knowledge, similarly to GPT-3. I wouldn't put this as my most likely outcome, but it seems quite plausible.)

It seems totally plausible to give AI systems an external memory that they can read to / write from, and then you learn linear algebra without editing weights but with editing memory. Alternatively, you could have a recurrent neural net with a really big hidden state, and then that hidden state could be the equivalent of what you're calling "synapses".

I agree with Steve that it seems really weird to have these two parallel systems of knowledge encoding the same types of things. If an AGI learned the skill of speaking English during training, but then learned the skill of speaking French during deployment, then your hypotheses imply that the implementations of those two language skills will be totally different. And it then gets weirder if they overlap - e.g. if an AGI learns a fact during training which gets stored in its weights, and then reads a correction later on during deployment, do those original weights just stay there?

I do expect that we will continue to update AGI systems via editing weights in training loops, even after deployment. But this will be more like an iterative train-deploy-train-deploy cycle where each deploy step lasts e.g. days or more, rather than editing weights all the time (as with humans).

Based on this I guess your answer to my question above is "no": the original fact will get overridden a few days later, and also the knowledge of french will be transferred into the weights eventually. But if those updates occur via self-supervised learning, then I'd count that as "autonomously edit[ing] its weights after training". And with self-supervised learning, you don't need to wait long for feedback, so why wouldn't you use it to edit weights all the time? At the very least, that would free up space in the short-term memory/hidden state.

For my own part I'm happy to concede that AGIs will need some way of editing their weights during deployment. The big question for me is how continuous this is with the rest of the training process. E.g. do you just keep doing SGD, but with a smaller learning rate? Or will there be a different (meta-learned) weight update mechanism? My money's on the latter. If it's the former, then that would update me a bit towards Steve's view, but I think I'd still expect evolution to be a good analogy for the earlier phases of SGD.  

Maybe we just won't have AGI that learns by reading books, and instead it will be more useful to have a lot of task-specific AI systems with a huge amount of "built-in" knowledge, similarly to GPT-3.

If this is the case, then that would shift me away from thinking of evolution as a good analogy for AGI, because the training process would then look more like the type of skill acquisition that happens during human lifetimes. In fact, this seems like the most likely way in which Steve is right that evolution is a bad analogy.

If an AGI learned the skill of speaking English during training, but then learned the skill of speaking French during deployment, then your hypotheses imply that the implementations of those two language skills will be totally different. And it then gets weirder if they overlap - e.g. if an AGI learns a fact during training which gets stored in its weights, and then reads a correction later on during deployment, do those original weights just stay there?

Idk, this just sounds plausible to me. I think the hope is that the weights encode more general reasoning abilities, and most of the "facts" or "background knowledge" gets moved into memory, but that won't happen for everything and plausibly there will be this strange separation between the two. But like, sure, that doesn't seem crazy.

I do expect we reconsolidate into weights through some outer algorithm like gradient descent (and that may not require any human input). If you want to count that as "autonomously editing its weights", then fine, though I'm not sure how this influences any downstream disagreement.

Similar dynamics in humans:

  1. Children are apparently better at learning languages than adults; it seems like adults are using some different process to learn languages (though probably not as different as editing memory vs. editing weights)
  2. One theory of sleep is that it is consolidating the experiences of the day into synapses, suggesting that any within-day learning is not relying as much on editing synapses.

Tbc, I also think explicitly meta-learned update rules are plausible -- don't take any of this as "I think this is definitely going to happen" but more as "I don't see a reason why this couldn't happen".

In fact, this seems like the most likely way in which Steve is right that evolution is a bad analogy.

Fwiw I've mostly been ignoring the point of whether or not evolution is a good analogy. If you want to discuss that, I want to know what specifically you use the analogy for. For example:

  1. I think evolution is a good analogy for how inner alignment issues can arise.
  2. I don't think evolution is a good analogy for the process by which AGI is made (if you think that the analogy is that we literally use natural selection to improve AI systems).

It seems like Steve is arguing the second, and I probably agree (depending on what exactly he means, which I'm still not super clear on).

  1. I think evolution is a good analogy for how inner alignment issues can arise.
  2. I don't think evolution is a good analogy for the process by which AGI is made (if you think that the analogy is that we literally use natural selection to improve AI systems).

Yes this post is about the process by which AGI is made, i.e. #2. (See "I want to be specific about what I’m arguing against here."...) I'm not sure what you mean by "literal natural selection", but FWIW I'm lumping together outer-loop optimization algorithms regardless of whether they're evolutionary or gradient descent or downhill-simplex or whatever.

Incidentally, I think GPT-3 is great evidence that human-legible learning algorithms are up to the task of directly learning and using a common-sense world-model. I’m not saying that GPT-3 is necessarily directly on the path to AGI; instead I’m saying, How can you look at GPT-3 (a simple learning algorithm with a ridiculously simple objective) and then say, “Nope! AGI is way beyond what human-legible learning algorithms can do! We need a totally different path!”?

I think the response would be, "GPT-3 may have learned an awesome general common-sense world-model, but it took roughly 300 billion tokens of training to do so. AI won't be transformative until it can learn quickly/data-efficiently. (Or until we have enough compute to train it slowly/inefficiently on medium or long-horizon tasks, which is far in the future.)"

What would you say to that?

Good question! 

A kinda generic answer is: (1) Transformers were an advance over previous learning algorithms, and by the same token I expect that yet-to-be-invented learning algorithms will be an advance over Transformers; (2) Sample-efficient learning is AFAICT a hot area that lots of people are working on; (3) We do in fact actually have impressively sample-efficient algorithms even if they're not as well-developed and scalable as others at the moment—see my discussion of analysis-by-synthesis; (4) Given that predictive learning offers tons of data, it's not obvious how important sample-efficiency is.

More detailed answer: I agree that in the "intelligence via online learning" paradigm I mentioned, you really want to see something once and immediately commit it to memory. Hard to carry on a conversation otherwise! The human brain has two main tricks for this (that I know of).

  • There's a giant structured memory (predictive world-model) in the neocortex, and a much smaller unstructured memory in the hippocampus, and the latter is basically just an auto-associative memory (with a pattern separator to avoid cross-talk) that memorizes things. Then it can replay those memories when appropriate. And just like replay learning in ML, or like doing multiple passes through your training data in ML, relevant information can gradually transfer from the unstructured memory to the structured one by repeated replays. (There's a toy sketch of this setup right after this list.)
  • Because the structured memory is in the analysis-by-synthesis paradigm (i.e. searching for a generative model that matches the data), it inherently needs less training data, because its inductive biases are a closer match to reality. It's a harder search problem to build the right generative model when you're learning, and it's a harder search problem to find the right generative model at inference time, but once you get it, it generalizes better and takes you farther. For example, you can "train on no data whatsoever"—just stare into space for a while, thinking about the problem, and wind up learning something new. This is only possible because you have a space of generative models, so you can run internal experiments. How's that for sample efficiency?!
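
Here's the toy sketch of that fast-unstructured-memory-plus-replay setup mentioned above. Everything is a stand-in (the "structured model" is just a counter-based predictor), but it shows one-shot memorization followed by gradual consolidation via replay:

```python
import random
from collections import Counter, deque

class EpisodicBuffer:
    """Fast, unstructured memory: commits an episode to memory in one shot."""
    def __init__(self, capacity=100):
        self.episodes = deque(maxlen=capacity)

    def memorize(self, episode):
        self.episodes.append(episode)

    def replay(self, k=5):
        return random.sample(list(self.episodes), min(k, len(self.episodes)))

class StructuredModel:
    """Slow, structured memory: a toy predictor trained only via replays."""
    def __init__(self):
        self.counts = Counter()

    def train_on(self, episode):
        cue, outcome = episode
        self.counts[(cue, outcome)] += 1

    def predict(self, cue):
        candidates = {o: c for (q, o), c in self.counts.items() if q == cue}
        return max(candidates, key=candidates.get) if candidates else None

buffer, model = EpisodicBuffer(), StructuredModel()
buffer.memorize(("smoke", "fire"))              # seen once, memorized immediately
for _ in range(10):                             # gradual consolidation via replay
    for episode in buffer.replay():
        model.train_on(episode)
print(model.predict("smoke"))                   # -> fire
```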

AlphaStar and GPT-3 don't do analysis-by-synthesis—well, they weren't designed to do it, although my hunch is that GPT-3 is successful by doing it to a limited extent (this may be related to the Hopfield network thing). But we do have algorithms at an earlier stage of development / refinement / scaling-up that are based on those principles, and they are indeed very highly sample-efficient, and in the coming years I expect that they'll be used more widely in AI.

To make sure I understand: you are saying (a) that our AIs are fairly likely to get significantly more sample-efficient in the near future, and (b) even if they don't, there's plenty of data around.

I think (b) isn't a good response if you think that transformative AI will probably need to be human-brain-sized and you believe the scaling laws and you think that short-horizon training won't be enough. (Because then we'll need something like 10^30+ FLOP to train TAI, which is plausibly reachable in 20 years but probably not in 10.) That said, I think short-horizon training might be enough.

I think (a) is a good response, but it faces the objection: Why now? Why should we expect sample-efficiency to get dramatically better in the near future, when it has gotten only very slowly better in the past? (Has it? I'm guessing so, maybe I'm wrong?)

Note that evolution is not in this picture: its role has been usurped by the engineers who wrote the PyTorch code. This is intelligent design, not evolution!

IMO you should put evolution in the picture, as another part of the analogy! :) Make a new row at the top, with "Genomes evolving over millions of generations on a planet, as organisms with better combinations of genes outcompete others" on the left and "Code libraries evolving over thousands of days in an industry, as software/ANNs with better code outcompete others (in the economy, in the academic prestige competition, in the minds of individual researchers)" on the right. (Or some shortened version of that)

Maybe we won’t restart the inner algorithm from scratch every time we edit it, since it’s so expensive to do so. Instead, maybe once in a while we’ll restart the algorithm from scratch (“re-initialize to random weights” or something analogous), but most of the time, we’ll take whatever data structure holds the AI’s world-knowledge, and preserve it between one version of the inner algorithm and its successor. Doing that is perfectly fine and plausible, but again, the result doesn’t look like evolution; it looks like a hyperparameter search within a highly-constrained class of human-designed algorithms. Why? Because the world-knowledge data structure—a huge part of how an AGI works!—needs to be designed by humans and inserted into the AGI architecture in a modular way, for this approach to be possible at all.

Does it though? In "crystal nights" I described an AI-by-evolution scenario in which the ability to copy chunks of learned brain into your offspring is in the toolkit/genome for evolution to use if it wants. It sounds like you are saying this wouldn't work, but I don't see why.

EDIT: Also, the "Amp(GPT-7)" story seems to me to be an example of your Case 4 or Case 5 maybe, while also being an example of the evolutionary analogy being correct (see: the final step, where we evolve the chinese room bureaucracies).

Hmm, if you don't know which bits are the learning algorithm and which are the learned content, and they're freely intermingling, then I guess you could try randomizing different subsets of the bits in your algorithm, and see what happens, or something, and try to figure it out. This seems like a computationally-intensive and error-prone process, to me, although I suppose it's hard to know. Also, which is which could be dynamic, and there could be bits that are not cleanly in either category. If you get it wrong, then you're going to wind up updating the knowledge instead of the learning algorithm, or get bits of the learning algorithm that are stuck in a bad state but you're not editing them because you think they're knowledge. I dunno. I guess that's not a disproof, but I'm going to stick with "unlikely".

With enough compute, can't rule anything out—you could do a blind search over assembly code! I tend to think that more compute-efficient paths to AGI are far likelier to happen than less compute-efficient paths to AGI, other things equal, because the less compute that's needed, the faster you can run experiments, and the more people are able to develop and experiment with the algorithms. Maybe one giant government project can do a blind search over assembly code, but thousands of grad students and employees can run little experiments in less computationally expensive domains.