AI ALIGNMENT FORUM

(I suppose it is possible that evolution got one really lucky mutation that mutated 50 things at once and it needed all 50 of those mutations to create culture, and this was extraordinarily unlikely but anthropic effects can explain that, but I don't think you're arguing that.)

The question isn't whether there are simple changes -- it seems likely there were -- the question is whether we should expect humans not to find these simple changes.

Will humans continually pursue all simple yet powerful changes to our AIs?

I feel like the answer to this question doesn't tell me that much about takeoff speeds. Taking your quote from Paul:

If we step back from skills and instead look at outcomes we could say: “Evolution is always optimizing for fitness, and humans have now taken over the world.”

I see the key distinguishing feature of the argument as "evolution myopically optimized for fitness; culture is not myopically good for fitness; humans instead optimize for usefulness; culture is good for usefulness". The key feature isn't about whether humans are better optimizers than evolution, it's about what the target of the optimization is. Even if humans were as "dumb" at optimization as evolution, but they continued to evaluate AI systems by how useful they are, I'd still expect a continuous takeoff.

(Also, you seem to be arguing that the dumber we expect human optimization to be, the more we should expect discontinuities. This seems kind of wild -- surely with more understanding you can find more fundamental insights, and this leads to discontinuities? Sure seems like this was the story with nukes, for example.)

[-]Richard_Ngo5y40

Hmm, let's see. So the question I'm trying to ask here is: do other species lack proto-culture mainly because of an evolutionary oversight, or because proto-culture is not very useful until you're close to human-level in other respects? In other words, is the discontinuity we've observed mainly because evolution took a weird path through the landscape of possible minds, or because the landscape is inherently quite discontinuous with respect to usefulness? I interpret Paul as claiming the former.

But if the former is true, then we should expect that there are many species (including chimpanzees) in which selection for proto-culture would be useful even in the absence of other changes like increased brain size or social skills, because proto-culture is a useful thing for them to have in ways that evolution has been ignoring. So by "simple changes" I mean something like: changes which could be induced by a relatively short period of medium-strength selection specifically for proto-culture (say, 100,000 years; much less than the human-chimp gap).

Another very similar question which is maybe more intuitive: suppose we take animals like monkeys, and evolve them by selecting the ones which seem like they're making the most progress towards building a technological civilisation, until eventually they succeed. Would their progress be much more continuous than the human case, or fairly similar? Paul would say the former; I'm currently leaning (slightly) towards the latter. This version of the question doesn't make so much sense with chimpanzees, since it may be the case that by the time we reach chimpanzees, we've "locked in" a pretty sharp discontinuity.

Both of these are proxies for the thing I'm actually interested in, which is whether more direct optimisation for reaching civilisation leads to much more continuous paths to civilisation than the one we took.

The question isn't whether there are simple changes -- it seems likely there were -- the question is whether we should expect humans not to find these simple changes.

Both of these are interesting questions, if you interpret the former in the way I just described.

Separately, even if we concede that evolutionary progress could have been much more continuous if it had been "optimising for the right thing", we can also question whether humans will "optimise for the right thing".

You seem to be arguing that the dumber we expect human optimization to be, the more we should expect discontinuities. This seems kind of wild

Paul's argument is that evolution was discontinuous specifically because evolution was dumb in certain ways. My claim is that AGI may be discontinuous specifically because humans are dumb in certain ways (i.e. taking a long time to notice big breakthroughs, during which an overhang builds up). There are other ways in which humans being dumb would make discontinuities less likely (e.g. if we're incapable of big breakthroughs). That's why I phrased the question as "Will humans continually pursue all simple yet powerful changes to our AIs?", because I agree that humans are smart enough to find simple yet powerful changes if we're looking in the right direction, but I think there will be long periods in which we're looking in the wrong direction (i.e. not "continually pursuing" the most productive directions).

Thanks for the feedback. My responses are all things that I probably should have put in the original post. If they make sense to you (even if you disagree with them) then I'll edit the post to add them in.

Oh, one last thing I should mention: a reason that this topic seems quite difficult for me to pin down is that the two questions seem pretty closely tied together. So if you think that the landscape of usefulness is really weird and discontinuous, then maybe humans can still find a continuous path by being really clever. Or maybe the landscape is actually pretty smooth, but humans are so much dumber than evolution that by default we'll end up on a much more discontinuous path (because we accumulate massive hardware overhangs while waiting for the key insights). I don't know how to pin down definitions for each of the questions which don't implicitly depend on our expectations about the other question.

[-]Rohin Shah5y30

do other species lack proto-culture mainly because of an evolutionary oversight, or because proto-culture is not very useful until you're close to human-level in other respects? In other words, is the discontinuity we've observed mainly because evolution took a weird path through the landscape of possible minds, or because the landscape is inherently quite discontinuous with respect to usefulness? I interpret Paul as claiming the former.

I think I disagree with the framing. Suppose I'm trying to be a great physicist, and I study a bunch of physics, which requires some relatively high-level understanding of math. At some point I want to do new research into general relativity, and so I do a deep dive into abstract algebra / category theory to understand tensors better. Thanks to my practice with physics, I'm able to pick it up much faster than a typical person who starts studying abstract algebra.

If you evaluate by "ability to do abstract algebra", it seems like there was a sharp discontinuity, even though on "ability to do physics" there was not. But if I had started off trying to learn abstract algebra before doing any physics, then there would not have been such a discontinuity.

It seems wrong to say that my discontinuity in abstract algebra was "mainly because of an oversight in how I learned things", or to say that "my learning took a weird path through the landscape of possible ways to learn fields". Like, maybe those things would be true if you assume I had the goal of learning abstract algebra. But it's far more natural and coherent to just say "Rohin wasn't trying to learn abstract algebra, he was trying to learn physics".

Similarly, I think you shouldn't be saying that there were "evolutionary oversights" or "weird paths", you should be saying "evolution wasn't optimizing for proto-culture, it was optimizing for reproductive fitness".

But if the former is true, then we should expect that there are many species (including chimpanzees) in which selection for proto-culture would be useful even in the absence of other changes like increased brain size or social skills, because proto-culture is a useful thing for them to have in ways that evolution has been ignoring.

What does "useful" mean here? If by "useful" you mean "improves an individual's reproductive fitness", then I disagree with the claim and I think that's where the major disagreement is. (I also disagree that this is an implication of the argument that evolution wasn't optimizing for proto-culture.)

If by "useful" you mean "helps in building a technological civilization", then yes, I agree with the claim, but I don't see why it has any relevance.

Another very similar question which is maybe more intuitive: suppose we take animals like monkeys, and evolve them by selecting the ones which seem like they're making the most progress towards building a technological civilisation, until eventually they succeed. Would their progress be much more continuous than the human case, or fairly similar? Paul would say the former

Yes, I agree with this one (at least if we get to use a shaped reward, e.g. we get to select the ones that show signs of intelligence / culture, on the view that that is a necessary prereq to technological civilization).

I don't know why you lean towards the latter.

Paul's argument is that evolution was discontinuous specifically because evolution was dumb in certain ways.

No, the argument is that evolution wasn't trying, not that it was dumb. (Really I want to taboo "trying", "dumb", "intelligent", "optimization" and so on and talk only about what is and isn't selected for -- the argument is that evolution was not selecting for proto-culture / intelligence, whereas humans will select for proto-culture / intelligence. But if you must cast this in agent-oriented language, it's that evolution wasn't trying, not that it was dumb.)

My claim is that AGI may be discontinuous specifically because humans are dumb in certain ways

I'd find this a lot more believable if you convincingly argued for "humans are not selecting for agents with proto-culture / intelligence", which is the analog to "evolution was not selecting for proto-culture / intelligence".

a reason that this topic seems quite difficult for me to pin down is that the two questions seem pretty closely tied together. So if you think that the landscape of usefulness is really weird and discontinuous, then maybe humans can still find a continuous path by being really clever.

I agree that arguments about the landscape are going to be difficult. I don't think that's the crux of Paul's original argument.

[-]Richard_Ngo5y10

I think that, because culture is eventually very useful for fitness, you can either think of the problem as evolution not optimising for culture, or evolution optimising for fitness badly. And these are roughly equivalent ways of thinking about it, just different framings. Paul notes this duality in his original post:

If we step back from skills and instead look at outcomes we could say: “Evolution is always optimizing for fitness, and humans have now taken over the world.” On this perspective, I’m making a claim about the limits of evolution. First, evolution is theoretically optimizing for fitness, but it isn’t able to look ahead and identify which skills will be most important for your children’s children’s children’s fitness. Second, human intelligence is incredibly good for the fitness of groups of humans, but evolution acts on individual humans for whom the effect size is much smaller (who barely benefit at all from passing knowledge on to the next generation).

It seems like most of your response is an objection to this framing. I may need to think more about the relative advantages and disadvantages of each framing, but I don't think either is outright wrong.

What does "useful" mean here? If by "useful" you mean "improves an individual's reproductive fitness", then I disagree with the claim and I think that's where the major disagreement is.

Yes, I meant useful for reproductive fitness. Sorry for the ambiguity.

[-]Rohin Shah5y20

I may need to think more about the relative advantages and disadvantages of each framing, but I don't think either is outright wrong.

I agree it's not wrong. I'm claiming it's not a useful framing. If we must use this framing, I think humans and evolution are not remotely comparable on how good they are at long-term optimization, and I can't understand why you think they are. (Humans may not be good at long-term optimization on some absolute scale, but they're a hell of a lot better than evolution.)

I think in my example you could make a similar argument: looking at outcomes, you could say "Rohin is always optimizing for learning abstract algebra, and he has now become very good at abstract algebra." It's not wrong, it's just not useful for predicting my future behavior, and doesn't seem to carve reality at its joints.

(Tbc, I think this example is overstating the case: "evolution is always optimizing for fitness" is definitely more reasonable and more predictive than "Rohin is always optimizing for learning abstract algebra".)

I really do think that the best thing is to just strip away agency, and talk about selection:

the argument is that evolution was not selecting for proto-culture / intelligence, whereas humans will select for proto-culture / intelligence

Re: usefulness:

Yes, I meant useful for reproductive fitness.

Suppose a specific monkey has some mutation and gets a little bit of proto-culture. Are you claiming that this will increase the number of children that monkey has?

[-]Daniel Kokotajlo5y40

I'm confused about why you only updated mildly away from slow takeoff. It seems that you've got a pretty good argument against slow takeoff here:

Are there simple changes to chimps (or other animals) that would make them much better at accumulating culture?

Will humans continually pursue all simple yet powerful changes to our AIs?

Seems like if the answer to the first question is No, then there really is some relatively sharp transition to much more powerful culture-accumulating capabilities, that humans crossed when they evolved from chimp-like creatures. Thus, our default assumption should be that as we train bigger and bigger neural nets on more and more data, there will also be some relatively sharp transition. In other words, Yudkowsky's argument is correct.

Seems like if the answer to the second question is No, then Paul's disanalogy between evolution and AI researchers is also wrong; both evolution and AI researchers are shoddy optimizers that sometimes miss things etc. So Yudkowsky's argument is correct.

Now, you put 50% on the first answer being No and 70% on the second answer being No. So shouldn't you have something like 85% credence that Paul is wrong and Yudkowsky's argument is correct? And isn't that a fairly big update against slow takeoff?

Maybe the idea is that you are meta-uncertain, unsure you are reasoning about this correctly, etc.? Or maybe the idea is that Yudkowsky's argument could easily be wrong for other reasons than the ones Paul gave? Fair enough.
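One way to reproduce the 85% figure above, offered only as a sketch: it assumes the two answers are independent and that Yudkowsky's argument goes through if either answer is No. Neither assumption is spelled out in the comment, and the function name is illustrative.

```python
def credence_at_least_one_no(p_no_q1: float, p_no_q2: float) -> float:
    """P(Q1 is No or Q2 is No), ASSUMING the two answers are independent."""
    p_both_yes = (1.0 - p_no_q1) * (1.0 - p_no_q2)
    return 1.0 - p_both_yes


# Credences quoted in the comment: 50% that Q1 is No, 70% that Q2 is No.
print(f"{credence_at_least_one_no(0.5, 0.7):.2f}")
```

This gives 0.85. If the two answers are positively correlated rather than independent, the combined credence would be lower, with a floor of 0.70.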

[-]Rohin Shah5y50

I'm confused. I thought the intended argument is "Yes, there are simple changes to chimps that make them much better at accumulating culture; similarly we should expect there to be simple changes to neural nets that greatly improve their capabilities, and so just as humans had a 'fast takeoff' so too will neural nets".

This implies that a "Yes" to Q1 supports fast takeoff. And I tend to agree with this -- if there are only complicated changes that lead to discontinuities, then why expect that we will find them?

(Like, there is some program we can write that would be way, way more intelligent than us. You could think of that as a complicated change. But surely the existence of a superintelligence doesn't tell us much about takeoff speeds.)

I also interpreted Richard as arguing that a "Yes" to Q1 would support fast takeoff, though I found it hard to follow the reasoning on how Q1 and Q2 relate to takeoff speeds (will write a top-level comment about this after this one).

[-]Daniel Kokotajlo5y50

Very good point; now I am confused. I think tentatively that Richard was too quick to make "Are there simple changes to chimps (or other animals) that would make them much better at accumulating culture?" the crux on which "human progress would have been much less abrupt if evolution had been optimising for cultural ability all along" depends.

[-]Richard_Ngo5y30

So my reasoning is something like:

1. There's the high-level argument that AIs will recursively self-improve very fast.
2. There's support for this argument from the example of humans.
3. There's a rebuttal to that support from the concept of changing selection pressures.
4. There's a counterrebuttal to changing selection pressures from my post.

By the time we reach the fourth level down, there's not that much scope for updates on the original claim, because at each level we lose confidence that we're arguing about the right thing, and also we've zoomed in enough that we're ignoring most of the relevant considerations.

I'll make this more explicit.


How easily could animals evolve culture?

Let’s distinguish between three sets of skills which contribute to human intelligence: general cognitive skills (e.g. memory, abstraction, and so on); social skills (e.g. recognising faces, interpreting others’ emotions); and cultural skills (e.g. language, imitation, and teaching). I expect Paul to agree with me that chimps have pretty good general cognitive skills, and pretty good social skills, but they seriously lack the cultural skills that precipitated the human “fast takeoff”. In particular, there’s a conspicuous lack of proto-languages in all nonhuman animals, including some (like parrots) which have no physiological difficulties in forming words. Yet humans were able to acquire advanced cultural skills relatively quickly after diverging from chimpanzees. So why haven’t nonhuman animals, particularly chimpanzees, developed cultural skills that are anywhere near as advanced as ours? Here are three possible explanations:

1. Advanced cultural skills are not very useful for species with sub-human levels of general cognitive skills and social skills.
2. Advanced cultural skills are not directly selected for in species with sub-human levels of general cognitive skills and social skills.
3. Advanced cultural skills are too complex for species with sub-human levels of general cognitive skills and social skills to acquire.

I’ve assigned 40%, 45% and 15% credence respectively to these being the most important explanation for the lack of cultural skills in other species, although again these are very rough estimates.

What reasons do we have to believe or disbelieve in each? The first one is consistent with Lewis and Laland’s experiments, which suggest that the usefulness of culture increases exponentially with fidelity of cultural transmission. For example, moving from a 90% chance to a 95% chance of copying a skill correctly doubles the expected length of any given transmission chain, allowing much faster cultural accumulation. This suggests that there’s a naturally abrupt increase in the usefulness of culture as species gain other skills (such as general cognitive skills and social skills) which decrease their error rate. As an alternative possibility, Dunbar’s work on human evolution suggests that increases in our brain size were driven by the need to handle larger social groups. It seems plausible that culture becomes much more useful when interacting with a bigger group. Either of these hypotheses supports the idea that AI capabilities might quickly increase.
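The doubling claim above can be checked with a simple geometric model. This is a sketch under an assumption that is mine rather than Lewis and Laland's: each copy in a chain succeeds independently with a fixed probability, and the chain ends at the first failed copy. The function name is hypothetical.

```python
def expected_chain_length(fidelity: float) -> float:
    """Expected number of copy attempts up to and including the first failure.

    ASSUMPTION: each copy succeeds independently with probability `fidelity`,
    so the chain length is geometric with mean 1 / (1 - fidelity).
    """
    if not 0.0 <= fidelity < 1.0:
        raise ValueError("fidelity must be in [0, 1)")
    return 1.0 / (1.0 - fidelity)


for fidelity in (0.90, 0.95, 0.99):
    length = expected_chain_length(fidelity)
    print(f"fidelity {fidelity:.2f}: expected chain length {length:.1f}")
```

Under this model the expected length is about 10 at 90% fidelity and about 20 at 95%, matching the doubling in the text. Each further halving of the error rate doubles the expected length again, which is why small fidelity gains near 100% matter so much.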

The second possibility is the most consistent with the changing selection pressures argument.^[1] The core issue is that culture requires the involvement of several parties - for example, language isn’t useful without both a speaker and a listener. This makes it harder for evolution to select for advanced language use, since it primarily operates on an individual level. Consider also the problem of trust: what prevents speakers from deceiving listeners? Or, if the information is honest and useful, what ensures that listeners will reciprocate later? These problems might significantly reduce the short-term selection for cultural skills. However, it seems to me that many altruistic behaviours have overcome these barriers, for example by starting within kin groups and spreading from there. In Darwin’s Unfinished Symphony, Laland hypothesises that language started the same way. It seems hard to reconcile observations of altruistic behaviour in chimps and other animals with the claim that proto-culture would have been even more useful, but failed to emerge. However, I've given this possibility relatively high credence anyway because if I imagine putting chimps through strong artificial selection for a few thousand years, it seems pretty plausible that they could acquire useful cultural skills. (Although see the next section for why this might not be the most useful analogy.)

The third possibility is the trickiest to evaluate, because it’s hard to reason about the complexity of cognitive skills. For example, is the recursive syntax of language something that humans needed complex adaptation to acquire, or does it reflect our pre-existing thought patterns? One skill that does seem very sophisticated is the ability of human infants to acquire language - if this relied on previous selection for general cognitive skills, then it might have been very difficult for chimps to acquire. This possibility implies that developing strong non-cultural skills makes it much easier to develop cultural skills. This would also be evidence in favour of fast takeoffs, since it means that even if humans are always trying to build increasingly useful AIs, our ability to add some important skills might advance rapidly once our AIs possess other prerequisite skills.

How well can humans avoid comparable oversights?

Even assuming that evolution did miss something simple and important for a long time, though, the changing selection pressures argument fails if humans are likely to also spend a long time overlooking some simple way to make our AIs much more useful. This could be because nobody thinks of it, or merely because the idea is dismissed by the academic mainstream. See, for example, the way that the field of AI dismissed the potential of neural networks after Minsky and Papert’s Perceptrons was released. And there are comparably large oversights in many other scientific domains. When we think about how easy it would be for AI researchers to do better than evolution, we should be asking: “would we have predicted huge fitness gains from cultural learning in chimpanzees, before we’d ever seen any examples of cultural learning?” I suspect not.^[2]

Paul would likely respond by pointing to AI Impacts’ evidence that discontinuities are rare in other technological domains - suggesting that, even when fields have been overlooking big ideas, their discovery rarely cashes out in sharp changes to important metrics.^[3] But I think there is an important disanalogy between AI and other technologies: modern machine learning systems are mostly “designed” by their optimisers, with human insights only contributing at a high level. This has three important implications.

Firstly, it means that attempts to predict discontinuities should consider growth in compute as well as intellectual progress. Exactly how we do so depends on whether compute and insights are better modeled as substitutes or complements to each other - that is, whether insights have less or more impact when more compute becomes available. If they’re substitutes, then we should expect continuous compute growth to “smooth out” the lumpiness in human insight. But if they’re complements, then compute growth exacerbates that lumpiness - an insight which would have led to a big jump with a certain amount of compute available could lead to a much bigger jump if it’s only discovered when there’s much more compute available.
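The substitutes/complements distinction can be made concrete with a deliberately extreme toy model. Everything here is an illustrative assumption rather than anything claimed in the post: "substitutes" is modelled as `max` (either resource alone can reach a given capability level), "complements" as `min` (the scarcer resource is the bottleneck), and the numbers are arbitrary.

```python
def capability(compute: float, insight: float, regime: str) -> float:
    """Toy capability level; both regimes are stylised extremes, not empirical models."""
    if regime == "substitutes":
        # Either resource alone can reach a given level.
        return max(compute, insight)
    if regime == "complements":
        # The scarcer resource is the bottleneck.
        return min(compute, insight)
    raise ValueError(f"unknown regime: {regime}")


def jump_from_insight(compute: float, regime: str,
                      before: float = 2.0, after: float = 10.0) -> float:
    """Capability gain when insight rises from `before` to `after` at fixed compute."""
    return capability(compute, after, regime) - capability(compute, before, regime)


# The same insight, discovered at increasingly late (higher-compute) times.
compute_levels = [1.0, 4.0, 8.0, 16.0]
for regime in ("substitutes", "complements"):
    jumps = [jump_from_insight(c, regime) for c in compute_levels]
    print(regime, jumps)
```

In the substitutes case the same insight produces a smaller jump the later it arrives (8, 6, 2, 0 as compute grows), because compute has already covered most of the ground. In the complements case the jump grows (0, 2, 6, 8), since accumulated compute sits unused until the insight unlocks it.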

I think there’s much more to be said on this question, which I’m currently very uncertain about. My best guess is that we used to be in a regime where compute and insight were substitutes, because domain-specific knowledge played a large role. But now that researchers are taking the bitter lesson more seriously, and working on tasks where it’s harder to encode domain-specific knowledge, it seems more plausible that we’re in a complementary regime, where insights are mainly used to leverage compute rather than replace it.

Either way, this argument suggests that the comparison to other technological domains in general is a little misleading. Instead, we should look at fields in which an important underlying resource was becoming exponentially cheaper - for instance, fields which rely on DNA sequencing. One could perhaps argue that all scientific fields depend on the economy as a whole, which is growing exponentially - but I’d be more convinced by examples in which the dependency is direct, as it is in ML.

Secondly, our reliance on optimisers means that we don’t understand the low-level design details of neural networks as well as we understand the low-level design details in other domains. Not only are the parameters of our neural networks largely opaque to us, we also don’t have a good understanding of what our optimisers are doing when they update those parameters. This makes it more likely that we miss an important high-level insight, since our high-level intuitions aren’t very well linked to whatever low-level features make our neural networks actually function.

Thirdly, even if we can identify all the relevant traits that we’d like to aim for at a high level, we may be unable to specify them to our optimisers, for all the reasons explained in the AI safety literature. That is, by default we should expect our optimisers to develop AIs with capabilities that aren't quite what we wanted (which I'll call capabilities misspecification). Perhaps that comes about because it’s hard to provide high-quality feedback, or hard to set up the right environments, or hard to make multiple AIs interact with each other in the right way (I discuss such possibilities in more depth in this post). If so, then our optimisers might make the same types of mistakes as evolution did, for many of the same reasons. For example, it’s not implausible to me that we build AGI by optimising for the most easily measurable tasks that seem to require high intelligence, and hoping that these skills generalise - as was the case with GPT-3. But in that case the fact that humans are “aiming towards” useful AIs doesn’t help very much in preventing discontinuities.

Paul claims that, even if this argument applies at the level of individual optimisers, it hasn't previously been relevant at the level of the ML community as a whole. This seems plausible, but note that the same could be said for alignment problems in general. So far they've only occurred in isolated contexts, yet many of us expect that alignment problems will get more serious as we build more sophisticated systems that generalise widely in ways we don't understand very well. So I'm inclined to believe that capabilities misspecification will also be more of a problem in the future, for roughly the same reasons. One could also argue against the likelihood of capabilities misspecification by postulating that in order to build AGIs we’ll only need to optimise them to achieve relatively straightforward tasks in relatively simple environments. In practice, though, it’s difficult to make such arguments compelling given the uncertainties involved.^[4]

Overall, I think that the changing selection pressures argument is a plausible consideration, but far from fully convincing; and that evaluating it thoroughly will require much more scrutiny. However, I'd be more excited about future work which classifies both Paul's and Eliezer's positions as "fast takeoff", and then evaluates those against the view that AGI will "merely" bump us up to a steeper exponential growth curve - e.g. as defended by Hanson.

[1] As further support for this argument it’d be nice to have more examples of cases where evolution plausibly missed an important leap, in addition to the development of human intelligence. Are there other big evolutionary discontinuities? Plausibly multicellularity and the Cambrian explosion qualify. On a smaller scale, two striking types of biological discontinuities (for which I credit Amanda Askell and Beth Barnes) are invasive species, and runaway sexual selection. But in both cases I think this is more reasonably described as a change in the objective, rather than a species quickly getting much fitter within a given environment.
[2] In practice we can take inspiration from humans in order to figure out which traits will be necessary in AGIs - we don’t need to invent all the ideas from scratch. But on the other hand, even given the example of humans, we haven’t made much progress in understanding how or why our intelligence works, which suggests that we’re reasonably likely to overlook some high-level insights.
[3] One natural reason to think that economic usefulness of AIs will be relatively continuous even if we overlook big insights is that humans can fill in gaps in the missing capabilities of our AIs, so that they can provide a lot of value without being good at every aspect of a given job. By contrast, in nature each organism has to be very well-rounded in order to survive.
[4] Perhaps the strongest hypothesis along these lines is that language is the key ingredient - yet it seems like language models will become data-constrained relatively soon.