If you go back far enough, the ancestors of sharks and dolphins look really different:
But modern day sharks and dolphins have very similar body shapes:
This is a case of convergent evolution: the process by which organisms with different origins develop similar features. Both sharks and dolphins needed speed and energy efficiency when moving in an environment governed by the laws of hydrodynamics, and so they converged on a pretty similar body shape.
For us, this isn’t very surprising, and doesn’t require much knowledge of evolution: we have a good intuitive understanding of how water works, and humans knew a lot of the underlying maths for the laws of hydrodynamics before they understood anything about evolution. Starting from these laws, it isn’t very surprising that sharks and dolphins ended up looking similar.
But what if instead of starting with knowledge of hydrodynamics and then using that to explain the body shape of sharks and dolphins, we started with only knowledge of sharks’ and dolphins’ body shape, and tried to use that to explain underlying laws?
Let’s pretend we’re alien scientists from an alternative universe, and for some weird reason we only have access to simplified 3D digital models of animals and some evolutionary history, but nothing about the laws of physics in the human/shark/dolphin universe. My guess is that these alien scientists would probably be able to uncover a decent amount of physics and a fair bit about Earth’s environment, just by looking at cases of convergent evolution.
If I’m right about this guess, then this could be pretty good news for alignment research. When it comes to thinking about AI, we’re much closer to the epistemic position of the alien scientist: we either don't know the ‘physics’ of life and intelligence at all, or are only just in the process of uncovering it.
But cases of convergent evolution might help us to deduce deep selection pressures which apply to AI systems as well as biological ones. And if they do, we might be able to say more about what future AI systems might look like, or, if we are lucky, even use some of the selection pressures to shape what systems we get.
This post argues that we should use cases of convergent evolution to look for deep selection pressures which extend to advanced AI systems.
Convergent evolution is a potentially big deal for AI alignment work:
In this post, I’ll:
The body shape of sharks and dolphins is just one of very many examples of convergent evolution in biology. For example:
We can think about convergent evolution in terms of:
The basin of convergent evolution is the region of the abstract space in which, once an organism enters the basin, the pull of the selection pressure brings the organism closer to the attractor state.
In the case of sharks and dolphins:
There are some important nuances here.
Firstly, if you back out far enough, cases of convergent evolution are always contingent on something.
Contingent evolution is the process by which organisms develop different traits under the same conditions, because of contingent factors (like random mutations or interspecies encounters). At first, convergent and contingent evolution sound like opposites, but actually they are fractal: every instance of convergent evolution is contingent on some higher level thing. To take our shark and dolphin example, their body shape is contingent on them both being vertebrates. Invertebrates under the same conditions don’t develop that sort of body shape.
Another way of putting this point would be that organisms have to enter the basin for the selection pressures to apply. Different factors determine entry, including both features of the environment and features of the organism. Entry into the basin of convergent evolution which dolphins and sharks both fell into seems to require vertebrae, among other things.
Secondly, similarity/generality do not necessarily imply convergence. Many animals have hooves, but they all share a common ancestor. This is a case of homology, not convergent evolution. The fact that hooved animals are quite widespread shows us that hooves are not maladaptive - but we don’t get the kind of strong signal we would from convergent evolution that hooves are uniquely adaptive. To say that X is convergent, you need to be able to point to multiple different origins converging to points close to X. It’s not enough to just observe that there’s a lot of X around.
Both of these nuances limit and clarify the concept of convergent evolution. Convergent evolution is limited in that there are many common phenomena which it can’t explain (like hooves). But it’s also unusually predictive: provided you understand the scope of the basin of convergent evolution (or in other words, can accurately back out what the convergent evolution is contingent on), then within that basin there is little room for things to do anything other than fall towards the attractor state.
That’s a substantive proviso though: it can be very tricky to back out the contingencies, so often there will be uncertainty about exactly where the selection pressures apply.
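One way to make the basin picture concrete is a toy simulation. The sketch below is an illustration only, not a model taken from the biology: it assumes a made-up one-dimensional "trait space" with a double-well landscape V(x) = (x^2 - 1)^2, and treats the selection pressure as plain gradient descent on V. The names `landscape_gradient` and `settle` are hypothetical.

```python
def landscape_gradient(x: float) -> float:
    """Slope of the assumed landscape V(x) = (x**2 - 1)**2.

    V has two attractor states (minima) at x = -1 and x = +1,
    separated by a ridge at x = 0.
    """
    return 4.0 * x * (x * x - 1.0)


def settle(x0: float, step: float = 0.01, n_steps: int = 5000) -> float:
    """Follow the 'selection pressure' downhill from the starting trait x0."""
    x = x0
    for _ in range(n_steps):
        x -= step * landscape_gradient(x)
    return x


# Five hypothetical lineages with different origins in trait space.
starts = [0.2, 0.9, 2.0, -0.3, -1.8]
finals = {x0: round(settle(x0), 3) for x0 in starts}
print(finals)
# → {0.2: 1.0, 0.9: 1.0, 2.0: 1.0, -0.3: -1.0, -1.8: -1.0}
```

Lineages that start on the same side of the ridge end up at the same attractor however far apart they began, which is the convergence. Which attractor a lineage reaches depends on which basin it started in, which is the contingency.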
Firstly, cases of convergent evolution might point to deep selection pressures which help us predict what advanced AI will be like.
There is some work of this type already in the alignment space, but we think it’s a promising area for further exploration.
There are at least a few different ways of exploring this idea, and probably others we haven’t thought of yet:
Here are some examples where I think that convergent biological evolution points to some deep selection pressures, which are likely to also be relevant for understanding advanced AI systems. We will go into more detail and unpack implications in follow-up posts.
Epistemic status: this subsection is highly speculative, more than the others.
There are multiple other possibly relevant examples we decided not to include in this post, but we recommend thinking about it for yourself and posting further examples as comments.
Secondly, in my view, lots of existing alignment research implicitly or explicitly relies on convergence.
It seems plausible that for some of the properties people in the alignment space assume are convergent, the relevant basin actually doesn’t extend to advanced AI, or the specific selection pressures are just one of many, so that the attractor states are not very deep.
Thinking through convergent evolution makes the reasons why these cases of convergence may be relevant clearer. At the same time, the interplay between convergence and contingency, and the limited extent to which some of these pressures seem to shape living things, may point to some of the basins of convergence not being as universal as assumed, or the selection pressures not being that strong. It would be good to have a more explicit discussion of what these cases of convergence are contingent upon, and how clear it is that advanced AI systems will meet those conditions.
Yes, biology is super different from AI.
Evolution is not ‘smart’ - but over the past few billion years, it has had a lot of compute and has explored a lot. 
And evolution didn’t just explore spaces like ‘body shapes made of flesh’, which aren’t very relevant to AI systems. It also explored spaces like ‘control theory algorithms implementable by biological circuits’ and ‘information processing architectures’. Looking at the properties which were converged upon in spaces like that can hopefully tell us something about the underlying selection pressures.
While details of what biological evolution found are contingent, it seems likely that vast convergences across very different species, or even across very different systems like culture and technology, point to deeper selection pressures which apply to AI systems too.
The ideas in this post are mostly Jan’s. Special thanks to Clem, who made substantial contributions, especially on the parts about contingency, and plans to write a follow-up post on the relevance of contingency to AI alignment research. Thanks also to TJ, Petr Tureček and John Wentworth for comments on a draft. Rose did most of the writing.
Nobu Tamura (http://spinops.blogspot.com), CC BY 3.0, via Wikimedia Commons.
Nobu Tamura (http://spinops.blogspot.com), CC BY 3.0, via Wikimedia Commons.
This way of thinking about convergent evolution is used by evolutionary biologists, e.g. here. There are also other ways of approaching it, most commonly in terms of a fitness landscape, where instead of individuals falling down into attractor states, selection pressures push individuals uphill. Conventions depend on the subfield.
Note that the attractor state applies to some feature or features of the organism, but is irrelevant to most others. In the shark and dolphin case, the attractor relates to body shape, but does not affect other features like type of immune cells.
https://commons.wikimedia.org/wiki/File:Local_search_attraction_basins.png , CC BY-SA 3.0 <http://creativecommons.org/licenses/by-sa/3.0/>, via Wikimedia Commons.
See https://www.frontiersin.org/articles/10.3389/fpsyg.2019.02688/full and this on hierarchical agency.
See What multipolar failure looks like.
- You can look at any system as an agent
- A system is more agentic the more that describing it using the intentional stance is useful, relative to other stances.
Some candidates: parasitoidism; a combination of causal reasoning, flexibility, imagination, and prospection.
This paper argues that different pressures operated in different taxa, and that for some taxa social learning was a key selection pressure.
For example, The Evolution of the Sensitive Soul: Learning and the Origins of Consciousness by Simona Ginsburg, Eva Jablonka.
Stochastic gradient descent also isn’t the smartest designer, but with enough compute it’s been able to find the smartest AI systems we have.
In addition to looking at biology, I'd look at human organizations (corporations, governments, organized religions, militaries, etc.) Under what conditions do they evolve towards something like "agency?" What about "intelligence?" Under what conditions do they evolve away from those things?
On the topic of thinking about it for yourself and posting further examples as comments...
This is GPT4 thinking about convergent properties, using the post as a prompt and generating 20 plausibly relevant convergences.
In my view, a) it broadly got the idea, and b) the results are in better taste for understanding agents than, e.g., what you get from karma-ranked LW frontpage posts about AIs on an average day.
Really enjoyed this post, both aesthetically (I like evolution and palaeontology, and obviously AI things!) and as a motivator for some lines of research and thought.
I had a go at one point connecting natural selection with gradient descent which you might find useful depending on your aims.
I also collected some cases of what I think are potentially convergent properties of 'deliberating systems', many of them natural, and others artificial. Maybe you'll find those useful, and I'd love to know to what extent you agree or disagree with the concepts there.
Standardized communication protocols
Language is the most obvious example, but there's plenty of others. E.g. taking different parts of the body as subsystems communicating with each other, one neurotransmitter/hormone often has very similar effects in many parts of the body.
In software, different processes can communicate with each other by passing messages that have some well-defined format. When you're sending an API request, you usually have a good idea of what shape the response is going to take, and if the request fails, it should fail in a predictable way that can be harmlessly handled. This makes it easier to build reliable software.
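As a minimal sketch of what a well-defined format with predictable failure can look like, here is a toy request handler. Everything in it is assumed for illustration: the `{"op": "double", "x": <number>}` message shape, the `Response` type, and the error codes are hypothetical and not taken from any real API.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Response:
    """Every reply has the same shape, whether the request succeeded or not."""
    ok: bool
    value: Optional[float] = None
    error: Optional[str] = None


def handle_request(raw: str) -> Response:
    """Handle a JSON request of the hypothetical form {"op": "double", "x": 3}."""
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError:
        # Malformed input becomes a structured error, not a crash.
        return Response(ok=False, error="malformed_json")
    if not isinstance(msg, dict) or msg.get("op") != "double":
        return Response(ok=False, error="unknown_op")
    x = msg.get("x")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(x, bool) or not isinstance(x, (int, float)):
        return Response(ok=False, error="bad_argument")
    return Response(ok=True, value=2.0 * x)


print(handle_request('{"op": "double", "x": 3}'))
print(handle_request("not json"))
```

The caller only ever has to check `ok` and then read either `value` or `error`, so a failed request can be handled without special cases.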
Some cases of standardization are spontaneous/bottom-up, whereas others are engineered top-down. Human language is both. Languages with a greater number of users seem to evolve simpler, more standardized grammars, e.g. compare Russian to Czech or English to Icelandic (though syncretism and promiscuous borrowing may also have had an impact in the second case). I don't know if something like that occurs at all in programming languages, but one factor that makes it much less likely is the need to maintain backward compatibility, which is important for programming languages but matters much less for human languages.
Partial convergence between language models and brains and evolutionary analogy