Remmelt Ellen

Research Coordinator of area "Do Not Build Uncontrollable AI" for AI Safety Camp.


Thanks for the thoughts! Some critical questions:

> Natural selection requires variation. Information theory tells us that all information is subject to noise and therefore variation across time.

Are you considering variations introduced during learning (essentially changes to code, which can then be copied)? Are you considering variations introduced by microscopic changes to the chemical/structural configurations of the maintained/produced hardware?
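To make the information-theoretic point concrete: any physical copying channel has some nonzero per-bit error probability, so across enough copies, variants appear. A minimal sketch (the error rate and the copied byte string are illustrative assumptions, not measurements):

```python
import random

def noisy_copy(code: bytes, flip_prob: float, rng: random.Random) -> bytes:
    """Copy a byte string through a channel that flips each bit with flip_prob."""
    out = bytearray(code)
    for i in range(len(out)):
        for bit in range(8):
            if rng.random() < flip_prob:
                out[i] ^= 1 << bit  # flip this bit
    return bytes(out)

rng = random.Random(0)
original = b"def act(): return policy(observation)"
copies = [noisy_copy(original, flip_prob=1e-3, rng=rng) for _ in range(1000)]
variants = sum(1 for c in copies if c != original)
print(f"{variants} of 1000 copies differ from the original")
```

With ~300 bits per copy and a 10^-3 per-bit flip rate, roughly a quarter of the copies end up differing, so variation accumulates even at error rates far below what uncorrected physical substrates exhibit.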

> However, we can reduce error rates to arbitrarily low probabilities using coding schemes.

Claude Shannon showed this to be the case for a single channel of communication. How about when you have many possible routing channels through which physical signals can leak to and back from the environment?

If you look at existing networked system architectures, do the near-zero error rates you can correct toward at the binary level (e.g. with CRC codes) also apply at higher layers of abstraction (e.g. in detecting possible trojan-horse adversarial attacks)?
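The layer distinction can be illustrated directly: a CRC32 checksum reliably flags random bit flips in transit, but a payload that arrives intact at the bit level while being malicious at the semantic level passes the same check. A small sketch (the payload strings are made-up examples):

```python
import zlib

def send(payload: bytes) -> tuple[bytes, int]:
    """Transmit a payload together with its CRC32 checksum."""
    return payload, zlib.crc32(payload)

def receive(payload: bytes, checksum: int) -> bool:
    """Accept only if the checksum matches the received bytes."""
    return zlib.crc32(payload) == checksum

# A random bit flip in transit: caught at the binary layer.
payload, checksum = send(b"UPDATE weights FROM trusted_source")
corrupted = bytearray(payload)
corrupted[0] ^= 0x01
print(receive(bytes(corrupted), checksum))  # False: the flip is detected

# A trojan sent intact: the CRC is valid, so the binary layer sees no error.
payload, checksum = send(b"UPDATE weights FROM attacker_source")
print(receive(payload, checksum))  # True: nothing to detect at this layer
```

The checksum defends against channel noise, not against well-formed messages with harmful content; catching the latter requires error detection defined at the higher layer of abstraction.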

> If there is no variation then there is no natural selection.

This is true. But can variation really be kept out of AGI, when it would be self-learning code and self-maintaining hardware, continually adapting to changes within a more complex environment?

> In abstract terms, evolutionary dynamics require either a smooth adaptive landscape such that incremental changes drive organisms towards adaptive peaks…

Besides point-change mutations, are you taking into account exaptation, as the natural selection for shifts in the expression of previous (learned) functionality? Must exaptation, as involving the reuse of functionality in new ways, involve smooth changes in phenotypic expression?

> …and/or unlikely leaps away from local optima into attraction basins of other optima.

Are the other attraction basins instantiated at higher layers of abstraction? Are any other optima approached through selection across the same fine-grained super-dimensional landscape that natural selection is selective across? If not, would natural selection “leak” around those abstraction layers, since it is not completely pulled into attraction basins that are in fact pulling across a greatly reduced set of dimensions? Put differently, can natural selection pull sideways on the dimensional pulls of those other attraction basins?
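For reference, the abstract claim about landscapes can be sketched numerically: incremental search climbs a smooth landscape to its peak, but on a rugged landscape it stalls in a local basin unless an unlikely large leap crosses into another basin. The two landscapes below are toy one-dimensional functions chosen only to illustrate the contrast:

```python
import math
import random

def hill_climb(f, x0: float, step: float, iters: int, rng: random.Random) -> float:
    """Accept a small random perturbation only when it improves fitness f."""
    x = x0
    for _ in range(iters):
        candidate = x + rng.uniform(-step, step)
        if f(candidate) > f(x):
            x = candidate
    return x

rng = random.Random(1)
smooth = lambda x: -(x - 3.0) ** 2                        # single peak at x = 3
rugged = lambda x: -(x - 3.0) ** 2 + 4 * math.sin(5 * x)  # many local peaks

x_smooth = hill_climb(smooth, x0=0.0, step=0.1, iters=5000, rng=rng)
x_rugged = hill_climb(rugged, x0=0.0, step=0.1, iters=5000, rng=rng)
print(f"smooth landscape: reached x = {x_smooth:.2f} (global peak at 3.00)")
print(f"rugged landscape: stuck near x = {x_rugged:.2f}")
```

With a maximum step of 0.1 the climber on the rugged landscape never escapes the first local basin, which is the incremental-change half of the quoted dichotomy; the unlikely-leap half corresponds to allowing occasional moves larger than the basin width.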

> I believe that natural selection requires a population of "agents" competing for resources. If we only had a single AI system then there is no competition and no immediate adaptive pressure.

I get how you would represent it this way, because that’s often how natural selection gets discussed as applying to biological organisms.

It is not quite thorough, though, as a description of what can get naturally selected for. For example, within a human body (as an “agent”) there can be natural selection across junk DNA that copies itself between strands, or virus particles, or cancer cells. At that microscopic level, the term “agent” would lose its meaning if used to describe molecular strands.

At the macroscopic level of “AGI”, the single vs. multiple agents distinction would break down, for reasons I described here.

Therefore, to model this thoroughly, I would try to describe natural selection as occurring across a population of components. Those components would be connected and co-evolving, and could replicate individually (e.g. as with viruses replacing other code) or as part of larger packages or symbiotic processes of replication (e.g. code with hardware). For AGI, they would all rely on somewhat similar infrastructure (e.g. for electricity and material replacement) and also need somewhat similar environmental conditions to operate and reproduce.

> Other dynamics will be at play which may drown out natural selection… Other dynamics may be at play that can act against natural selection.

Can the dynamic drown out all possible natural selection over x shortest-length reproduction cycles? Assuming the “AGI” continues to exist, could any dynamics you have in mind drown out any and all interactions between components and surrounding physical contexts that could feed back into their continued/increased existence?
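One way to probe the “drown out” question is a toy Wright–Fisher simulation (all parameters are illustrative): per-generation sampling noise is itself a dynamic acting against selection, yet a component variant whose properties feed back into its replication still spreads over enough reproduction cycles:

```python
import random

def wright_fisher(n_pop: int, s: float, generations: int, p0: float,
                  rng: random.Random) -> float:
    """Final frequency of a variant with relative fitness 1+s, where the
    whole population is resampled each generation (binomial drift noise)."""
    p = p0
    for _ in range(generations):
        if p in (0.0, 1.0):
            break  # variant lost or fixed
        w = p * (1 + s) / (p * (1 + s) + (1 - p))          # selection step
        p = sum(rng.random() < w for _ in range(n_pop)) / n_pop  # drift step
    return p

rng = random.Random(2)
finals = [wright_fisher(n_pop=500, s=0.02, generations=300, p0=0.5, rng=rng)
          for _ in range(10)]
mean_final = sum(finals) / len(finals)
print(f"mean final frequency of the 2%-advantaged variant: {mean_final:.3f}")
```

Drift can drown out selection over short horizons or in small populations, but a 2% replication advantage compounds across hundreds of cycles into near-certain spread; a dynamic opposing selection would have to suppress that feedback in every cycle.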

> We see existence-proofs of this in immune responses against tumours and cancers. Although these don't work perfectly in the biological world, perhaps an advanced AI could build a type of immune system that effectively prevents individual parts from undergoing runaway self-replication.

Immune system responses were naturally selected for amongst organisms that survived.

Would such responses also be naturally selected for in “advanced AI”, in a way that makes not the AI but the outside humans survive more? Given that bottom-up natural selection by nature selects designs across the greatest number of possible physical interactions (it is the most comprehensive), can alternate designs built through faster but narrower top-down engineering actually match or exceed that fine-grained extent of error detection and correction? Even if humans could get “advanced AI” to build in internal error-detection and error-correction mechanisms resembling an immune system, would that outside-imposed immune system withstand natural selection while reducing the host’s rates of survival and reproduction?

~ ~ ~

Curious how you think about those questions. I also passed on your comment to my mentor (Forrest) in case he has any thoughts.