If Evolution were allowed to continue undisturbed, it could conceivably one day produce a pure inclusive genetic fitness maximizer by reworking our base desires. So the path would have been:

First replicator -> first brains -> base desires (roughly, the first reinforcement learners) -> first unaligned (by Evolution's standards) consequentialists -> aligned consequentialists by reworking base desires -> Evolution cannot optimize further because it reaches the global optimum, or at least a very steep local optimum.

Does that mean that, with a sufficiently long training time and enough care not to produce an agent that gets stuck with an imperfectly aligned goal mid-training, we could also artificially create a consequentialist perfectly aligned to some goal specified by us?

How likely are these things to happen, both during Evolution and during the training of an ML model?

Thoughts inspired by Thou Art Godshatter.


What does it mean to be "aligned with evolution"? Like, are viruses aligned? Intuitively... more than humans. I mean, they do not think, but the average thing they do is more likely to be about replication than the average thing humans do.

On the other hand, if humans one day colonize the galaxy, they will replicate more than a virus (in a hypothetical world without humans) ever could. So maybe, in the long term, humans are better at replication than viruses?

I am trying to figure out what the relation is between "alignment with evolution" and "short-term thinking". Like, imagine that some people get hit by magical space rays, which make them fully "aligned with evolution". What exactly would such people do?

Would they try to fuck someone, rather than e.g. do art or study philosophy? But what if the art or the philosophy makes it easier to get laid? So maybe in such a case they would do the art/philosophy, but they would feel no intrinsic pleasure from doing it; it would all be purely instrumental, and they would be willing to throw it all away if, on second thought, they found out that it is actually not maximizing reproduction?

How would they even figure out the reproduction-optimal thing to do? Would they spend some time trying to figure out the world? (The time that could otherwise be spent trying to get laid?) Or perhaps, as a result of sufficiently long evolution, they would already do the optimal thing instinctively? (Because those who had the right instincts and followed them outcompeted those who spent too much time thinking?)

But would that mean that the environment is fixed? Especially if the most important part of the environment is other people? Maybe humanity would get locked into an equilibrium where the optimal strategy is found and everyone who tries doing something else is outcompeted; and afterwards those who follow the optimal strategy more instinctively outcompete those who need to figure it out. What would such an equilibrium look like?

I am trying to figure out what the relation is between "alignment with evolution" and "short-term thinking". Like, imagine that some people get hit by magical space rays, which make them fully "aligned with evolution". What exactly would such people do?

I think they would become consequentialists smart enough that they could actually act to maximize inclusive genetic fitness. I think Thou Art Godshatter is convincing.

But what if the art or the philosophy makes it easier to get laid? So maybe in such a case they would do the art/philosophy, but they would feel no intrinsic pleasure from doing it; it would all be purely instrumental, and they would be willing to throw it all away if, on second thought, they found out that it is actually not maximizing reproduction?

Yeah that's what I would expect.

How would they even figure out the reproduction-optimal thing to do? Would they spend some time trying to figure out the world? (The time that could otherwise be spent trying to get laid?) Or perhaps, as a result of sufficiently long evolution, they would already do the optimal thing instinctively? (Because those who had the right instincts and followed them outcompeted those who spent too much time thinking?)

I doubt that being governed by instincts can outperform a sufficiently smart agent reasoning from scratch, given a sufficiently complicated environment. Instincts are just heuristics, after all...

But would that mean that the environment is fixed? Especially if the most important part of the environment is other people? Maybe humanity would get locked into an equilibrium where the optimal strategy is found and everyone who tries doing something else is outcompeted; and afterwards those who follow the optimal strategy more instinctively outcompete those who need to figure it out. What would such an equilibrium look like?

Ohhh interesting, I have no idea... it seems plausible that it could happen though!

If Evolution were allowed to continue undisturbed

What does it mean to you? I am guessing you mean that humans evolve culturally rather than genetically? Why would you not call it evolution, anyway?

No, I mean "humans continue to evolve genetically, and they never start self-modifying in a way that makes evolution impossible (e.g., by becoming emulations)."