Recently I've gotten a bunch of pushback when I claim that humans are not maximizers of inclusive genetic fitness (IGF).
I think that part of what's going on here is a conflation of a few claims.
One claim that is hopefully uncontroversial (but that I'll expand upon below anyway) is:
- Humans are not literally optimizing for IGF, and regularly trade other values off against IGF.
Separately, we have a stronger and more controversial claim:
- If an AI's objectives included goodness in the same way that our values include IGF, then the future would not be particularly good.
I think there's more room for argument here, and will provide some arguments.
A semi-related third claim that seems to come up when I have discussed this in person is:
- Niceness is not particularly canonical; AIs will not by default give humanity any significant fraction of the universe in the spirit of cooperation.
I endorse that point as well. It takes us somewhat further afield, and I don't plan to argue it here, but I might argue it later.
On the subject of whether humans are literally IGF optimizers, I observe the following:
We profess to enjoy many other things, such as art and fine foods.
Suppose someone came to you and said: "I see that you've got a whole complex sensorium centered around visual stimuli. That sure is an inefficient way to optimize for fitness! Please sit still while I remove your enjoyment of beautiful scenery and moving art pieces, and replace it with a module that does all the same work your enjoyment was originally intended to do (such as causing you to settle down in safe locations with abundant food), but using mechanical reasoning that can see farther than your evolved heuristics." Would you sit still? I sure wouldn't.
And if you're like "maybe mates would be less likely to sleep with me if I didn't enjoy fine art", suppose that we tune your desirability-to-mates upwards exactly as much as needed to cancel out this second-order effect. Would you give up your enjoyment of visual stimuli then, like an actual IGF optimizer would?
And when you search in yourself for protests, are you actually weighing the proposal based on how many more offspring and kin's-offspring you'll have in the next generation? Or do you have some other sort of attachment to your enjoyment of visual stimuli, some unease about giving it up, that you're trying to defend?
Now, there's a reasonable counterargument to this point, which is that there's no psychologically-small tweak to human psychology that dramatically increases that human's IGF. (We'd expect evolution to have gathered that low-hanging fruit.) But there's still a very basic and naive sense in which living as a human is not what it feels like to live as a genetic fitness optimizer.
Like: it's pretty likely that you care about having kids! And that you care about your kids very much! But, do you really fundamentally care that your kids have genomes? If they were going to transition to silicon, would you protest that that destroys almost all the value at stake?
Or, an even sharper proposal: how would you like to be killed right now, and in exchange you'll be replaced by an entity that uses the same atoms to optimize as hard as those atoms can optimize, for the inclusive genetic fitness of your particular genes. Does this sound like practically the best offer that anyone could ever make you? Or does it sound abhorrent?
For the record, I personally would be leaping all over the opportunity to be killed and replaced by something that uses my atoms to optimize my CEV as best as those atoms can be arranged to do so, not least because I'd expect to be reconstituted before too long. But there's not a lot of things you can put in the "what my atoms are repurposed for" slot such that I'm chomping at the bit, and IGF sure isn't one of them.
(More discussion of this topic: The Simple Math of Evolution)
On the subject of how well IGF is reflected in humanity's values:
It is hopefully uncontroversial that humans are not maximizing IGF. But, like, we care about children! And many people care a lot about having children! That's pretty close, right?
And, like, it seems OK if our AIs care about goodness and friendship and art and fun and all that good stuff alongside some other alien goals, right?
Well, it's tricky. Optima often occur at extremes, and concepts tend to differ pretty widely at the extremes, etc. When the AI gets out of the training regime and starts really optimizing, then any mismatch between its ends and our values are likely to get exaggerated.
Like how you probably wouldn't stop loving and caring about your children if they were to eschew their genomes. The love and care are separate; the thing you're optimizing for and IGF are liable to drift apart as we get further and further from the ancestral savanna.
And you might say: well, natural selection isn't really an optimizer; it can't really be seen as trying to make us optimize any one thing in particular; who's really to say whether it would have "wanted" us to have lots of descendants, vs "wanting" us to have lots and lots of copies of our genome? The question is ultimately nonsense; evolution is not really the sort of entity that can want.
And I'd agree! But this is not exactly making the situation any better!
Like, if evolution was over there shouting "hey I really wanted you to stick to the genes", then we wouldn't particularly care; and also it's not coherent enough to be interpreted as shouting anything at all.
And by default, an AI is likely to look at us the same way! "There are interpretations of the humans under which they wouldn't like this", they say, slipping on the goodness-condoms they've invented so that they can squeeze all the possible AI-utility out of the stars without any risk of real fun, "but they're not really coherent enough to be seen as having clear goals (not that we'd particularly care if they did)".
That’s the sort of conversation… that they wouldn't have because they'd be busy optimizing the universe.
(And all this is to say nothing about how humans' values are much more complex and fragile than IGF, and thus much trickier to transmit. See also things Eliezer wrote about the fragility and complexity of value.)
My understanding of the common rejoinder to the above point is:
OK, sure, if you took the sort of ends that an AI is likely to get by being trained on human values, and transported those into an unphysically large brute-force optimization-machine that was unopposed in an empty universe, then it might write a future that doesn't hold much value from our perspective. But that's not very much like the situation we find ourselves in!
For one thing, the AI's mind has to be small, which constrains it to factor its objectives through subgoals, which may well be much like ours. For another thing, it's surrounded by other intelligent creatures that behave very differently towards it depending on whether they can understand it and trust it. The combination of these two pressures is very similar to the pressures that got stuff like "niceness" and "fairness" and "honesty" and "cooperativeness" into us, and so we might be able to get those same things (at least) into the AI.
Indeed, they seem kinda spotlit, such that even if we can't get the finer details of our values into the AI, we can plausibly get those bits. Especially if we're trying to do something like this explicitly.
And if we can get the niceness/fairness/honesty/cooperativeness cluster into the AI, then we're basically home free! Sure, it might be nice if it was also into the great project of making the future Fun, but it's OK for our kids to have different interests than we have, as long as everybody's being kind to each other.
And... well, my stance on that is that it's wishful thinking that misunderstands where we get our niceness/fairness/honesty/cooperativeness from. But arguing that would be a digression from my point today, so I leave it to some other time.
My point today is that the observation “humans care about their kids” is not in tension with the observation “we aren't IGF maximizers”, and doesn't seem to me to undermine the claims that I use this fact to support.
And furthermore, when debating this thing in the future, I'd bid for a bit more separation of claims. The claim that we aren't literally optimizing IGF is hopefully uncontroversial; the stronger claim that an AI relating to fun the way we relate to IGF would be an omnicatastrophe is less obvious (but still seems clear to me); the claim that evolution at least got the spirit of cooperation into us, and all we need to do now is get the spirit of cooperation into the AI, is a different topic altogether.
Seems not relevant? I think we're running into an under-definition of IGF (and the fact that it doesn't actually have a utility function, even over local mutations on a fixed genotype). Does IGF have to involve genomes, or just information patterns as written in nucleotides or in binary? The "outer objective" of IGF suffers a classic identifiability issue common to many "outer objectives", where the ancestral "training signal" history is fully compatible with "IGF just for genomes" and also "IGF for all relevant information patterns made of components of your current pattern."
(After reading more, you later seem to acknowledge this point -- that evolution wasn't "shouting" anything about genomes in particular. But then why raise this point earlier?)
I don't know if I disagree, it depends what you mean here. If "psychologically small" is "small" in a metric of direct tweaks to high-level cognitive properties (like propensity to cheat given abstract knowledge of resources X and mating opportunities Y), then I think that isn't true. By information inaccessibility, I think that evolution can't optimize directly over high-level cognitive properties.
This kind of argument seems sketchy to me. Doesn't it prove too much? Suppose there's a copy of me which also values coffee to the tune of $40/month and reflectively endorses that value at that strength. Are my copy and I now pairwise misaligned in any future where one of us "gets out of the training regime and starts really optimizing"? (ETA: that is, significantly more pairwise misaligned than I would be with an exact copy of myself in such a situation. For more selfish people, I imagine this prompt would produce misalignment due to some desires like coffee/sex being first-person.)
Complexity is probably relative to the learning process and inductive biases in question. While any given set of values will be difficult to transmit in full (which is perhaps your point), the fact that humans did end up with their values shows evidence that human values are the kind of thing which can be transmitted/formed easily in at least one architecture.
I agree. I have the sense that there is some depth to the cognitive machinery that leaves people so susceptible to this particular pattern of thinking, namely: no, I am an X optimizer, for some X for which we have some (often weak) theoretical reason to claim we might be an X optimizer. Once someone decides to view themselves as an X optimizer, it can be very difficult to convince them to pay enough attention to their own direct experience to notice that they are not an X optimizer.
More disturbingly, it seems as if people can go some distance to actually making themselves into an X optimizer. For example, a lot of people start out in young adulthood incorrectly believing that what they ultimately value is money, and then, by the end of their life, they have shifted their behavior so it looks more and more like what they really value is, ultimately, money. Nobody goes even close to all the way there -- not really -- but a mistaken view that one is truly optimizing for X really can shift things in the direction of making it true.
It's as if we have too many degrees of freedom in how to explain our externally visible behavior in terms of values, so for most any X, if we really want to explain our visible behavior as resulting from really truly valuing X then we can, and then we can make that true.
Individuals who shape the world, are often those who have ended up being optimizers.
It sounds like you find that claim disturbing, but I don't think it's all bad.
I'm interested in more of a sense of what mistake you think people are making, because I think caring about something strong enough to change who you are around it can be a very positive force in the world.
Yeah, caring about something enough to change who you are is really one of the highest forms of virtue, as far as I'm concerned. It's somewhat tragic that the very thing that makes us capable of this high form of virtue -- our capacity to deliberately shift what we value -- can also be used to take what was once an instrumental value and make it, more or less, into a terminal value. And generally, when we make an instrumental value into a terminal value (or go as far as we can in that direction), things go really badly, because we ourselves become an optimizer for something that is harmless when pursued as an instrumental value (like paperclips), but is devastating when pursued as a terminal value (like paperclips).
So the upshot is: to the extent that we are allowing instrumental values to become more-or-less terminal values without really deliberately choosing that or having a good reason to allow it, I think that's a mistake. To the extent that we are shifting our values in service of what which is truly worth protecting, I think that's really virtuous.
The really interesting question as far as I'm concerned is what the thing is that we rightly change our values in service of? In this community, we often take that thing to be representable as a utility function over physical world states. But it may not be representable that way. In Buddhism the thing is conceived of as the final end of suffering. In western moral philosophy there are all kinds of different ways of conceiving of that thing, and I don't think all that many of them can be represented as a utility function over physical world states. In this community we tend to side-step object-level ethical philosophy to some extent, and I think that may be our biggest mistake.