"Go west, young man!" - Preferences in (imperfect) maps

Planned summary for the Alignment Newsletter:

This post argues that, by default, human preferences are strong views built upon poorly defined concepts that may not have any coherent extrapolation to new situations. Put another way, humans build mental maps of the world, and their preferences are defined on those maps; so in new situations, where the map no longer reflects the world accurately, it is unclear how preferences should be extended. As a result, anyone interested in preference learning should find some incoherent moral intuition that other people hold and figure out how to make it coherent. This is practice for the case we will eventually face, where our own values become incoherent in unfamiliar circumstances.

Planned opinion:

This seems right to me -- we can also see this by looking at the various paradoxes found in the philosophy of ethics, which involve taking everyday moral intuitions and finding extreme situations in which they conflict, and it is unclear which moral intuition should “win”.

Reversing the purpose of maps

We generally see maps as working the other way round: as tools that serve the purposes of our "real" goals. Eliezer writes about how, if definitions didn't stand for some query, something relevant to our "real" preferences, we'd have no reason to care about them.

But if, as I've argued, most of our preferences live in our mental maps, then changing definitions or improving maps can tear up our preferences and values - or at least force us to re-assess them.

Defending "purity"

This is why I spend so much time thinking about "conservative" values, especially those around the moral foundation of purity. For the most part I don't share that moral foundation, so its incoherence is clear to me. It's painful to listen to someone who holds that moral foundation twist and turn as they try to justify it with more consequentialist reasoning. Yes, rituals can bind a community together; but are you really telling me that if, say, TV shows or Facebook games were shown to do a better binding job, you'd cheerfully discard those rituals?

But I strongly suspect that, ultimately, the moral foundations I do care about, such as care/harm, are also incoherent when pushed too far into unfamiliar territory. So I want to forge something coherent out of purity, as practice for forging something coherent out of all our values.

A metaphorical example

Your parent, on their deathbed, gives you your mission in life: an old map, a compass, and the instructions "Go west, young man^[1]!"

The map is... incomplete.

The compass is fine, but, as we know, its concept of west (set by magnetic north) is not exactly the same as the standard geographical one.

In the era and place that your hypothetical parent came from, the connotations of "going west" involved adventure and potential riches.

And, most importantly, neither of you has yet realised that the world is round.

So, for a short while, "going west" seems like a clear, well-defined goal. But as we get to the edge of the map, both literally and metaphorically, the concept starts to lose definition and become far more uncertain; and hence, so does your goal.

What will you do with your goal when your mental maps are forced to change?

[1] Don't worry if you're not actually a young man; their mind was starting to go towards the end.

AI ALIGNMENT FORUM