Values Form a Shifting Landscape (and why you might care)

VojtaKovarik

Huge thanks to Veronika Žolnerčíková for drawing the pictures for this post. I am also grateful to the people who provided feedback on various versions of the text.

Epistemic status: the model described here is oversimplified, but I think it nevertheless captures an important dynamic that will re-emerge in more rigorous descriptions of the topic.

The Shifting Landscape of Values

In this post, I would like to talk about the relations between the value systems of different people, cultures, aliens, AIs, etc. By a “value system”, I mean the collection of all preferences (stated and revealed), decision rules, utility functions, or whatever else the given entity uses to make decisions. Rather than focusing on what exactly value systems are and how they are implemented, I would like to discuss how different value systems view each other, what this implies for how preferences change over time, and what we can do to have better conversations on the topic.

We can view the set of all possible value-systems as a kind of abstract space that contains points like “Alice’s value-system”, “Bob’s value-system”, and “the value-system of a particular alien or AI”, with more similar value-systems being closer together. With some eye-squinting, we can also include points like “the system of values advocated by the Catholic Church”, “the system of values that hunter-gatherers might have had”, and other hypothetical value-systems that need not correspond to any specific entity (Figure 1).

Figure 1: The space of different value-systems (including hypothetical value-systems).

Rather than being an impartial observer, I currently hold some system of values, and I have an opinion on the other systems. As a simple model, let’s say that I assign a real number to each value-system based on how desirable it seems to me. The interpretation is that if I had to choose which of two value-systems to adopt and stick with (via a magic button), I would pick the one with the higher number. As Figure 2 shows, others might have a different opinion on what this "value landscape" looks like.

Figure 2: Desirability of different value-systems, from the point of view of different entities. The x-axis represents the space of different value systems (this should be multi-dimensional rather than a line, but that would be impractical to draw). The y-axis indicates desirability. Each curve corresponds to the preferences of the value-system to which it is connected by the cross and vertical line: it shows how much an entity with that value system would like to have different values instead.

It might not always be the case that the most preferable values are the ones a person currently holds. For example, I might wish to be vegan, more empathetic, or whatnot, but lack the means to make the change.

Figure 3: We might value other values higher than our own (without automatically adopting them). The axes are as in Figure 2.

Apart from the “bias against dissimilar values” (rating value-systems lower the further they are from one’s own), the landscape might also be shaped by other factors (an anecdotal example being Catholics hating Protestants more than they hate atheists), particularly for value-systems that work differently from those of present-day humans. As a consequence, I propose that we imagine the space of value-systems as a “shifting landscape” that changes shape as we move through it.

Figure 4: The space of value systems as a shifting landscape (the vertical line and cross connect each value-system to its “opinion” on other systems). The axes are as in Figure 2.
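To make the model concrete, here is a minimal Python sketch. It assumes, purely for drawing purposes, a one-dimensional value space, and the function names and the formula inside `toy_desirability` are hypothetical. The only structural point it encodes is that desirability takes two arguments, the viewer and the candidate, so every value-system carries its own curve and the landscape changes as the viewer moves.

```python
# Illustrative sketch only: a value-system is a point on a line, as in the
# drawings (the real space would be multi-dimensional).


def toy_desirability(viewer: float, candidate: float) -> float:
    """How much an entity holding `viewer` values would like to hold
    `candidate` values instead. Hypothetical formula: a bias against
    dissimilar values plus a pull towards a viewer-dependent aspiration."""
    aspiration = 0.5 * viewer + 1.0  # hypothetical: where this viewer would like to be
    bias_against_dissimilar = -0.5 * abs(candidate - viewer)
    pull_towards_aspiration = -abs(candidate - aspiration)
    return bias_against_dissimilar + pull_towards_aspiration


def magic_button(viewer: float, a: float, b: float) -> float:
    """The value-system `viewer` would choose to adopt and stick with."""
    return a if toy_desirability(viewer, a) >= toy_desirability(viewer, b) else b


# The same pair of candidates is ranked differently by different viewers:
print(magic_button(0.0, 1.0, 3.0))  # viewer at 0 picks 1.0
print(magic_button(4.0, 1.0, 3.0))  # viewer at 4 picks 3.0
# A viewer may rate other values above its own (cf. Figure 3):
print(toy_desirability(0.0, 1.0) > toy_desirability(0.0, 0.0))  # True
```

Because the "opinion" is a function of the viewer, there is no single global height map; each point in the space sees its own landscape.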

I assume quite a few people already have a somewhat similar image in their head. However, I believe that if more people had explicit access to this mental model, we might start referring to it in conversations, and thus increase the efficiency of discussions that involve the change of values over time. In the remainder of the post, I give several examples of why this might be the case.

Applications of the Model

Some of the things we can do with the model described above are:

1) We can talk about “landscape features” in the value landscape and about different ways in which values can shift. For example, for any shift between values A and B, the first thing we can ask is whether A and B “agree” on the valence of the shift. We can also talk about “value drift”: a series of changes that each seem neutral locally but add up to something very significant when viewed as a whole. Finally, important landscape features to focus on are stable points (where all local changes seem like a downgrade) and attractors (stable points towards which all points from a larger area are drawn).

Figure 5: Examples of a value shift (from "green" to "blue" values) where the old and the new value systems agree (left) or disagree (right) on the valence of the shift. Left: carnivore Leela would prefer to become vegan, and would retrospectively endorse the change. Right: carnivore Morty would also prefer to become vegan, but would regret the change afterwards. The axes are as in Figure 2.

Figure 6: Value drift, where many local changes to which we are indifferent compound into a major change. (Depicted is a single person across different parts of their life. However, similar dynamics could arise on long-term civilizational scales.) The axes are as in Figure 2.

Figure 7: A stable point in the value landscape, to which values from some neighbourhood converge. The axes are as in Figure 2.

2) We can be explicit about whether a given issue can be discussed within this simple model or whether it requires further refinement. For example, if we want to reason about how abolishing slavery differs from brainwashing (or from the transition from the honor-based pre-agricultural society to the present one), we need to go beyond the model. The model does, however, suffice for describing their similarity.

3) The model can provide a shared language for expressing the core of a potential disagreement. For example, we can ask: “Are there any stable points that seem like improvements from our current point of view, or is every change necessarily a moral dilemma?” Two people might agree that either option could hold in some part of the value landscape, yet disagree about the actual shape of the landscape around us.

4) Finally, the model can help us communicate the desiderata that different people might have. For example, we might wish to find the stable point that is as valuable as possible from our current point of view, to treat our current values as irrelevant and instead look for the stable point that values itself the highest, or to find some compromise between the two.

Figure 8: Using the model to illustrate desiderata about the future of our values. (The particular example illustrates the difference between what seems optimal now vs what will seem optimal and be stable once we get there.)
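The landscape features and desiderata above can be made concrete in a small sketch. Everything in it is an illustrative assumption rather than part of the model as stated: value-systems are the integers 0 to 10, a "local change" is a single step left or right, and `desirability` is a hypothetical formula combining a bias against dissimilar values with a viewer-dependent aspiration. The helper names (`shift_valence`, `is_stable`, `follow_local_preferences`) are likewise made up for this example.

```python
# Hypothetical toy landscape: value-systems are the integers 0..10 and
# desirability(viewer, candidate) is how much `viewer` would like to hold
# `candidate` values instead. The formula is an illustrative assumption.
GRID = range(11)


def desirability(viewer: int, candidate: int) -> float:
    aspiration = 0.5 * viewer + 1.0  # hypothetical viewer-dependent "ideal"
    return -0.5 * abs(candidate - viewer) - abs(candidate - aspiration)


def neighbours(v: int) -> list[int]:
    """Local changes: one step left or right, staying inside the grid."""
    return [w for w in (v - 1, v + 1) if w in GRID]


def shift_valence(a: int, b: int) -> tuple[bool, bool]:
    """(Does A see A->B as an upgrade?, does B endorse it in retrospect?)"""
    return (desirability(a, b) > desirability(a, a),
            desirability(b, b) > desirability(b, a))


def is_stable(v: int) -> bool:
    """Stable point: every local change looks like a downgrade from v."""
    return all(desirability(v, w) < desirability(v, v) for w in neighbours(v))


def follow_local_preferences(start: int, max_steps: int = 100) -> int:
    """Repeatedly take the locally most preferred step, judged by the current
    values. max_steps guards against cycles, which a shifting landscape allows."""
    v = start
    for _ in range(max_steps):
        best = max(neighbours(v), key=lambda w: desirability(v, w))
        if desirability(v, best) <= desirability(v, v):
            return v  # no local change looks like an improvement
        v = best
    return v


stable_points = [v for v in GRID if is_stable(v)]
current = 0
# Desideratum 1: the stable point that looks best from our current values.
best_by_current = max(stable_points, key=lambda s: desirability(current, s))
# Desideratum 2: the stable point that values itself the highest.
best_by_itself = max(stable_points, key=lambda s: desirability(s, s))
```

In this toy landscape the shift from 0 to 1 is endorsed both beforehand and in retrospect (like Leela's), while the shift from 2 to 3 looks like a downgrade beforehand but is endorsed afterwards. The stable points are 1, 2 and 3; following local preferences from 0 stops at 1, and from 10 at 3. Judged from 0, the most attractive stable point is 1, whereas the stable point that values itself the highest is 2, mirroring the distinction in Figure 8. The `max_steps` cap is there because nothing in a shifting landscape rules out cycles of locally preferred changes.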

_________

Gordon Seidoh Worley

I like that this post is fairly accessible, although I found the charts confusing, largely because it's not always that clear to me what's being measured on each axis. I basically get what's going on, but I find myself disliking something about the way the charts are presented because it's not always very clear what each axis measures.

(In some cases I think of them as more like being multidimensional spaces you've put on a line, but that still makes the visuals kind of confusing.)

None of this is really meant to be a big complaint, though. Graphics are hard; I probably wouldn't have even tried to illustrate it, so kudos to you for trying. Just felt it was also useful to register my feedback that they didn't quite land for me even though I got the gist of them.

Vojtech Kovarik

Thank you for the comment. As for the axes, the y-axis always denotes the desirability of the given value-system (except in Figure 1). And you are exactly right about the x-axis: it is a multidimensional space of value-systems that we put on a line, because drawing this in 3D (well, (multi+1)-D :-) ) would be a mess. I will see if I can make it somewhat clearer in the post.

AI ALIGNMENT FORUM