Preference synthesis illustrated: Star Wars

But the "meta-preferences" are a bit more worrying. Are they genuine meta-preferences? Especially since the second one is one that was more subconscious, and the third one looks more like a standard preference than a meta-preference. If the category of meta-preference is not clear, then that part of the research agenda needs to be improved.

I think one of the challenges is that, to me at least, it's still unclear whether we really have anything like meta-preferences that behave in systematic ways. That is, is there a systematic way in which our highly conditional preferences (which, in a very real sense, exist only momentarily at a particular decision point situated within the causal history of the universe) combine, such that we can say more than that our preferences show some statistical regularities? Our preferences may have some coherent statistical features about which we can make stochastically consistent statements, but I think this falls short of what we are usually hoping for from meta-preferences, and it certainly seems to fall short of how I understand you to be thinking about them. Maybe I misunderstand you, though: I think of you as thinking of meta-preferences as something that can ultimately be made to have nice mathematical properties, like some version of rationality, that would allow them to be optimized against without weird things happening.

How I ordered the movies

So, in comparing the Star Wars movies, I started with a few clear orderings, and then compared movies based on criteria I felt were relevant. For example, "Return of the Jedi" and "The Rise of Skywalker" are reasonably similar, and though I found Kylo Ren an interesting character, Vader is... Vader. Iconic characters and emotional arcs were the criteria used here.

"Rogue One" is tricky, as it's great on many criteria, but maybe isn't as enjoyable to watch as some of the other movies (in most ways it is more realistic and more tragic). I could have put it in many different locations, depending on what I emphasised (emotional arcs would have put it lower; exciting story would have put it higher; showing the cost of war would have put it higher, and so on).

I was aware that I would be presenting the ordering to the public, and this made me more likely to use arguments that I could defend. For example, I personally really liked "The Rise of Skywalker", but felt I couldn't justify (to myself or others) putting it above "A New Hope". And do the original movies get a bonus for their originality? Maybe. In different moods, I could prefer different movies, quite easily.

After filling out a few orderings, I then tried to make everything transitive, comparing a few of the movies one with another, and trying to collapse my circular preferences. There are two places where equal signs snuck in; note that these are pairs that are particularly hard to compare, and they gained the equal (or >=) sign mainly because I sorted out their relative ranking with the other movies and then really couldn't compare them well with each other.
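The collapsing step described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the procedure actually followed for the movies: the function name `collapse_cycles`, the `(better, worse)` pair representation, and the item labels are all hypothetical, and it assumes every item named in a pair also appears in `items`. Items caught in a preference cycle are merged into a single tied tier, which is one mechanical way for an equal sign to sneak in.

```python
def collapse_cycles(items, prefers):
    """Group items into tiers, best first; items on a preference cycle are tied.

    `items` is a list of labels; `prefers` is a set of (better, worse) pairs.
    Assumes every label used in `prefers` also appears in `items`.
    """
    # reach[a] = everything a is directly or transitively preferred to.
    reach = {a: set() for a in items}
    for better, worse in prefers:
        reach[better].add(worse)

    # Naive transitive closure: keep extending until nothing changes.
    changed = True
    while changed:
        changed = False
        for a in items:
            extra = set()
            for b in reach[a]:
                extra |= reach[b]
            if not extra <= reach[a]:
                reach[a] |= extra
                changed = True

    # Two items share a tier if each is (transitively) preferred to the other.
    tiers, seen = [], set()
    for a in items:
        if a in seen:
            continue
        tier = sorted({a} | {b for b in reach[a] if a in reach[b]})
        seen.update(tier)
        tiers.append(tier)

    # A tier that sits above another always beats strictly more outside items,
    # so sorting by that count respects every stated preference.
    tiers.sort(key=lambda t: len(reach[t[0]] - set(t)), reverse=True)
    return tiers


# Hypothetical labels, not real movie comparisons: B and C form a cycle.
example = collapse_cycles(
    ["A", "B", "C", "D"],
    {("A", "B"), ("B", "C"), ("C", "B"), ("C", "D")},
)
```

Tiers that are never compared, even indirectly, come out in an arbitrary relative order, which mirrors the point that the synthesis could have ended up being different.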

I was also expecting to put "Revenge of the Sith" higher than it ultimately was. Unlike the other two prequels, I feel it was actually a decent movie, and wanted to reflect that by putting it higher; but in every one-to-one comparison with the non-prequels, it ended up the loser. So it was a sort of Condorcet loser.
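For readers unfamiliar with the term, a Condorcet loser is an option that loses every pairwise comparison against the other options. A minimal sketch of that check follows, assuming the same hypothetical `(better, worse)` pair representation; the function name `condorcet_loser` and the labels are illustrative only. Note that "Revenge of the Sith" above is only "a sort of" Condorcet loser, since it lost to the non-prequels rather than to every other movie.

```python
def condorcet_loser(items, prefers):
    """Return the item that loses every pairwise comparison, or None.

    `prefers` is a set of (better, worse) pairs. An item only qualifies if
    every other item is explicitly recorded as beating it.
    """
    for candidate in items:
        others = [other for other in items if other != candidate]
        if others and all((other, candidate) in prefers for other in others):
            return candidate
    return None


# Hypothetical labels: C loses to both A and B.
loser = condorcet_loser(["A", "B", "C"], {("A", "B"), ("A", "C"), ("B", "C")})
# With a cycle A > B > C > A, nobody loses every comparison.
no_loser = condorcet_loser(["A", "B", "C"], {("A", "B"), ("B", "C"), ("C", "A")})
```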

Analysis of my synthesis

So, how would I compare my own synthesis efforts with what I described in my research agenda?

There are some similarities, to be sure: partial preferences, and some things that might be defined as meta-preferences (not wanting circular preferences, preferring defensible criteria, my ultimately frustrated desire to show that "Revenge of the Sith" was not a bad movie).

Maybe the most important point is that my preference ordering of Star Wars movies did not exist until I synthesised it, and that the synthesis process could have ended up being different. So this is evidence for the core theses of the research agenda: that making preference orderings is a constructive, synthesis process, and that it doesn't have a single clear outcome.

All in all, my efforts were pretty sloppy. Many movies were compared on subjective feelings, the criteria I used were generally the first or second that sprang to mind, and I made no effort to be systematic, or to consider the movies at different times and with different moods or aims. If you feel I didn't put many details in the description above - well, there weren't that many details to put.

In these areas, I expect that an AI could do better: it could correctly compute my "one-step hypothetical preferences", and get a more invariant representation of my preferences. So, in that way, a preference-learning AI would be like me, just better.

But the "meta-preferences" are a bit more worrying. Are they genuine meta-preferences? Especially since the second one is one that was more subconscious, and the third one looks more like a standard preference than a meta-preference. If the category of meta-preference is not clear, then that part of the research agenda needs to be improved.

Anyway, this is my tour through Star Wars and synthesising preferences, two burning issues of different importance.

AI ALIGNMENT FORUM