I value existential risk reduction; I also value poverty reduction. These two things trade off against each other.

I value being generous; I also value reading interesting books. These two things trade off against each other.

But the two tradeoffs do not seem to work the same way. For the first one, I feel comfortable having a utility $U_{\neg X}$ (for no existential risks) and a utility $U_{\neg P}$ (for no poverty), then weighting them and maximising the sum:

$λ_{\neg X} U_{\neg X} + λ_{\neg P} U_{\neg P}$ .

If this ends up with me only reducing existential risks, or only reducing poverty, then that's fine: I'm working on whichever option has the most marginal impact.
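To see why a weighted sum tends to put all effort on one cause, here is a minimal Python sketch. It assumes, hypothetically, a fixed budget of effort split as a fraction t to existential risk and 1 − t to poverty, with each utility linear in its share. The names `weighted_sum` and `best_split`, and the impact figures 3.0, 2.0 and 4.0, are illustrative assumptions, not part of the post.

```python
def weighted_sum(u_x, u_p, lam_x=1.0, lam_p=1.0):
    """Linear aggregate λ_X·U_X + λ_P·U_P."""
    return lam_x * u_x + lam_p * u_p


def best_split(objective, steps=100):
    """Grid-search the fraction t in [0, 1] of effort given to the first cause."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=objective)


# Hypothetical marginal impacts: U_X = 3t and U_P = 2(1 - t).
print(best_split(lambda t: weighted_sum(3.0 * t, 2.0 * (1 - t))))  # 1.0
# Make poverty work more effective at the margin and the answer flips.
print(best_split(lambda t: weighted_sum(3.0 * t, 4.0 * (1 - t))))  # 0.0
```

Because the objective is linear in t, its maximum sits at an endpoint: all effort on one cause. Which endpoint wins depends only on which cause has the higher weighted marginal impact, matching the outcome described above.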

For the second one, I would not want to maximise some sum $λ_{G} U_{G} + λ_{R} U_{R}$, and would certainly complain if I ended up never reading again, or never being generous again. I'd prefer to maximise something like the smooth minimum of $U_{G}$ and $U_{R}$, such as:

$(\frac{λ_{G} e^{- U_{G}}}{e^{- U_{G}} + e^{- U_{R}}}) U_{G} + (\frac{λ_{R} e^{- U_{R}}}{e^{- U_{G}} + e^{- U_{R}}}) U_{R}$ .

And I'd want the weights to be chosen so that I am very likely to both be generous and read, to some extent, over longer periods of time.
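For contrast, here is a minimal sketch of the smoothmin aggregate, implemented directly from the formula above, under the same kind of hypothetical linear model: a fraction t of time goes to generosity and 1 − t to reading, with $U_G = 3t$ and $U_R = 2(1 - t)$. The function names and numbers are illustrative assumptions.

```python
import math


def smoothmin_objective(u_g, u_r, lam_g=1.0, lam_r=1.0):
    """Smooth-minimum aggregate of U_G and U_R, following the formula above.

    Each utility is weighted by its softmin share e^{-U} / (e^{-U_G} + e^{-U_R}),
    scaled by its λ. Exponents are shifted by min(u_g, u_r) for numerical
    stability; the shift cancels in each ratio.
    """
    m = min(u_g, u_r)
    e_g = math.exp(-(u_g - m))
    e_r = math.exp(-(u_r - m))
    total = e_g + e_r
    return (lam_g * e_g / total) * u_g + (lam_r * e_r / total) * u_r


def best_split(objective, steps=100):
    """Grid-search the fraction t in [0, 1] of time spent on generosity."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=objective)


# Hypothetical model: utilities linear in time spent on each activity.
a, b = 3.0, 2.0
t_smooth = best_split(lambda t: smoothmin_objective(a * t, b * (1 - t)))
t_linear = best_split(lambda t: a * t + b * (1 - t))
print(t_smooth, t_linear)
```

The plain sum picks the corner t = 1.0 (never reading), while the smoothmin objective peaks strictly between 0 and 1, so both activities keep some share. Note that this softmin-weighted average only approximates the minimum: its peak need not fall exactly where $U_G = U_R$.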

World preferences vs identity preferences

Some time ago, I wrote a post about "preferences over non-rewards". I'm planning to collect most of these preferences into the category of "personal identity": the sort of being you want to be.

The "You're not the boss of me!" preference from that post (where you change your preferred action because you were told to, or told not to) is very similar to the "4 Problems with self-referential $Θ$" from this post; both will be grouped under "personal identity".

It's my hope that all human preferences and meta-preferences can be synthesised into either "world preferences" or "personal identity preferences". As this post suggests, the methods of aggregation may differ between the two categories.

AI ALIGNMENT FORUM
AF


Smoothmin and personal identity
