Note: working on a research agenda, hence the large number of small individual posts, to have things to link to in the main documents.

In my initial post on synthesising human preferences, I imagined a linear combination of partial preferences. Later, when talking about identity preferences, I proposed a smoothmin instead (which can also be seen as imposing strong diminishing returns on any one partial preference).
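To make the contrast concrete, here is one way the two rules might be written (the weights $w_i$, partial preferences $P_i$, and sharpness parameter $\alpha$ are illustrative notation, not definitions from those posts):

$$U_{\text{linear}} = \sum_i w_i P_i, \qquad U_{\text{smoothmin}} = -\frac{1}{\alpha}\log\left(\sum_i e^{-\alpha w_i P_i}\right).$$

As $\alpha \to \infty$, the smoothmin tends to $\min_i w_i P_i$, so extra fulfilment of an already well-satisfied partial preference contributes almost nothing - hence the diminishing-returns reading.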

I was trying to formalise how humans seem to trade off their various preferences in different circumstances. However, the ideal is not to decide a priori how humans trade off their preferences, but to copy how they actually do so.

To do that, we need to imagine the human in situations quite distant from their current ones - situations where some of their partial preferences are more or less fulfilled. This brings up the problem of modelling preferences in distant situations. Assuming some acceptable resolution to that problem, the AI could extract an exchange rate between different preferences, across different situations and different levels of preference fulfilment.
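Roughly speaking (the notation here is mine, not something fixed in the earlier posts): if $x_i$ is the level of fulfilment of partial preference $P_i$ and $U_H(x; s)$ is the human's overall evaluation in situation $s$, then the exchange rate between $P_i$ and $P_j$ at that point is something like a marginal rate of substitution,

$$e_{ij}(s, x) = \frac{\partial U_H/\partial x_i}{\partial U_H/\partial x_j},$$

i.e. how much fulfilment of $P_j$ the human would give up for one more unit of $P_i$, in that situation and at those fulfilment levels.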

Meta-preferences may constrain preferences in very distant situations. Most paradoxes of population ethics arise in situations very different from today's, so preferences about population ethics can be seen as constraints of this kind. Universal moral principles act in a similar way, putting limits on what can happen in all situations, including extreme ones - though note there are arguments for avoiding some unusual situations entirely.

The AI's task is then to come up with a general formula for the exchange rates between preferences, one that extends to all situations and respects the constraints of the meta-preferences. This formula will probably smooth out quite a bit of "noise" in the observed exchange rates, while respecting the general trends.
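As a toy illustration only - not the actual procedure - here is a minimal sketch, assuming the observed exchange rates come as noisy data points over some hypothetical encoding of situations, with a ridge penalty doing the "smoothing out of noise" and a hard clip standing in for meta-preference constraints. All names and encodings here are hypothetical.

```python
import numpy as np

def fit_exchange_rate_model(situations, observed_log_rates, ridge=1.0):
    """Fit a smooth (linear, ridge-regularised) model of log exchange rates.

    situations         : (n, d) array of situation features (hypothetical encoding)
    observed_log_rates : (n,) array of noisy log exchange-rate observations
    ridge              : regularisation strength; larger = more smoothing of noise

    Returns the weight vector w of a linear predictor log e(s) ~ s . w.
    """
    X = np.asarray(situations, dtype=float)
    y = np.asarray(observed_log_rates, dtype=float)
    d = X.shape[1]
    # Closed-form ridge regression: (X^T X + ridge * I) w = X^T y
    return np.linalg.solve(X.T @ X + ridge * np.eye(d), X.T @ y)

def predict_log_rate(w, situation, log_bounds=(-5.0, 5.0)):
    """Predict the log exchange rate in a (possibly distant) situation.

    log_bounds stands in for meta-preference constraints: hard limits on how
    extreme the trade-off is allowed to get in any situation.
    """
    raw = float(np.dot(situation, w))
    return float(np.clip(raw, *log_bounds))

# Toy usage: 20 observed situations, 3 features, noisy "human" trade-offs.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
true_w = np.array([0.5, -1.0, 0.2])
y = X @ true_w + rng.normal(scale=0.3, size=20)

w = fit_exchange_rate_model(X, y, ridge=2.0)
print(predict_log_rate(w, np.array([3.0, -3.0, 1.0])))  # a more distant situation
```

The regularisation term is what trades fidelity to each individual observed exchange rate against overall smoothness; the real version would of course need a far richer model class and genuine meta-preference constraints rather than a simple clip.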
