Epistemic Status
Unsure[1], partially noticing my own confusion. Hoping Cunningham's Law can help resolve it.
Related Answer
Confusions About Arguments From Expected Utility Maximisation
Some MIRI people (e.g. Rob Bensinger) still highlight expected utility (EU) maximisers as the paradigm case of existentially dangerous AI systems. I'm confused by this for a few reasons:
1. Not all consequentialist/goal-directed systems are expected utility maximisers
    * E.g. humans
2. Some recent developments make me sceptical that VNM expected utility maximisation is a natural form for generally intelligent systems:
    1. Wentworth's subagents provide a model for inexploitable agents that don't maximise a simple unitary utility function (a toy sketch follows this list)
        1. The main requirement for subagents to be a better model than unitary agents is path-dependent preferences or hidden state variables
        2. Alternatively, subagents natively admit partial orders over preferences
            1. If I'm not mistaken, utility functions seem to require a (static) total order over preferences; this is the VNM completeness axiom (stated below the list)
                1. This might be a very unreasonable ask; it does not seem to describe humans, animals, or even existing sophisticated AI systems
        3. I think the strongest implication of Wentworth's subagents is that expected utility maximisation is not the limit or idealised form of agency
    2. Shard Theory suggests that agents trained via reinforcement learning[2] form value "shards" (a second toy sketch follows this list)
        1. Values are inherently "contextual influences on decision making"
            1. Hence agents do not have a static total order over preferences (which is what a utility function implies), since which preferences are active depends on the context
                1. Preferences are dynamic (they change over time), and their ordering is not necessarily total
            2. This explains many of the observed inconsistencies in human decision making
        2. A multitude of value shards does not admit analysis as a simple unitary utility function
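
To make the subagents point concrete, here is a minimal sketch of the kind of agent I have in mind. This is my own toy construction, not Wentworth's formalism: a committee that only accepts trades no member objects to. Its preferences form a partial order (outcomes the members rank oppositely are incomparable), yet it cannot be money-pumped, because every accepted trade is a weak Pareto improvement for the committee.

```python
class Subagent:
    """One member of the committee, with its own utility over outcomes."""
    def __init__(self, name, utility):
        self.name = name
        self.utility = utility  # dict: outcome -> utility

class Committee:
    """Accepts a trade only if no member objects.

    If members disagree about two outcomes, the outcomes are simply
    incomparable and the committee stands pat. Every accepted trade is a
    weak Pareto improvement, so no cycle of accepted trades can end
    strictly worse than it started: inexploitable, with no unitary
    utility function in sight.
    """
    def __init__(self, members):
        self.members = members

    def prefers(self, a, b):
        # a > b in the committee's partial order: every member weakly
        # prefers a, and at least one strictly prefers it.
        weak = all(m.utility[a] >= m.utility[b] for m in self.members)
        strict = any(m.utility[a] > m.utility[b] for m in self.members)
        return weak and strict

    def accepts_trade(self, current, offered):
        return self.prefers(offered, current)

# Hypothetical preferences: the members rank B and C oppositely.
committee = Committee([
    Subagent("subagent_1", {"A": 0, "B": 2, "C": 1}),
    Subagent("subagent_2", {"A": 0, "B": 1, "C": 2}),
])

print(committee.accepts_trade("A", "B"))  # True: unanimous improvement
print(committee.accepts_trade("B", "C"))  # False: B and C are incomparable
print(committee.accepts_trade("C", "B"))  # False: so no money-pump cycle exists
```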
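
And the total-order requirement mentioned above, stated explicitly. The VNM completeness axiom demands that any two lotteries be comparable; the payoff of the four axioms together is a single utility function representing preferences (notation mine):

```latex
% Completeness: every pair of lotteries is comparable (a total preorder).
\text{Completeness: } \forall A, B \in \mathcal{L}: \quad A \succeq B \ \lor \ B \succeq A

% Given completeness, transitivity, continuity, and independence,
% there exists a u such that preference = expected-utility comparison:
\exists\, u : \quad A \succeq B \iff \mathbb{E}_A[u] \ge \mathbb{E}_B[u]
```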
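
Finally, a toy illustration of the shard picture. This is my own sketch, not Shard Theory's actual claims about network internals: shards activate in some contexts and not others, so the action ordering the agent reveals shifts with context, and no single static total order is being maximised.

```python
class Shard:
    """A contextual value: bids on actions only in contexts that activate it."""
    def __init__(self, name, trigger, bids):
        self.name = name
        self.trigger = trigger  # context -> bool: is this shard active?
        self.bids = bids        # dict: action -> bid strength

    def bid(self, context, action):
        return self.bids.get(action, 0.0) if self.trigger(context) else 0.0

def choose(shards, context, actions):
    """Pick the action with the highest total bid from the active shards."""
    return max(actions, key=lambda a: sum(s.bid(context, a) for s in shards))

# Hypothetical shards: the revealed preference between the two actions
# flips with context, so there is no context-free total order.
shards = [
    Shard("sweet_tooth", lambda ctx: ctx["saw_cake"], {"eat_cake": 2.0}),
    Shard("health", lambda ctx: ctx["on_diet"], {"eat_salad": 1.5, "eat_cake": -1.0}),
]

actions = ["eat_cake", "eat_salad"]
print(choose(shards, {"saw_cake": True, "on_diet": False}, actions))  # eat_cake
print(choose(shards, {"saw_cake": True, "on_diet": True}, actions))   # eat_salad
```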