RR attempted to control the side-effects of an agent by ensuring it had enough power to reach a lot of states; this effect is not neutralised by a subagent.

Things might get complicated by partial observability: in the real world, the agent is minimizing change in its beliefs about what it can reach. Otherwise, you could get around the SA problem for AUP as well, simply by replacing its reward functions with state-indicator reward functions.


Stuart_Armstrong (6y)

AU and RR have the same SA problem, formally, in terms of excess power; it's just that AU wants low power and RR wants high power, so they don't have the same problem in practice.
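A toy illustration of that asymmetry, as a minimal sketch rather than either method's actual definition. It assumes a tiny deterministic graph (`GRAPH` and its state names are hypothetical), undiscounted reachability (1 if a state can be reached, 0 otherwise), and a two-sided absolute-difference penalty standing in for an AU-style penalty built on state-indicator rewards. Real AUP compares Q-values against a no-op action, so treat `two_sided_penalty` as an assumption made for illustration only.

```python
from collections import deque

# Hypothetical toy world: "strong" can reach every state, "weak" only itself.
GRAPH = {
    "weak": ["weak"],
    "strong": ["weak", "x"],
    "x": ["x"],
}


def reachability(graph, start):
    """Undiscounted reachability: 1.0 if a state is reachable from start, else 0.0."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return {s: 1.0 if s in seen else 0.0 for s in graph}


def rr_penalty(graph, current, baseline):
    """One-sided penalty: only counts reachability lost relative to the baseline."""
    cur, base = reachability(graph, current), reachability(graph, baseline)
    return sum(max(base[s] - cur[s], 0.0) for s in graph) / len(graph)


def two_sided_penalty(graph, current, baseline):
    """Assumed AU-style stand-in: counts reachability gained or lost."""
    cur, base = reachability(graph, current), reachability(graph, baseline)
    return sum(abs(base[s] - cur[s]) for s in graph) / len(graph)


# Gaining reach ("excess power") relative to the baseline:
print(rr_penalty(GRAPH, "strong", "weak"))
print(two_sided_penalty(GRAPH, "strong", "weak"))
```

In this toy setting, moving from the weak baseline to the strong state costs nothing under the one-sided penalty but is penalized by the two-sided one, while losing reach is penalized by both. That is the sense in which excess power is formally the same quantity for both measures but only matters in practice for the one that wants power kept low.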


Subagents and impact measures: summary tables
