Subagents and impact measures: summary tables

by Stuart Armstrong, 17th Feb 2020


These tables will summarise the results of this whole sequence, checking whether subagents can neutralise the impact penalty.

First of all, given a subagent, here are the results for various impact penalties and baselines, and various "value difference summary functions":

Another way of phrasing a "decreasing" value difference summary function: it penalises too little power, not too much. Conversely, an "increasing" summary function penalises too much power, not too little. Thus, unfortunately:

  • Subagents do allow an agent to get stronger than the indexical impact penalty would allow.
  • Subagents don't allow an agent to get weaker than the indexical impact penalty would allow.
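The distinction between "decreasing" and "increasing" summary functions can be illustrated with a minimal sketch. It assumes that the value difference is computed as actual value minus baseline value, and that the penalty is a plain sum over a list of values. The function names and the numbers are hypothetical, chosen only for illustration; they are not taken from the sequence.

```python
from typing import Callable, Sequence


def penalise_decrease(x: float) -> float:
    """A decreasing summary function: nonzero only when value drops (x < 0)."""
    return max(0.0, -x)


def penalise_increase(x: float) -> float:
    """An increasing summary function: nonzero only when value rises (x > 0)."""
    return max(0.0, x)


def penalise_any_change(x: float) -> float:
    """Absolute value: penalises changes in either direction."""
    return abs(x)


def impact_penalty(
    summary: Callable[[float], float],
    baseline_values: Sequence[float],
    actual_values: Sequence[float],
) -> float:
    """Sum the summary function over (actual - baseline) value differences.

    Assumption: the difference is actual minus baseline, so a positive
    difference means more power than in the baseline.
    """
    return sum(summary(v - b) for v, b in zip(actual_values, baseline_values))


baseline = [1.0, 2.0]
more_power = [3.0, 2.5]   # every value at or above baseline
less_power = [0.5, 2.0]   # one value below baseline

for name, fn in [
    ("decrease", penalise_decrease),
    ("increase", penalise_increase),
    ("any change", penalise_any_change),
]:
    print(name, impact_penalty(fn, baseline, more_power),
          impact_penalty(fn, baseline, less_power))
```

Under these assumptions, the decreasing function assigns no penalty at all to the `more_power` case, while the increasing function assigns none to the `less_power` case.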

Examples

This table presents, for three specific examples, whether the agent could actually build a subagent, and whether doing so would neutralise its impact penalty in practice (in the inaction baseline):

Here, 20BQ is twenty billion questions, RR is relative reachability, and AU is attainable utility.

Now, whether the RR or AU penalties are undermined technically depends on the value difference summary function, not on what measure is being used for value. However, I feel that the results undermine the spirit of AU much more than the spirit of RR. AU attempted to control an agent by limiting its power; this effect is mainly neutralised. RR attempted to control the side-effects of an agent by ensuring it had enough power to reach a lot of states; this effect is not neutralised by a subagent.