A test for symbol grounding methods: true zero-sum games

by Stuart Armstrong 2 min read26th Nov 20192 comments

11


Imagine there are two AIs playing a debate game. The game is zero-sum; at the end of the debate, the human judge assigns the winner, and that AI gets a reward, while the other one gets a .

Except the game, as described, is not truly zero-sum. That is because the AI "get" a reward. How is that reward assigned? Presumably there is some automated system that, when the human presses a button, routes to one AI and to another. These rewards are stored as bits, somewhere "in" or around the two AIs.

Thus there are non zero-sum options: you could break into the whole network, gain control of the automated system, and route to each AI - or, why not, or even or whatnot[1].

Thus, though we can informally say that "the AIs are in a zero-sum game as to which one wins the debate", that sentence is not properly grounded in the world; it is only true as long as certain physical features of the world are maintained, features which are not mentioned in that sentence.

Symbol grounding implies possibility of zero-sum

Conversely, imagine that an AI has a utility/reward which is properly grounded in the world. Then it seems that we should be able to construct an AI with utility/reward which is also properly grounded in the world. So it seems that any good symbol grounding system should allow us to define truly zero sum games between AIs.

There are, of course, a few caveats. Aumann's agreement theorem requires unboundedly rational agents with common priors. Similarly, though properly grounded and are zero-sum, the agents might not be fully zero-sum with each other, due to bounded rationality or different priors.

Indeed, it is possible to setup a situation where even unboundedly rational agents with common prior will knowingly behave in not-exactly zero-sum ways with each other; for example, you can isolate the two agents from each other, and feed them deliberately biased information.

But those caveats aside, it seems that proper symbol grounding implies that you can construct agents that are truly zero-sum towards each other.

Zero-sum implies symbols grounded?

Is this an equivalence? If two agents really do have zero sum utility or reward functions towards each other, does it mean that those functions are well grounded[2]?

It seems that it should be the case. Zero-sum between and means that, for all possible worlds , . There are no actions that we - or any agent - could do that breaks that fundamental equality. So it seems that must be defined by features of the world; grounded symbols.

Now, these grounded symbols might not be exactly what we thought they were; its possible we thought was defined on human happiness, but it is actually only means current in a wire. Still, must then be defined in terms of absence of current in the wire. And, whatever we do with the wire - cut it, replace it, modify it in cunning ways - and must reach opposite on that.

Thus it seems that either there is some grounded concept that and are opposite on, or and contain exhaustive lists of all special cases. If we further assume that and are not absurdly complicated (in a "more complicated than the universe" way), we can rule out the exhaustive list.

So, while I can't say with full confidence that a true zero-sum game must mean that the utilities are grounded, I would take such a thing as a strong indication that they are.


  1. If you thought that was large, nothing will prepare you for - the fast-growing hierarchy indexed by the large Veblen Ordinal. There is no real way to describe how inconceivably huge this number is. ↩︎

  2. Assuming the functions are defined in the world to some extent, not over platonic mathematical facts. ↩︎

11