Personal Blog

Many of the ideas presented here require AIs to be antagonistic towards each other - or at least hypothetically antagonistic towards hypothetical other AIs. This can fail if the AIs engage in acausal trade, so it would be useful if we could prevent such things from happening.

Now, I have to admit I'm still quite confused by acausal trade, so I'll simplify it to something I understand much better, an anthropic decision problem.

Staples and paperclips, cooperation and defection

Clippy has a utility function p, linear in paperclips, while Stapley has a utility function s, linear in staples (both p and s are normalised to zero, with one additional item adding 1 utility). They are not causally connected, and each must choose "Cooperate" or "Defect". If they "Cooperate", they create 10 copies of the item they do not value (so Clippy creates 10 staples, Stapley creates 10 paperclips). If they "Defect", they create one copy of the item they value (so Clippy creates 1 paperclip, Stapley creates 1 staple).

Assume both agents know these facts, both agents use anthropic decision theories, and both agents are identical apart from their separate locations and distinct utility functions.

Then the outcome is easy: both agents will consider that "cooperate-cooperate" or "defect-defect" are the only two possible options, "cooperate-cooperate" gives them the best outcome, so they will both cooperate. It's a sweet story of cooperation and trust between lovers that never agree and never meet.
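To make the payoffs concrete, here is a minimal sketch of the game in Python. The function names and the "C"/"D" encoding are illustrative assumptions, and the "mirrored" choice simply hard-codes the assumption that both agents end up taking the same action.

```python
# Toy model of the Clippy/Stapley game. All names are illustrative.
COOPERATE, DEFECT = "C", "D"

def items_made(clippy_action, stapley_action):
    """Return (paperclips, staples) produced by the two agents."""
    paperclips = staples = 0
    if clippy_action == COOPERATE:
        staples += 10      # Clippy makes the item it does not value
    else:
        paperclips += 1    # Clippy makes one item it values
    if stapley_action == COOPERATE:
        paperclips += 10   # Stapley makes the item it does not value
    else:
        staples += 1       # Stapley makes one item it values
    return paperclips, staples

def mirrored_choice():
    """Best action for Clippy, assuming Stapley takes the same action."""
    # p is linear in paperclips, so p equals the paperclip count.
    return max((COOPERATE, DEFECT), key=lambda a: items_made(a, a)[0])

if __name__ == "__main__":
    print(items_made(COOPERATE, COOPERATE))
    print(items_made(DEFECT, DEFECT))
    print(mirrored_choice())
```

By symmetry, the same comparison on the staple count gives Stapley's choice.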

Breaking cooperation

How can we demolish this lovely agreement? As I often do, I will assume that there is some event X that will turn Clippy on, with P(X) ≈ 1 (hence P(¬X) << 1). Similarly there is an event Y that turns Stapley on. Since X and Y are almost certain, they should not affect the results above. If the events don't happen, the AIs will never get turned on at all.

Now I am going to modify utility p, replacing it with

p' = p - E(p|¬X).

This is p with a single term subtracted from it: the expected value of p given that Clippy has not been turned on. This term feels like a constant, but isn't exactly, as we shall see. Apply the same modification to utility s, using Y:

s' = s - E(s|¬Y).

Now contrast "cooperate-cooperate" and "defect-defect". If Clippy and Stapley are both cooperators, then p=s=10. However, if the (incredibly unlikely) ¬X were to happen, then Clippy would not exist, but Stapley would still cooperate (as Stapley has no way of knowing about Clippy's non-existence), and create ten paperclips. So E(p|¬X) = E(p|X) ≈ 10, and p' ≈ 0. Similarly s' ≈ 0.

If both agents are defectors, though, then p=s=1. Since each agent creates its own valuable object, E(p|¬X) = 0 (Clippy cannot create a paperclip if Clippy does not exist) and similarly E(s|¬Y)=0.

So p'=s'=1, and both agents will choose to defect.
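The comparison can be checked with a short Python sketch. It assumes, as in the argument above, that both agents follow the same policy, that Stapley acts whether or not Clippy exists, and that Y can be treated as certain. The names `paperclips` and `p_prime` are illustrative.

```python
# Sketch of p' = p - E(p|not X) under a shared policy. Names are illustrative.
COOPERATE, DEFECT = "C", "D"

def paperclips(clippy_on, action):
    """Paperclips made when both agents follow the policy `action`.

    Assumption: Stapley is always on (Y treated as certain) and cannot
    observe whether Clippy was turned on.
    """
    total = 0
    if clippy_on and action == DEFECT:
        total += 1     # Clippy's own paperclip, only if Clippy exists
    if action == COOPERATE:
        total += 10    # Stapley's paperclips, made regardless of X
    return total

def p_prime(action):
    """Modified utility: p given X, minus the expected p given not-X."""
    return paperclips(True, action) - paperclips(False, action)

if __name__ == "__main__":
    for action in (COOPERATE, DEFECT):
        print(action, p_prime(action))
    print(max((COOPERATE, DEFECT), key=p_prime))
```

In this toy model the subtracted term cancels everything the other agent contributes, so only the paperclip Clippy makes itself is left to distinguish the two policies.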

If this is a good analogue for acausal decision making, it seems we can break such cooperation if needed.

1 comment

I think this is only going to work in very limited scenarios. What kind of event is X? Is the uncertainty about it logical or indexical? If it's logical, then given enough computing resources, each agent can condition its action on the occurrence of the other agent's event. If it's indexical, then the agents can cooperate by each agent physically creating an event of the other agent's type in its own physical vicinity.

I am confused by the apparent assumption that you can control the utility functions of both agents. If you only worry about cooperation between agents you construct yourself, then it's possible you can use this kind of method to prevent it, at least as long as the agents don't break out of the box. However, if you worry about cooperation of your AI with agents in other universes, then the method seems ineffective.