There are two boxes in front of you. In one of them, there is a little monkey with a cymbal, whilst the other box is empty. In precisely one hour the monkey will clang its cymbal.

While you wait, you produce an estimate of the probability of the monkey being in the first box. Let's assume that you form your last estimate, p, three seconds before the monkey clangs its cymbal. You can see the countdown and you know that it's your final estimate, partly because you're slow at arithmetic.

Let Omega be an AI that can perfectly simulate your entire deliberation process. Before you entered the room, Omega predicted what your final probability estimate would be and decided where to place the monkey so as to mess with you. Let q be the probability that Omega places the monkey in the first box. Specifically, Omega sets q=p/2, unless p=0 or you never form a probability estimate, in which case it sets q=1.
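To make the rule concrete, here is a minimal sketch of Omega's placement rule in Python (the function name is mine, not part of the problem statement). It checks that no estimate is self-consistent: for every p, the true probability q differs from p.

```python
def omega_q(p):
    """Omega's placement rule: given your final estimate p
    (or None if you never formed one), return the probability q
    that the monkey is in the first box."""
    if p is None or p == 0:
        return 1.0
    return p / 2

# No estimate is self-consistent: for p in (0, 1], q = p/2 < p,
# while p = 0 (or no estimate) yields q = 1.
assert all(omega_q(p) != p for p in [0.0, 0.25, 0.5, 1.0, None])
```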

What probability should you assign to the monkey being in the first box?

I think it's fairly clear that this is a no-win situation. No matter what final probability estimate you form, as soon as you've locked it in, you know that it is incorrect, even before you've heard the clang. You can try to escape this, but there's no reason the universe has to play nice.

This problem can be seen as a variation on Death in Damascus. I designed this problem to reveal that the core challenge Death in Damascus poses isn't just that another process in the world can depend upon your decision, but that it can depend upon your expectations even if you don't actually make a decision based upon those expectations.

I also find this problem a useful intuition pump, as I think the no-win nature of the situation is clearer here than in other similar problems. In Newcomb's problem, it's easy to get caught up thinking about the Principle of Dominance. In Death in Damascus, you can confuse yourself trying to figure out whether CDT recommends staying or fleeing. In this problem, at least to me, it is clearer that it's a dead end and that there's no way to beat Omega.

This is also a useful intuition pump for the Evil Genie Puzzle. When I first discovered that puzzle, I felt immensely confused that no matter which decision you made, you would immediately regret it. However, the complexity of the puzzle made it hard for me to figure out exactly what to make of it, so when trying to solve it I came up with this problem as something easier to grok. My position after considering the Unexpected Clanging is that you just have to accept that a sufficiently powerful agent may be able to mess with you like this, and deal with it. (I'll leave a more complete analysis to a future post.)


Comments

There is an incentive for hidden expectation/cognition that Omega isn't diagonalizing (things like creating new separate agents in the environment). Also, at least you can know how the ground truth depends on your official "expectation" of the ground truth. The truth of your knowledge of this dependence wasn't diagonalized away, so there is an opportunity for control.

Interesting. This prank seems to be one you could play on a Logical Inductor; I wonder what the outcome would be. One possibly related fact is that computable functions are continuous. This would imply that whatever computable function Omega applies to your probability estimate, there exists a fixed-point probability you can choose at which you'll be correct about the monkey probability. Of course, if you're a bounded agent thinking for a finite amount of time, you might as well be outputting rational probability estimates, in which case functions like this become computable for Omega.
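The fixed-point claim in this comment can be illustrated with a short sketch. Note that the post's actual rule (q=p/2, but q=1 at p=0) is discontinuous, which is exactly why it has no fixed point; the code below instead uses a hypothetical *continuous* placement rule of my own choosing and finds its fixed point by bisection, a one-dimensional case of Brouwer's theorem.

```python
def fixed_point(f, lo=0.0, hi=1.0, tol=1e-12):
    """Find p with f(p) = p for a continuous f: [0,1] -> [0,1].
    Since f(0) >= 0 and f(1) <= 1, g(p) = f(p) - p changes sign
    on [0,1], so bisection on g locates a fixed point."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) - mid > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# A hypothetical continuous rule Omega might use instead of q = p/2:
continuous_omega = lambda p: (1 + p) / 3

# Announcing this estimate would make you exactly correct.
p_star = fixed_point(continuous_omega)  # p* = 1/2, since (1+p)/3 = p
```

Against the discontinuous rule in the post, no such p* exists, which is the sense in which the situation is genuinely no-win.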
