Quick takes by JBlack.


Yet Another Sleeping Beauty variant

In this experiment, as in the standard Sleeping Beauty problem, a coin will be flipped and you will go to sleep.

If the coin shows Tails, you will be awoken on both Monday and Tuesday. Unlike the original problem, you will not be given any amnesia drug between Monday and Tuesday.

If the coin shows Heads, a die will be rolled. If the number is Even then you will be awoken on Monday only. If the number is Odd then you will be given false memories of having been previously awoken on Monday, but actually awoken only on Tuesday.

You wake up. You

(a) don't remember waking up in this experiment before. What is your credence that the coin flip was Heads?

(b) remember waking up in this experiment before today. What is your credence that the coin flip was Heads?

(a) 1/3 (b) 1/3

Reasoning is the same as in the standard case: probability (when used as a degree of belief) expresses the measure of your entanglement with a given hypothetical world. However, it is interesting that in both cases we know what day it is, which makes this a particularly evil version: even after Beauty walks out of the experiment, her credence stays at 1/3 indefinitely (unless the result was Heads/Even, in which case Beauty will know everything upon waking up on Wednesday)!
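Worked out explicitly, under the weighting the reasoning above appeals to (each awakening weighted by the probability of the branch that produces it):

$$P(\text{Heads} \mid \text{no memory}) = \frac{P(\text{Heads, Even})}{P(\text{Tails, Monday}) + P(\text{Heads, Even})} = \frac{1/4}{1/2 + 1/4} = \frac{1}{3}$$

$$P(\text{Heads} \mid \text{memory}) = \frac{P(\text{Heads, Odd})}{P(\text{Tails, Tuesday}) + P(\text{Heads, Odd})} = \frac{1/4}{1/2 + 1/4} = \frac{1}{3}$$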

Here I'll take a shot at constructing a scenario without some of the distractions of the Bomb scenario for FDT:

There is a room you may pay $1000 to enter, and the door closes behind you. A yellow button is next to the door, and opens the door to let you out, with a partial refund: your entry fee minus $100. The other button is on a small table in the middle of the room, and is red. It also opens the door to let you out, and will either return your entry fee plus $100, or nothing. It is labelled with the action that it will carry out: "Win $100" or "Lose $1000".

In general, you know that the action it performs is decided by a fairly reliable Predictor, which takes a snapshot of the entrant's brain and body state as they walk into the building, and uses that to simulate whether or not they will press the red button before leaving the room. For every possible person and choice of label, the simulation predicts this correctly with at least 98% accuracy.

The simulation is used to determine what they will do when the red button is labelled "Lose $1000". If they are expected to leave the room without pressing it, then the red button will be labelled "Lose $1000" in reality. Otherwise, it will be labelled "Win $100" in reality.
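A minimal sketch of that labelling rule (the function name and the `simulate_entrant` callback are illustrative, not part of the scenario):

```python
def choose_red_label(simulate_entrant):
    """Return the label the Predictor puts on the red button.

    `simulate_entrant(label)` is assumed to return True if the simulated
    entrant would press the red button when it carries `label`.
    """
    # The Predictor simulates the entrant facing the "Lose $1000" label only.
    presses_anyway = simulate_entrant("Lose $1000")
    # Predicted to leave without pressing it -> the real label is "Lose $1000".
    # Predicted to press it anyway -> the real label is "Win $100".
    return "Lose $1000" if not presses_anyway else "Win $100"
```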

Is it rational to choose to enter this room? If you believe that it is rational to enter this room, what should you do when the red button is labelled "Lose $1000"?

In this scenario (the one where you always press the red button), any misprediction is actually guaranteed to go against you. The only way to win is for the Predictor to correctly predict that you will press the red button (even when pressing it loses).

I don't think your statement "the reason the room can predict so well is because of some learning mechanism that's been active for some time already" is more general. It's much more specific, and contradicts the scenario premise that the Predictor can predict anyone going in with at least 98% accuracy, regardless of whether they've been in the room before.

Do you mean just a simple replacement of the $1000 entry fee with something larger? Then yes, there are such thresholds.

In this example, the first threshold would be at "Lose $4900", because the predictor is only guaranteed to get it right (and not show you the "Lose" label) at least 98% of the time. The "Lose" label could appear up to 2% of the time, for a weighted loss of $98. The other 98% of the time, the predictor would correctly predict your action and show you the "Win $100" label, for a weighted gain of $98. You have nothing (financial) to gain by entering the building now. If you do enter anyway (e.g. if you're forced in), then you still lose less on average by committing to press the Lose button when that label appears.

The next threshold is at "Lose $9900", where the weighted loss is $198 versus a weighted gain of $98. If you're forced to enter the building, the average return from the strategy of pressing the Lose button when that label appears is now no better than pressing the yellow button (and losing $100), and may be slightly worse (because a strategy of intending to press Yellow might be mispredicted and show you the Win label anyway).
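A small sketch of the arithmetic behind these thresholds (the 98% accuracy and the $100 amounts come from the scenario above; the function names are just illustrative):

```python
from fractions import Fraction

def ev_commit_to_red(lose_amount, accuracy=Fraction(98, 100)):
    # Expected value of entering while committed to pressing the red button.
    # With probability `accuracy` the Predictor foresees this, the button
    # reads "Win $100", and you gain $100. With the remaining probability
    # you are mispredicted, it reads "Lose $<lose_amount>", and you press it.
    return accuracy * 100 - (1 - accuracy) * lose_amount

def ev_press_yellow():
    # The yellow button always returns the entry fee minus $100.
    return -100

for lose_amount in (1000, 4900, 9900):
    print(lose_amount, ev_commit_to_red(lose_amount))
# 1000: +78   (entering is clearly worthwhile)
# 4900: 0     (first threshold: nothing financial to gain by entering)
# 9900: -100  (second threshold: no better than the yellow button if forced in)
```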

If you have reason to believe that pressing the red Lose button will be mispredicted less than 2% of the time, then these thresholds will be greater.
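Spelled out in general terms (the symbols below are just labels for the two thresholds discussed above): with prediction accuracy $p$,

$$L_{\text{enter}} = \frac{100p}{1-p}, \qquad L_{\text{yellow}} = \frac{100(1+p)}{1-p},$$

which give 4900 and 9900 at $p = 0.98$, and both grow as $p$ increases.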

Yes, and if you are reliable enough about pushing the Lose button when you see that label, you will hardly ever see it. On average you would make a profit.

While pondering Bayesian updates in the Sleeping Beauty Paradox, I came across a bizarre variant that features something like an anti-update.

In this variant, as in the original, Sleeping Beauty is awakened on Monday regardless of the coin flip. On heads, she will be gently awakened and asked for her credence that the coin flipped heads. On tails, she will be instantly awakened with a mind-ray that also implants a false memory of having been gently awakened, asked for her credence that the coin flipped heads, and having answered. In both cases the interviewer then asks "Are you sure?" She is aware of all these rules.

On heads, Sleeping Beauty awakens with certain knowledge that heads was flipped, because if tails was flipped then (according to the rules) she would have a memory that she doesn't have. So she should answer "definitely heads".

Immediately after she answers, though, her experience is completely consistent with tails having been flipped, and when asked whether she is sure, she should now answer that she is not sure anymore.

This seems deeply weird. Normally new experiences reduce the space of possibilities, and Bayesian updates rely on this. A possibility that previously had zero credence cannot gain non-zero credence through any Bayesian update. I am led to suspect that Bayesian updates cannot be an appropriate model for changing credence in situations where observers can converge in relevant observable state with other possible observers.
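For reference, the textbook form of that last point: for any evidence $E$ with $P(E) > 0$,

$$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)} = 0 \quad \text{whenever } P(H) = 0,$$

so a hypothesis ruled out with certainty can never be revived by conditioning.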

Fiction (including false memories) doesn't follow the rules, and standard logic may not apply.  This would feel weird because it IS weird.

The weird thing is that the person doing the anti-update isn't subjected to any "fiction". It's only a possibility that might have happened and didn't.

Alternatively: The subject is shown the result of the coin flip. A short time later, if the coin is heads, her memory is modified such that she remembers having seen tails.

Memory erasure can produce this sort of effect in a lot of situations. This is just a special case of that (I assume the false memory of a gentle waking also overwrites the true memory of the abrupt waking).

I deliberately wrote it so that there is no memory trickery or any other mental modification happening at all in the case where Sleeping Beauty updates from "definitely heads" to "hmm, could be tails".

The bizarreness here is that all that is required is being aware of some probability that someone else, even in a counterfactual universe that she knows hasn't happened, could come to share her future mental state without having her current mental state.

Yes, in the tails case memories that Sleeping Beauty might have between sleep and full wakefulness are removed, if the process allows her to have formed any. I was just implicitly assuming that ordinary memory formation would be suppressed during the (short) memory creation process.