Since this hypothesis makes distinct predictions, it is possible for the confidence to rise above 50% after finitely many observations.
I was confused about why this is the case. I now think I have an answer (could anyone confirm?):
The description length of the Turing Machine enumerating theorems of PA is constant. The description length of any Turing Machine that enumerates theorems of PA up until time-step n and then does something else grows with n (for big enough n). Since any probability prior over Turing Machines has an implicit simplicity bias, no m...
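This growth argument can be sketched numerically. In the toy model below, a prior weights a program of length L bits as 2^(-L); a machine that mimics the PA-enumerator up to step n and then deviates must encode n, which costs roughly log2(n) extra bits. The constants `BASE` (bits to describe the faithful enumerator) and the 10-bit encoding overhead are hypothetical placeholders; real description lengths depend on the choice of universal machine.

```python
import math

def prior(description_length_bits):
    # Solomonoff-style simplicity prior: weight 2^-L for an L-bit program.
    return 2.0 ** (-description_length_bits)

BASE = 1000  # hypothetical: bits to describe the faithful PA-enumerator

def deviant_length(n):
    # A machine that copies the enumerator up to step n and then deviates
    # must specify n, costing about log2(n) bits plus some fixed overhead
    # (here a hypothetical 10 bits).
    return BASE + math.ceil(math.log2(n)) + 10

# As long as the first n outputs agree, every surviving hypothesis fits
# the data equally well, so relative posterior odds are just prior ratios.
for n in [10, 10**3, 10**6]:
    odds = prior(deviant_length(n)) / prior(BASE)
    print(f"deviate after {n:>7} steps: odds vs faithful = {odds:.2e}")
```

The odds against any particular deviant machine shrink as its deviation point n grows, which is why the faithful enumerator's posterior can rise above 50% after finitely many observations.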
In particular, this theorem shows that players with very low (little capital/influence on ) will accurately predict
You mean ?
Solution: Black box the whole setup and remove it from the simulation to avoid circularity.
Addendum: I now notice this amounts to brute-forcing a solution to certain counterfactuals.
Hi Vanessa! Thanks again for your previous answers. I've got one further concern.
Are all mesa-optimizers really only acausal attackers?
I think mesa-optimizers don't need to be purely contained in a hypothesis (rendering them acausal attackers), but can be made up of a part of the hypothesis-updating procedure (maybe this is obvious and you already considered it).
Of course, since the only way to change the AGI's actions is by changing its hypotheses, even these mesa-optimizers will have to alter hypothesis selection. But their w...
Hmm, given your general definition of planning, shouldn't it include realizations (and their corresponding guided actions) of the form "further thinking about this plan is worse than just acquiring some value now", so that P2B itself already includes acquiring the terminal goal (and optimizing solely for P2B is thus optimal)?
I guess your idea is that "plan to P2B better" means "plan with the sole goal of improving P2B", so that it's a "non-value-laden" instrumental goal.