Oracle predictions don't apply to non-existent worlds

[-]Vladimir_Nesov4y30

Consider the variant where the Oracle demands a fee of 100 utilons after delivering the prediction, which you can't refuse. Then the winning strategy is going to be about ensuring that the current situation is counterfactual, so that in actuality you won't have to pay the Oracle's fee, because the Oracle wouldn't be able to deliver a correct prediction.

The Oracle's prediction only has to apply to the world that is. It doesn't have to apply to worlds that are not.

The Oracle's prediction only has to apply to the world where the prediction is delivered. It doesn't have to apply to the other worlds. The world where the prediction is delivered can be the world that is not, and another world can be the world that is.
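A toy sketch of the fee variant, under assumptions that are not in the thread: two hypothetical actions, an agent modeled as a policy mapping the announced prediction to an action, and an Oracle that may only deliver a prediction the policy would make true. All names here (`ACTIONS`, `FEE`, `deliverable_prediction`, the three policies) are made up for illustration.

```python
from typing import Callable, Optional

ACTIONS = ["grab_coat", "stay_in"]  # hypothetical action set
FEE = 100  # utilons charged after a prediction is delivered

Policy = Callable[[str], str]  # announced prediction -> action taken


def deliverable_prediction(policy: Policy) -> Optional[str]:
    """Return a prediction the Oracle could announce and be right about, else None."""
    for predicted in ACTIONS:
        if policy(predicted) == predicted:
            return predicted
    return None


def fee_paid(policy: Policy) -> int:
    """The fee is only charged in worlds where a correct prediction gets delivered."""
    return FEE if deliverable_prediction(policy) is not None else 0


def compliant(predicted: str) -> str:
    return predicted  # does whatever was predicted


def stubborn(predicted: str) -> str:
    return "stay_in"  # ignores the prediction entirely


def contrarian(predicted: str) -> str:
    # "plays chicken": always takes the action that was not predicted
    return next(a for a in ACTIONS if a != predicted)


for policy in (compliant, stubborn, contrarian):
    print(policy.__name__, deliverable_prediction(policy), fee_paid(policy))
```

Under these assumptions only the contrarian policy avoids the fee: no announcement is consistent with it, so the world where a prediction gets delivered stays counterfactual.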

[-]Chris_Leong4y10

"The Oracle's prediction only has to apply to the world where the prediction is delivered" - My point was that predictions that are delivered in the factual don't apply to counterfactuals, but the way you've framed it is better as it handles a more general set of cases. It seems like we're on the same page.

[-]Vladimir_Nesov4y10

It's not actually more general, it's instead about a somewhat different point. The more general statement could use some sort of a notion of relative actuality, to point at the possibly counterfactual world determined by the decision made in the world where the prediction was delivered, which is distinct from the even more counterfactual worlds where the prediction was delivered but the decision was different from what it would relative-actually be had the prediction been delivered, and from the worlds where the prediction was not delivered at all.

If the prediction is not actually delivered, then it only applies to that intermediately-counterfactual world and not to the more counterfactual alternatives where the prediction was still delivered or to the less counterfactual situation where the prediction is not delivered. Saying that the prediction applies to the world where it's delivered is liable to be interpreted as including the more-counterfactual worlds, but it doesn't have to apply there; it only applies to the relatively-actual world. So your original framing has a necessary part of saying this carefully that my framing didn't include; replacing it with my framing discards this correct detail. The Oracle's prediction only has to apply to the "relatively-actual" world where the prediction is delivered.

[-]Gordon Seidoh Worley4y20

Small insight while reading this: I'm starting to suspect that most (all???) unintuitive things that happen with Oracles are the result of them violating our intuitions about causality, because they actually deliver no information. Nothing can be conditioned on what the Oracle says, because if we could, the Oracle would fail to actually be an Oracle. So we can only condition on the existence of the Oracle and how it functions, not on what it actually says. E.g. you should still 1-box, but it's mistaken to think anything an Oracle tells you allows you to do anything different.

[-]Chris_Leong4y20

Yeah, you want either information about the available counterfactuals or information independent of your decision. Information about just the path taken isn't something you can condition on.

[-]Vladimir_Nesov4y*20

When the Oracle says "The taxi will arrive in one minute!", you may as well grab your coat.

[-]Chris_Leong4y10

Isn't that prediction independent of your decision to grab your coat or not?

[-]Vladimir_Nesov4y10

The prediction is why you grab your coat. It's both meaningful and useful to you: a simple counterexample to the sentiment that since the correctness scope of predictions is unclear, they are no good. The prediction is not about the coat, but that dependence wasn't mentioned in the arguments against usefulness of predictions above.

[-]Dagon4y00

Sure, that's a sane Oracle. The Weird Oracle used in so many thought experiments doesn't say "The taxi will arrive in one minute!", it says "You will grab your coat in time for the taxi."

[-]Vladimir_Nesov4y20

No, this is an important point: the agent normally doesn't know the correctness scope of the Oracle's prediction. It's only guaranteed to be correct on the actual decision, and can be incorrect in all other counterfactuals. So if the agent knows the boundaries of the correctness scope, they may play chicken and render the Oracle wrong by enacting the counterfactual where the prediction is false. And if the agent doesn't know the boundaries of the prediction's correctness, how are they to make use of it in evaluating counterfactuals?

It seems that the way to reason about this is to stipulate correctness of the prediction in all counterfactuals, even though it's not necessarily correct in all counterfactuals, in the same way as the agent's decision that is being considered is stipulated to be different in different counterfactuals, even though the algorithm forces it to be the same. So it's a good generalization of the problem of formulating counterfactuals: it moves the intervention point from the agent's own decisions to the correctness of powerful predictors' claims. These claims act on the counterfactuals generated by the agent's own decisions, not on the counterfactuals generated by delivery of possible claims, so it's not about merely treating predictors as agents; it's a novel setup.
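One way to picture the stipulation, as a minimal sketch rather than anything proposed in the thread: the agent evaluates each possible action while holding the Oracle's claim ("the taxi will arrive") true in every counterfactual, even though the claim is only guaranteed on the action actually taken. The payoffs and the `taxi_arrives_in_world` table are hypothetical, chosen so that the claim is false in the counterfactual that is not enacted.

```python
ACTIONS = ["grab_coat", "stay_in"]  # hypothetical action set


def utility(action: str, taxi_arrives: bool) -> int:
    """Made-up payoffs, for illustration only."""
    if action == "grab_coat":
        return 10 if taxi_arrives else -1  # ready for the taxi vs. wasted effort
    return -5 if taxi_arrives else 0  # miss the taxi vs. nothing happens


def decide(claim_stipulated_true: bool = True) -> str:
    """Pick the best action, holding the Oracle's claim fixed in every counterfactual."""
    return max(ACTIONS, key=lambda a: utility(a, claim_stipulated_true))


# Hypothetical ground truth: the claim happens to be false in the
# counterfactual where the agent stays in. The Oracle is only on the hook
# for the world indexed by the decision actually made.
taxi_arrives_in_world = {"grab_coat": True, "stay_in": False}

decision = decide()
oracle_correct_on_actual = taxi_arrives_in_world[decision]
print(decision, oracle_correct_on_actual)
```

In this toy setup the stipulation is false of the `stay_in` world, yet the decision it produces lands in a world where the claim holds, which is all the Oracle's guarantee covers.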

[-]Dagon4y00

Is there an ELI5 doc about what's "normal" for Oracles, and why they're constrained in that way? The examples I see confuse me in that they are exploring what seem like edge cases, and I'm missing the underlying model that makes these cases critical.

Specifically, when you say "It's only guaranteed to be correct on the actual decision", why does the agent not know what "correct" means for the decision?

[-]Vladimir_Nesov4y10

"Specifically, when you say 'It's only guaranteed to be correct on the actual decision', why does the agent not know what 'correct' means for the decision?"

The agent knows what "correct" means, correctness of a claim is defined for the possible worlds that the agent is considering while making its decision (which by local tradition we confusingly collectively call "counterfactuals", even though one of them is generated by the actual decision and isn't contrary to any fact).

In the post Chris_Leong draws attention to the point that since the Oracle knows which possible world is actual, there is nothing forcing its prediction to be correct on the other possible worlds that the agent foolishly considers, not knowing that they are contrary to fact. And my point in this thread is that despite the uncertainty it seems like we have to magically stipulate correctness of the Oracle on all possible worlds in the same way that we already magically stipulate the possibility of making different decisions in different possible worlds, and this analogy might cast some light on the nature of this magic.

[-]Chris_Leong4y10

That's an interesting point. I suppose it might be viable to acknowledge that the problem taken literally doesn't require the prediction to be correct outside of the factual, but nonetheless claim that we should resolve the vagueness inherent in the question about what exactly the counterfactual is by constructing it to meet this condition. I wouldn't necessarily be strongly against this - my issue is confusion about what an Oracle's prediction necessarily entails.

Regarding your notion about things being magically stipulated, I suppose there's some possible resemblance there with the ideas I proposed before in Counterfactuals As A Matter of Social Convention, although The Nature of Counterfactuals describes where my views have shifted to since then.

[-]Dagon4y00

Hmm. So does this only apply to CDT agents, who foolishly believe that their decision is not subject to predictions?

[-]Vladimir_Nesov4y20

No, I suspect it's a correct ingredient of counterfactuals, one I didn't see discussed before, not an error restricted to a particular decision theory. There is no contradiction in considering each of the counterfactuals as having a given possible decision made by the agent and satisfying the Oracle's prediction, as the agent doesn't know that it won't make this exact decision. And if it does make this exact decision, the prediction is going to be correct, just like the possible decision indexing the counterfactual is going to be the decision actually taken. Most decision theories allow explicitly considering different possible decisions, and adding correctness of the Oracle's prediction into the mix doesn't seem fundamentally different in any way, it's similarly sketchy.

[-]Dagon4y00

Thanks for your patience with this. I am still missing some fundamental assumption or framing about why this is non-obvious (IMO, either the Oracle is wrong, or the choice is illusory). I'll continue to examine the discussions and examples in hopes that it will click.

[-]Chris_Leong4y10

I presume Vladimir and I are likely discussing this from within the determinist paradigm in which "either the Oracle is wrong, or the choice is illusory" doesn't apply (although I propose a similar idea in Why 1-boxing doesn't imply backwards causation).

[-]Vladimir_Nesov4y00

"IMO, either the Oracle is wrong, or the choice is illusory"

This is similar to determinism vs. free will, and suggests the following example. The Oracle proclaims: "The world will follow the laws of physics!". But in the counterfactual where an agent takes a decision that won't actually be taken, the fact of taking that counterfactual decision contradicts the agent's cognition following the laws of physics. Yet we want to think about the world within the counterfactual as if the laws of physics are followed.

AI ALIGNMENT FORUM