Beliefs about Evidential Correlations don't track any direct ground truth, so it's not obvious how to resolve disagreements about them. This is very relevant to acausal trade.
Here I present what seems like the only natural method (Third solution below).
Ideas partly generated with Johannes Treutlein.

Say two agents (algorithms A and B), who follow EDT, form a coalition. They are jointly deciding whether to pursue action a. They would also like an algorithm C to take action c. As part of their assessment of a, they’re trying to estimate how much evidence their coalition taking a would provide for C taking c. If it gave a lot of evidence, they'd have more reason to take a. But they disagree: A thinks the correlation is very strong, and B thinks it’s very weak.
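
To put the disagreement in symbols (nothing here beyond the setup above), the quantity they're arguing about is roughly the evidential boost

$$\Delta = P(C \text{ takes } c \mid \text{coalition takes } a) - P(C \text{ takes } c \mid \text{coalition doesn't take } a),$$

with A claiming $\Delta$ is large and B claiming it's close to zero.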

This is exactly the situation in which researchers in acausal trade have many times found themselves: they are considering whether to take a slightly undesirable action a (spending a few resources on paperclips), which could provide evidence for another agent C (a paperclip-maximizing AI in another lightcone) taking an action c (the AI spending a few resources on human happiness) that we'd like to happen. But different researchers A and B (within the coalition of "humans trying to maximize human happiness") have different intuitions about how strong this correlation is.

A priori, there's a danger that, by thinking more, they would unexpectedly learn the actual output of C. This would make the trade no longer possible, since then taking a would give them no additional evidence about whether c happens. But, for simplicity, assume that C is so much more complex and chaotic than anything A and B can compute that they are very confident this won’t happen.

First solution: They could dodge the question by just looking for other actions they don't disagree about. But that’s boring.

Second solution: They could aggregate their numeric credences somehow. They could get fancy about how to do this. They could even go into more detail and aggregate the parts of their deliberation that are more detailed and informative than a mere number (and that are upstream of this probability), like the different heuristics or reference-class estimates they've used to come up with it. They might face some credit assignment problems (which of my heuristics were most important in setting this probability?). This is not boring, but it’s not yet what I want to discuss.
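
For concreteness, here's a minimal sketch of the plain "aggregate the numbers" version, using two standard pooling rules (the equal weights and the example credences are arbitrary choices; nothing here is specific to this setup):

```python
# Minimal sketch of aggregating A's and B's credences in "C takes c, given we take a".
# The weights are hypothetical; in practice they'd come from trust or track records.

def linear_pool(p_a: float, p_b: float, w_a: float = 0.5, w_b: float = 0.5) -> float:
    """Weighted arithmetic average of two probabilities."""
    return w_a * p_a + w_b * p_b

def log_pool(p_a: float, p_b: float, w_a: float = 0.5, w_b: float = 0.5) -> float:
    """Weighted geometric (logarithmic) pool, renormalized to be a probability."""
    numerator = (p_a ** w_a) * (p_b ** w_b)
    denominator = numerator + ((1 - p_a) ** w_a) * ((1 - p_b) ** w_b)
    return numerator / denominator

# A's credence that c happens given a is high (0.9), B's is much lower (0.35):
print(linear_pool(0.9, 0.35))  # 0.625
print(log_pool(0.9, 0.35))     # ~0.69
```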

Let’s think about what these correlations actually are and where they come from. They are probabilistic beliefs about logical worlds. For example, A might think that in the world where they play a (that is, conditioning A’s distribution on this fact), the likelihood of C playing c is 0.9, while in the world where they don't, it’s 0.3. Unfortunately, only one of the two logical worlds will be actual. And so, one of these two beliefs will never be checked against any ground truth. If they end up taking a, there won’t be any mathematical fact of the matter as to what would have happened if they had not.
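
To spell out the EDT calculation with A's numbers (the utilities here are made up purely for illustration): say c is worth 10 to the coalition and taking a costs 1. Then

$$\mathbb{E}[U \mid a] = 0.9 \cdot 10 - 1 = 8 \qquad \text{vs.} \qquad \mathbb{E}[U \mid \lnot a] = 0.3 \cdot 10 = 3,$$

so by A's lights a is clearly worth taking. If B instead puts the two conditional probabilities at, say, 0.35 and 0.3, the same calculation gives 2.5 versus 3, and the trade isn't worth it. The disagreement does real decision-theoretic work.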

But nonetheless, it’s not as if “real math always gets feedback, and counterfactuals never do”: after all, the still-uncertain agent doesn’t know which counterfactual will become real, and so they use the same general heuristics to think about all of them. When reality hits back on the single counterfactual that becomes actual, it is these heuristics that will be chiseled.

I think that’s the correct picture of bounded logical learning: a pile of heuristics learning through time. This is what Logical Inductors formalize.[1]

It thus becomes clear that correlations are the “running-time by-product” of using these heuristics to approximate real math. Who cares that only one of the counterfactuals will come about? We are hedging our bets by applying the heuristics that were successful in the past to all counterfactuals, and hopefully something good comes out the other end!
That is, using correlations is fundamentally about generalization of past heuristics (like everything, really). This involves trusting that generalization will converge on good things. But that’s okay, we do that all the time. This also involves accepting that, in any one particular data point, the heuristic might be very wrong (but hopefully this will happen less with time).

Third solution: So it’s natural to embrace correlations as the outputs of hacky selected-for heuristics, and it’s looking like the natural way to compare correlations is by comparing these heuristics directly. This is taking the Second solution to its logical conclusion: aggregating the atomic parts of deliberation.
While A and B cannot just investigate what C does directly, they can continue running their respective heuristics on further mathematical observations (that they don’t care about). Hopefully one of the two will prove more useful: it will have a “lower loss” when its predictions are tested against many counterfactual questions. And hopefully this is a good sign that the winning heuristic will also do well when thinking about C (that is, we are trusting generalization).
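
A minimal sketch of that scoring step (the function names and the soft weighting at the end are my own choices, not something the argument forces): score each heuristic by its log loss on math questions that have since resolved, and give the lower-loss one more say.

```python
import math

# Sketch: score two heuristics by their log loss on math questions that have since
# resolved, then turn those losses into weights for aggregating their opinions about C.
# All names and numbers here are hypothetical.

def log_loss(prediction: float, outcome: bool) -> float:
    """Negative log probability assigned to the realized outcome."""
    p = prediction if outcome else 1 - prediction
    return -math.log(max(p, 1e-12))

def average_loss(heuristic, resolved_questions) -> float:
    """heuristic: question -> probability; resolved_questions: list of (question, outcome)."""
    return sum(log_loss(heuristic(q), outcome) for q, outcome in resolved_questions) / len(resolved_questions)

def loss_weights(loss_a: float, loss_b: float, temperature: float = 1.0):
    """Turn average losses into aggregation weights (lower loss -> higher weight)."""
    za, zb = math.exp(-loss_a / temperature), math.exp(-loss_b / temperature)
    return za / (za + zb), zb / (za + zb)

# e.g. if A's heuristic averages 0.4 nats of loss and B's averages 0.9:
w_a, w_b = loss_weights(0.4, 0.9)   # A's opinion gets ~62% of the weight
```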

In fact, a natural way to implement this (as opposed to running through a lot of irrelevant mathematical observations every time we need a new decision) is to run our heuristics continuously (also in decisions we care about), and keep track of which work better.
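
One cheap way to do that continuous bookkeeping is a standard multiplicative-weights (equivalently, Bayesian-mixture) update, sketched here with hypothetical names: after each prediction resolves, multiply each heuristic's weight by the probability it assigned to what actually happened.

```python
# Sketch of the continuous bookkeeping: whenever a prediction resolves, each heuristic's
# weight is multiplied by the probability it assigned to the realized outcome, then
# everything is renormalized. All names and numbers are hypothetical.

def update_weights(weights: dict, predictions: dict, outcome: bool) -> dict:
    """predictions: heuristic name -> probability it assigned to the outcome being True."""
    new = {name: w * (predictions[name] if outcome else 1 - predictions[name])
           for name, w in weights.items()}
    total = sum(new.values())
    return {name: w / total for name, w in new.items()}

weights = {"heuristic_A": 0.5, "heuristic_B": 0.5}
weights = update_weights(weights, {"heuristic_A": 0.8, "heuristic_B": 0.4}, outcome=True)
# -> heuristic_A now carries ~67% of the weight
```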

Put in terms of Logical Inductors, this amounts to taking all the traders from two Inductors, selecting those that have done best (each tested on their own Inductor), and computing their aggregate bet.
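
As a cartoon of that (each trader is reduced here to a past score earned within its own Inductor and a current bet on C taking c, which glosses over almost all of the real Logical Induction machinery):

```python
# Cartoon of "take the best traders from both Inductors and aggregate their bet".
# Each trader is just (score_from_its_own_Inductor, current_bet); all numbers hypothetical.

def aggregate_best_traders(traders_a, traders_b, top_k: int = 3) -> float:
    """Pick the top_k traders by past score and return their score-weighted average bet."""
    traders = sorted(traders_a + traders_b, key=lambda t: t[0], reverse=True)[:top_k]
    total_score = sum(score for score, _ in traders)
    return sum(score * bet for score, bet in traders) / total_score

traders_a = [(3.0, 0.9), (1.0, 0.85)]   # traders from A's Inductor (optimistic about the correlation)
traders_b = [(2.5, 0.35), (0.5, 0.4)]   # traders from B's Inductor (pessimistic about it)
print(aggregate_best_traders(traders_a, traders_b))  # ~0.68
```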

This still leaves something to improve, because the scores of each trader don't include how they would interact with those in the other Inductor. Maybe it would become clear that some traders only have a high score because all the other traders in their Inductor are even dumber.

So it would be even better (and this is another, more expensive way of "scoring the different heuristics") to just run a single Logical Inductor with all of those heuristics together (and, let's say, a prior over them which is the average of the priors from both Inductors), which sees all the logical observations that either of the two Inductors had seen.
That is, instead of having both agents learn independently and then compare intuitions with a low bandwidth, you merge them from the start, and ensure the different intuitions have had high bandwidth with all other ones.
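
In the same cartoon notation, "merging from the start" just means pooling the traders, averaging the two priors over them, and letting the merged pool see the union of both observation streams:

```python
# Cartoon of running a single merged Inductor instead of aggregating two after the fact.
# Priors are dicts mapping trader names to weights; observations are lists of statements.

def merge_inductors(prior_a: dict, prior_b: dict, observations_a: list, observations_b: list):
    traders = set(prior_a) | set(prior_b)
    merged_prior = {t: (prior_a.get(t, 0.0) + prior_b.get(t, 0.0)) / 2 for t in traders}
    merged_observations = list(dict.fromkeys(observations_a + observations_b))  # union, order kept
    return merged_prior, merged_observations
```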

The latter is more exhaustive, but way more expensive. And the former might, in some instances, be a natural way to cut out a lot of computation without losing too much expected performance. For example, maybe each Inductor (agent) specializes in a different part of Logic (one that you expect not to interact much with what the other Inductor is doing). Then, what is lost in performance by aggregating them with low bandwidth (instead of merging them from the start) should be small.

Probably this is all pragmatically hard to do in reality, but I think philosophically it’s the best we can hope for.[2] Which amounts to trusting generalization.

It also runs into some Updateful problems already experienced by Logical Inductors: when you’ve run your heuristics for longer, they might “overfit” to some observed knowledge (that is, they update on it). And so it might seem impossible to find the sweet spot in some situations, where you still don't want to update on some basic information (whether c happens), but already want sensible-looking opinions on pretty complex correlations (such as the one between a and c). For example, when you would like to use a very advanced heuristic to consider counterfactuals 1 and 2, but the only way to have learned this heuristic is by also having noticed that 1 is always false.[3] This is usually presented as a problem of Updatefulness, but it might also be understandable as a failure of generalization due to overfitting.

  1. ^

    And, unsurprisingly, when not only learning is involved, but also exploiting, what we seem to do is Updateful Policy Selection, which is nothing more than an "Action Inductor".

  2. ^

    Of course I have some small credence on an objective criterion existing, similarly to how I have some small credence on an objective metric for decision theories existing that we've overlooked. I just think it’s pretty obvious that’s not how philosophy has shaped up.

  3. ^

    Vacuously, there does always exist some Inductor with a prior weird enough to learn the useful heuristic (or have any opinions about the counterfactual that you want it to have) without learning that 1 is false. But this amounts to "already knowing what you're looking for" (and you'd have to go through a lot of Inductors to find this one, thus updating on a lot of math yourself, etc.), which is not really what you wanted the Inductor for in the first place. You wanted it (with its arbitrary simplicity prior over traders) as a reliable way of noticing patterns in reality that seem like your best chance at prediction.
